Google Launches New Multimodal Gemini AI Model

On December 6, 2023, Alphabet, the parent company of Google, unveiled the first phase of its next-generation AI model, Gemini. Developed by Google DeepMind under the leadership of CEO Sundar Pichai, the model is one that Google presents as a notable leap forward in AI technology.

According to Google, Gemini is the first AI model to surpass human experts on Massive Multitask Language Understanding (MMLU), a widely recognized benchmark for evaluating language model performance.

The result reflects Gemini’s capabilities in generating code, text, and images, as well as its ability to perform visual reasoning across different languages.

Sundar Pichai emphasized Gemini’s superior performance compared to OpenAI’s ChatGPT, particularly on multimodal benchmarks. He noted that Gemini has crossed the 90% threshold on MMLU, a notable improvement over the 30-40% state of the art of just two years ago.
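For readers unfamiliar with how a figure like 90% is produced: MMLU is a multiple-choice benchmark, and a score is the share of questions answered correctly. The short Python sketch below illustrates that arithmetic only. The function name `mmlu_accuracy` and the ten-question toy data are hypothetical, and the sketch ignores the prompting setup that published scores also depend on; it is not Google's evaluation code.

```python
def mmlu_accuracy(predictions, answers):
    """Return the fraction of multiple-choice questions answered correctly."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Hypothetical toy data: each entry is the option (A-D) chosen for one question.
answers = ["A", "C", "B", "D", "A", "B", "C", "D", "A", "B"]
predictions = ["A", "C", "B", "D", "A", "B", "C", "D", "A", "C"]

score = mmlu_accuracy(predictions, answers)
print(f"accuracy: {score:.0%}")
print("crosses the 90% threshold:", score >= 0.90)
```

In this toy run the model gets nine of ten questions right, so it lands exactly on the 90% line. The real benchmark applies the same calculation to thousands of questions across dozens of subjects.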

Gemini’s design focuses on efficiency and scalability, making it a versatile tool for integrating with existing technologies and APIs.

Although the model itself is proprietary rather than open source, Google offers developer access through its APIs, an approach intended to encourage experimentation within the AI community and make fuller use of Gemini’s potential.

The model comes in three versions: Ultra, Pro, and Nano. Gemini Ultra is the largest, while Gemini Pro, which powers Google’s Bard chatbot, is medium-sized.

The Nano version is the smallest and most efficient of the three, designed to run on-device, starting with Google’s Pixel 8 Pro phone.

Reactions to Gemini have been mixed. Some users report impressive results, while others note ongoing issues with hallucinations. Melanie Mitchell, an AI researcher, said that while Gemini is sophisticated, it may not be substantially more capable than GPT-4.

Gemini’s development included comprehensive impact assessments to identify potential societal benefits and harms.

These assessments guided the creation of model policies for development and evaluation. Mitigations were implemented at the data layer, and instruction tuning was used to address safety concerns, including reducing hallucinations.

Developers interested in Gemini can find more detail in the technical report published by Google.