Google launched Gemini 1.5 today. With a 1 million token context window, the largest yet seen in a natural language processing model, the new model surpasses both ChatGPT and Claude: Claude 2.1 offers a 200K-token context window, while GPT-4 Turbo offers 128K.
The blog post, co-authored by Google CEO Sundar Pichai and Google DeepMind CEO Demis Hassabis, states, “We’ve been able to significantly increase the amount of information our models can process — running up to 1 million tokens consistently, achieving the longest context window of any large-scale foundation model yet.” It also compares the model to other models already in use, such as ChatGPT and Claude.
Gemini 1.5 Pro ships with a standard 128,000-token context window. For now, however, a restricted set of developers and enterprise customers can test it with a context window of up to 1 million tokens in private preview via AI Studio and Vertex AI.
Large volumes of data can be processed in one go: 11 hours of audio, 1 hour of video, or codebases with more than 30,000 lines of code or 700,000 words. In its research, Google also successfully tested windows of up to 10 million tokens.
Gemini 1.5 is built on the Transformer and Mixture-of-Experts (MoE) architectures. Whereas a typical Transformer runs as a single, massive neural network, an MoE model is divided into smaller "expert" networks, only a subset of which is activated for any given input.
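The routing idea behind MoE can be sketched in a few lines. This is an illustrative toy, not Google's implementation: a small gating network scores the experts for each token, and only the top-k experts (here, tiny two-layer MLPs with made-up sizes) are actually run.

```python
# Toy Mixture-of-Experts layer: a gate picks the top-K experts per token,
# so most of the network stays inactive for any given input.
# All dimensions, weights, and names here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

D, H, E, K = 8, 16, 4, 2   # model dim, hidden dim, number of experts, top-k

# Each "expert" is a tiny two-layer MLP with its own weights.
experts = [(rng.normal(0, 0.1, (D, H)), rng.normal(0, 0.1, (H, D)))
           for _ in range(E)]
gate_w = rng.normal(0, 0.1, (D, E))  # gating network weights

def moe_layer(x):
    """Route one token vector x through its top-K experts only."""
    logits = x @ gate_w
    topk = np.argsort(logits)[-K:]       # indices of the K highest-scoring experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()             # softmax over the chosen experts
    out = np.zeros_like(x)
    for w, i in zip(weights, topk):
        w1, w2 = experts[i]
        out += w * (np.maximum(x @ w1, 0) @ w2)  # ReLU MLP expert
    return out

token = rng.normal(size=D)
print(moe_layer(token).shape)  # (8,)
```

The payoff is that compute per token scales with K, not with the total number of experts, which is how MoE models grow parameter counts without growing inference cost proportionally.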
Gemini 1.5 Pro's capabilities are multifaceted: it can analyze long transcripts of historical events, such as those from the Apollo 11 mission, or comprehend and make sense of a silent film. Its capacity to handle large amounts of code demonstrates its versatility and reinforces its applicability in challenging problem-solving scenarios.
In the Needle In A Haystack (NIAH) evaluation, Gemini 1.5 Pro performs extremely well, achieving an impressive 99% success rate at finding specific pieces of information embedded in long text passages. The Machine Translation from One Book (MTOB) benchmark likewise attests to Gemini 1.5 Pro's in-context learning capabilities, further establishing its strength in adaptive learning.
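The basic shape of a NIAH test is simple to sketch. The harness below is a hedged illustration (the exact setup Google used is not public): a unique "needle" fact is buried at a chosen depth inside filler text, and the model is then asked to retrieve it. The needle, filler, and function name are all made up for the example.

```python
# Minimal sketch of constructing a Needle In A Haystack prompt:
# bury a unique fact at a chosen depth in a long filler document,
# then ask the model to retrieve it. Illustrative only.
def build_niah_prompt(needle: str, filler: str, depth: float, target_len: int) -> str:
    """Insert `needle` at fractional `depth` (0.0 = start, 1.0 = end)
    of a haystack roughly `target_len` characters long."""
    haystack = (filler * (target_len // len(filler) + 1))[:target_len]
    pos = int(len(haystack) * depth)
    doc = haystack[:pos] + " " + needle + " " + haystack[pos:]
    return (doc + "\n\nQuestion: What is the secret number mentioned "
            "in the document above? Answer with just the number.")

prompt = build_niah_prompt(
    needle="The secret number is 7421.",
    filler="The quick brown fox jumps over the lazy dog. ",
    depth=0.5,
    target_len=2000,
)
```

A full evaluation sweeps the needle over many depths and context lengths and scores how often the model's answer contains the needle's fact, which is where a figure like "99%" comes from.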
This latest advancement comes after Google's recent release of the initial iteration of Gemini Ultra. Google has also just added generative AI capabilities to Chrome: the "Help Me Write" tool now works on all websites. Users activate it by right-clicking on any text box, after which Google's AI asks what kind of writing they need and produces a first draft.
According to reports, OpenAI is developing a web search tool to compete with Google, even as Google concentrates on enhancing its AI models. Furthermore, OpenAI CEO Sam Altman has said the company is developing GPT-5, its upcoming LLM, which is anticipated to be more intelligent than its predecessor.
Sora, OpenAI's text-to-video model, was also just unveiled. It can produce videos up to one minute long while adhering to the user's instructions and preserving visual quality. Meta is expected to release Llama 3 shortly.