Kyutai, a non-profit AI research lab based in France, has introduced Moshi, a real-time, natively multimodal foundation model. The open-source project delivers a voice AI assistant with capabilities comparable to Google’s Project Astra and OpenAI’s GPT-4o.
A team of just eight researchers built Moshi in six months. It can speak in multiple accents, understand and express 70 different emotions and styles, and handle two audio streams at once, which lets it listen and speak simultaneously.
Moshi is built on the Helium 7B model and trained jointly on text and audio. It supports 4-bit and 8-bit quantization and is optimized for CUDA, Metal, and CPU backends.
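To give a rough sense of why quantization matters for running a 7B model on consumer hardware, here is a back-of-the-envelope sketch in Python. It is only an illustration under stated assumptions: "7B" is taken as exactly 7 billion parameters, the 16-bit baseline is included for comparison only, and the estimate covers weights alone, ignoring activations, caches, and any per-block quantization metadata. The function name `weight_memory_gib` is hypothetical and not part of any Kyutai code.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumption: "7B" means exactly 7 billion parameters.
PARAMS = 7e9

# 16-bit is an assumed unquantized baseline; 8 and 4 are the modes named above.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gib(PARAMS, bits):.1f} GiB")
```

Under these assumptions the weights take roughly 13 GiB at 16 bits, 6.5 GiB at 8 bits, and 3.3 GiB at 4 bits, which is why the lower-precision modes are the ones that fit comfortably on a laptop.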
Among Moshi’s salient qualities are:
– Real-time communication with a 200 ms end-to-end latency.
– Compatibility with consumer-grade hardware, such as MacBooks.
– Support for multiple backends (CUDA, Metal, and CPU).
– Watermarking to identify audio produced by AI (under development).
“Moshi thinks while it talks,” said Kyutai CEO Patrick Pérez, who believes the model has the potential to transform human-machine communication.
Kyutai plans to release the complete system, including the audio codec, the 7B model, the inference codebase, and the optimized stack.
Kyutai was established in November 2023 to promote open research and ecosystem development in AI, with €300 million in funding from backers including the French billionaire Xavier Niel.
The lab’s strategy takes aim at established AI firms such as OpenAI, which has come under fire for postponing releases over safety concerns. Notably, OpenAI has delayed the release of GPT-4o’s Voice Mode and its Voice Engine, as well as its video-generation model Sora.
Along with other French-founded ventures such as Hugging Face and Mistral, Moshi adds to France’s growing prominence in AI.