Meet Moshi: ChatGPT’s New Competitor with Advanced Voice Tone Recognition

Antra Mishra

5 months ago

A few days ago, OpenAI made news by postponing the release of ChatGPT’s anticipated voice mode. The company cited the need to resolve technical issues and maintain a high level of quality. This was disappointing for those eagerly anticipating the opportunity to converse with the AI chatbot. However, there is now another chatbot on the scene that not only communicates with you, but also comprehends your tone of voice. Meet Moshi, created by the French AI company Kyutai.

Say hello to Moshi

Moshi is an AI voice assistant designed to hold realistic conversations, much like Amazon’s Alexa or Google Assistant. It is powered by the Helium 7B language model. The chatbot distinguishes itself by speaking in various accents and using 70 different emotional and speaking styles, and it can detect the tone of your voice as you interact with it. Moshi can also manage two audio streams simultaneously, which lets it listen and respond at the same time. The assistant’s recent livestreamed launch has garnered significant attention and continues to make headlines. As reported by TechRadar, Moshi’s development involved an extensive fine-tuning process that incorporated over 100,000 synthetic dialogues generated through Text-to-Speech (TTS) technology. To improve the chatbot’s voice quality, Kyutai worked with a professional voice artist so that Moshi’s responses sound natural and engaging.

In a press release to Tom’s Guide, the company stated that this technology lets users interact with an AI in a seamless, organic, and emotive manner for the very first time.

Available for use starting now

You can try Moshi now through its demo version. Simply visit us.moshi.chat and follow the provided instructions. Each conversation with the AI voice assistant is limited to 5 minutes. Before you start talking to Moshi, you will see a message stating, “Moshi is an experimental conversational AI. Take everything it says with a grain of salt. Conversations are limited to 5 min. Moshi thinks and speaks at the same time. Moshi can listen and talk at all time: maximum flow between you and Moshi. Ask it to do some Pirate role play, how to make Lasagna, or what movie it watched last. We strive to support all browsers, Chrome works best. Baked with <3 @Kyutai. You are on the US demo. Depending on your location, maybe the EU demo will offer better latency.”

Kyutai has committed to making Moshi an open-source project. By releasing the model’s code and framework, the company aims to foster innovation and address ethical concerns related to AI development. This open-source approach has garnered support from influential figures, including French billionaire Xavier Niel.

In the future, Kyutai intends to incorporate advanced features into Moshi, such as AI audio identification, watermarking, and signature tracking systems. These enhancements will help ensure accountability and traceability for AI-generated audio, promoting transparency in AI technology.

If Moshi gains momentum, it could act as a catalyst for other voice-enabled AI assistants and expedite the integration of large language models into existing systems like Alexa. The impressive capabilities demonstrated by Moshi point towards a promising future for voice AI technology.