A voice model significantly more human-like than OpenAI's advanced voice


Sesame's research aims to overcome the shortcomings of digital voice assistants by building conversational partners with 'voice presence.' That means combining emotional intelligence, conversational dynamics, contextual awareness, and a consistent personality so that dialogue feels natural. The newly introduced Conversational Speech Model (CSM) is the latest step in this voice synthesis work and is intended to make generated speech sound more natural.

Voice presence is Sesame's objective for natural interaction.
CSM uses transformers for speech generation.
CSM operates on RVQ tokens for efficiency.
Models will be open-sourced under Apache 2.0.
Upcoming work includes scaling and multilingual support.
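
For readers unfamiliar with the RVQ (residual vector quantization) tokens mentioned above, here is a minimal sketch of the general idea. It is not Sesame's tokenizer: the function names and the tiny hand-written codebooks are assumptions made up for illustration. Each stage picks the nearest codebook entry for whatever the previous stages left unexplained, so a vector becomes a short list of small integer tokens.

```python
# Toy residual vector quantization (RVQ); illustrative only, not CSM's tokenizer.
# The codebooks below are made-up values chosen for this example.

def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def dist(entry):
        return sum((e - v) ** 2 for e, v in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def rvq_encode(codebooks, vec):
    """Quantize vec stage by stage; each stage encodes the previous residual."""
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        # Subtract the chosen entry; the next stage refines what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens

def rvq_decode(codebooks, tokens):
    """Reconstruct by summing the chosen entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, tokens):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],    # coarse stage
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],  # fine stage
]

tokens = rvq_encode(CODEBOOKS, [1.2, 0.9])
approx = rvq_decode(CODEBOOKS, tokens)
print(tokens, approx)
```

With these toy codebooks, the two-dimensional input is stored as two small indices, and adding more stages would shrink the reconstruction error further. Real audio codecs learn their codebooks from data rather than writing them by hand.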


