Moshi is a speech-text foundation model for real-time dialogue, developed by Kyutai Labs. Its GitHub repository also includes Mimi, a streaming neural audio codec designed for low-latency, high-quality audio processing. The repository provides three implementations: a PyTorch version, an MLX version for M-series Macs, and a Rust version, which is the one used in production. Development focuses on real-time applications and dialogue systems.