LLaMA-Omni is an end-to-end speech interaction model that supports low-latency, high-quality speech interaction. Built on Llama-3.1-8B-Instruct, it aims for speech capabilities at the GPT-4o level, generates text and speech responses simultaneously, and can be trained in less than 3 days on just 4 GPUs.