Octopus v2 is presented as an on-device language model that surpasses GPT-4 in both accuracy and latency for function calling while reducing context length by 95%. Despite having only 2 billion parameters, it achieves a 35-fold latency improvement over Llama-7B paired with a RAG-based function-calling mechanism, making it practical for real-world deployment on edge devices.
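To make the context-length claim concrete, here is a hedged sketch of why a RAG-based function-calling pipeline inflates the prompt while a model fine-tuned on its function set can emit a call from a bare query. The function names, docstrings, and queries below are invented for illustration; they are not the paper's actual functions, tokens, or API.

```python
# Hypothetical sketch: context cost of RAG-based function calling versus
# direct call emission by a fine-tuned on-device model. All schemas and
# queries here are illustrative, not from the Octopus v2 paper.

FUNCTION_DOCS = {
    "take_photo": "take_photo(camera: str, flash: bool) -> Image. Opens the "
                  "camera app and captures a photo with the given settings.",
    "set_alarm": "set_alarm(time: str, label: str) -> None. Schedules an "
                 "alarm at the specified time with an optional label.",
    "send_text": "send_text(contact: str, body: str) -> None. Sends an SMS "
                 "message to the named contact.",
}

def rag_prompt(query: str) -> str:
    """RAG-style prompt: retrieved function docs are inlined into context."""
    docs = "\n".join(FUNCTION_DOCS.values())
    return f"{docs}\n\nUser: {query}\nCall:"

def direct_prompt(query: str) -> str:
    """Prompt for a model fine-tuned on the function set: knowledge of the
    functions lives in the weights, so the context carries only the query."""
    return f"User: {query}\nCall:"

query = "Take a selfie with the front camera"
long_p = rag_prompt(query)
short_p = direct_prompt(query)
reduction = 1 - len(short_p) / len(long_p)
print(f"RAG prompt: {len(long_p)} chars, direct: {len(short_p)} chars")
print(f"Context reduced by {reduction:.0%}")
```

The shorter prompt is what drives both the context reduction and the latency gain: fewer tokens to process per call means less compute on the device.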