🧀 BigCheese.ai


Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length


Megalodon presents an efficient neural architecture for LLM pretraining and inference with unlimited context length, outperforming alternatives such as linear attention and state space models. The approach introduces components such as the complex exponential moving average (CEMA), timestep normalization, and a normalized attention mechanism to improve capability and training stability. At the 7-billion-parameter scale, Megalodon achieves better training efficiency than comparable Transformer baselines such as Llama 2.
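To make the complex exponential moving average concrete, here is a minimal NumPy sketch of a damped EMA whose decay carries a complex phase. The function name `complex_ema`, the parameters `alpha`, `delta`, and `theta`, and the specific form of the decay are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def complex_ema(x, alpha, delta, theta):
    """Damped EMA with a complex-valued decay: a sketch of the CEMA idea.

    x:     (T, d) real-valued input sequence
    alpha: (d,)   update gates in (0, 1)
    delta: (d,)   damping factors in (0, 1)
    theta: (d,)   phase angles of the complex decay (assumed form)
    """
    T, d = x.shape
    # Complex decay: real magnitude (1 - alpha * delta), rotated by theta.
    decay = (1.0 - alpha * delta) * np.exp(1j * theta)
    h = np.zeros(d, dtype=np.complex128)
    y = np.empty((T, d))
    for t in range(T):
        h = alpha * x[t] + decay * h   # recurrence in the complex plane
        y[t] = h.real                  # project back to the reals
    return y

# Tiny usage example with arbitrary parameters.
T, d = 16, 4
rng = np.random.default_rng(0)
out = complex_ema(rng.standard_normal((T, d)),
                  alpha=np.full(d, 0.5),
                  delta=np.full(d, 0.5),
                  theta=np.linspace(0.1, 1.0, d))
print(out.shape)  # (16, 4)
```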

  • Megalodon improves long sequence scaling
  • Authors include Xuezhe Ma
  • Introduced timestep normalization (see the sketch after this list)
  • Competes with Llama2-7B and 13B
  • Code available on GitHub
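Timestep normalization, referenced above, normalizes each position using cumulative statistics over the timesteps seen so far, so the operation stays causal for autoregressive modeling. Below is a simplified NumPy sketch of that idea; it pools statistics over time and all features, whereas the paper's version operates on groups of features, and the function name and epsilon are assumptions.

```python
import numpy as np

def timestep_norm(x, eps=1e-5):
    """Normalize each timestep with cumulative (causal) mean and variance.

    x: (T, d) input sequence; position t is normalized using statistics of
    positions 0..t only, so no future information is used.
    """
    T, d = x.shape
    counts = np.arange(1, T + 1, dtype=np.float64).reshape(T, 1) * d
    csum = np.cumsum(x.sum(axis=1, keepdims=True), axis=0)        # running sum
    csq = np.cumsum((x ** 2).sum(axis=1, keepdims=True), axis=0)  # running sum of squares
    mean = csum / counts
    var = csq / counts - mean ** 2
    return (x - mean) / np.sqrt(var + eps)

# Example: normalize a short random sequence.
x = np.random.default_rng(1).standard_normal((8, 4))
print(timestep_norm(x).shape)  # (8, 4)
```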