🧀 BigCheese.ai


Reaching LLaMA2 Performance with 0.1M Dollars


JetMoE-8B outperforms LLaMA2-7B and comparable models, yet it was trained on a budget of less than $0.1 million. It is designed to be open: it uses only public datasets for training, and it keeps computational cost low by activating only 2.2B parameters during inference. The model and training code are open-sourced, which makes the work accessible to academic groups with limited compute budgets.

  • JetMoE-8B outperforms LLaMA2-7B on a training budget of less than $0.1 million.
  • It uses only public datasets for training.
  • Only 2.2B parameters are active during inference, which lowers inference cost.
  • Training JetMoE-8B cost around $0.08 million.
  • The open-source model and training code support academic research.
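
To put the quoted figures in proportion, here is a minimal arithmetic sketch in Python. It assumes the total parameter count is the nominal 8B implied by the model's name, which this summary does not state exactly, so the resulting share is approximate. The variable and function names are illustrative only.

```python
# Figures quoted in the summary above.
ACTIVE_PARAMS_B = 2.2          # active parameters during inference, in billions
TRAINING_COST_USD = 0.08e6     # reported training cost (around $0.08 million)
BUDGET_CEILING_USD = 0.1e6     # "less than $0.1 million"

# Assumption: total size taken from the "8B" in the model name.
TOTAL_PARAMS_B = 8.0


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters that are active during inference."""
    return active_b / total_b


frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
under_budget = TRAINING_COST_USD < BUDGET_CEILING_USD

print(f"Active share of parameters: {frac:.1%}")
print(f"Training cost under the $0.1M ceiling: {under_budget}")
```

Under that assumption, roughly a quarter of the parameters are active at inference time, which is where the lower inference cost comes from.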