Bugs in LLM Training – Gradient Accumulation Fix

Unsloth AI has addressed a critical issue in gradient accumulation that affects training and finetuning of large language models (LLMs). The bug caused training losses to rise as the number of gradient accumulation steps grew. The Unsloth AI team formulated and implemented a fix that brings gradient accumulation back in line with full batch training, as sketched below. Updating Unsloth and using its fixed trainer, as the team advises, substantially reduces the error.
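
The mismatch comes from how the loss is normalized. A minimal sketch (not Unsloth's code, with made-up per-token losses) of why averaging per-mini-batch means diverges from the full batch mean when mini-batches contain different numbers of non-padded tokens:

```python
# Per-token cross-entropy losses for two accumulated mini-batches of
# unequal length (e.g. after removing padding). Values are illustrative.
batch_losses = [
    [2.0, 1.0, 3.0, 2.0],  # 4 real tokens
    [4.0],                 # 1 real token
]

all_tokens = [l for batch in batch_losses for l in batch]

# Full batch training: one mean over every token.
full_batch = sum(all_tokens) / len(all_tokens)  # 12 / 5 = 2.4

# Naive accumulation: average of per-batch means, which over-weights
# the short batch and yields a different (here, higher) loss.
naive = sum(sum(b) / len(b) for b in batch_losses) / len(batch_losses)  # 3.0

# Fixed accumulation: sum the un-normalized losses, then divide once by
# the total token count across all accumulation steps.
fixed = sum(sum(b) for b in batch_losses) / len(all_tokens)  # 2.4

print(f"full batch: {full_batch:.2f}  naive: {naive:.2f}  fixed: {fixed:.2f}")
```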

  • Gradient accumulation mimics full batch training with reduced VRAM usage.
  • The bug was first discovered in 2021 by Zhaofeng.
  • Their fix normalizes the accumulated loss by the total number of non-padded tokens across all accumulation steps, rather than averaging per-mini-batch means (see the sketch after this list).
  • Training with Unsloth can now produce results equivalent to full batch training.
  • The Unsloth team has provided updated tools and resources for users.
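
Applied to a training loop, the idea looks roughly like the hedged PyTorch sketch below; the model interface, micro-batch dictionary keys, and `pad_token_id` parameter are placeholders for illustration, not Unsloth's API, and label shifting for causal LMs is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def accumulation_step(model, optimizer, micro_batches, pad_token_id):
    """Accumulate gradients over micro_batches so the update matches full batch."""
    # Count real (non-padded) label tokens across ALL micro-batches first,
    # so every micro-batch loss is normalized by the same denominator.
    total_tokens = sum(
        (mb["labels"] != pad_token_id).sum().item() for mb in micro_batches
    )

    optimizer.zero_grad()
    for mb in micro_batches:
        logits = model(mb["input_ids"]).logits  # (batch, seq, vocab)
        loss_sum = F.cross_entropy(
            logits.view(-1, logits.size(-1)),
            mb["labels"].view(-1),
            ignore_index=pad_token_id,
            reduction="sum",  # sum per-token losses, do not average here
        )
        # Divide by the global token count, not by this micro-batch's length
        # or by the number of accumulation steps.
        (loss_sum / total_tokens).backward()
    optimizer.step()
```

The key design choice is computing the denominator over all accumulation steps before any backward pass, so each micro-batch contributes in proportion to its real token count, exactly as it would in a single full batch.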