This paper presents a study of compute-optimal strategies for training language model (LM) reasoners. Contrary to the common practice of using strong but expensive models for synthetic data generation, the authors show that, under a fixed sampling compute budget, weaker but cheaper models can yield better training data. They evaluate the data generated by both strong and weak models on several quality metrics and find that reasoners trained on weak-model data outperform counterparts trained on strong-model data across multiple benchmarks.
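As a hedged sketch of the arithmetic behind the compute-matched comparison (the linear cost-per-token model below is an assumption for illustration, not something stated in this summary): if generating a token from a model with $P$ parameters costs roughly $2P$ FLOPs, then at a matched budget a weak model with $P_{\text{weak}}$ parameters can produce

\[
\frac{N_{\text{weak}}}{N_{\text{strong}}} \approx \frac{P_{\text{strong}}}{P_{\text{weak}}}
\]

times as many samples as a strong model with $P_{\text{strong}}$ parameters. This larger sampling volume at equal cost is the lever that lets the cheaper model's data compete with, and here beat, the stronger model's.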