This paper presents a study of compute-optimal strategies for training language model (LM) reasoners. Contrary to the common practice of using strong but expensive models for synthetic data generation, the authors show that, under a fixed sampling compute budget, weaker but cheaper models can yield better training data. They evaluate the data generated by both strong and weak models on several quality metrics and find that reasoners trained on weak-model data outperform counterparts trained on strong-model data across multiple benchmarks.
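As a hedged sketch of the arithmetic behind the compute-matched comparison (the linear cost-per-token model below is an assumption for illustration, not something stated in this summary): if generating a token from a model with $P$ parameters costs roughly $2P$ FLOPs, then at a matched budget a weak model with $P_{\text{weak}}$ parameters can produce

\[
\frac{N_{\text{weak}}}{N_{\text{strong}}} \approx \frac{P_{\text{strong}}}{P_{\text{weak}}}
\]

times as many samples as a strong model with $P_{\text{strong}}$ parameters. This larger sampling volume at equal cost is the lever that lets the cheaper model's data compete with, and here beat, the stronger model's.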