Transfusion introduces a method for training a single multi-modal model over both discrete and continuous data by combining language modeling (next-token prediction) on text with diffusion on images. Pretraining models of various sizes, up to 7B parameters, shows that Transfusion scales significantly better than the standard approach of quantizing images into discrete tokens, on both uni-modal and cross-modal benchmarks.
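
The core idea is that one transformer is trained with a single combined objective, $\mathcal{L} = \mathcal{L}_{\text{LM}} + \lambda \cdot \mathcal{L}_{\text{DDPM}}$: cross-entropy on text positions plus a DDPM-style noise-prediction loss on continuous image patches. The sketch below illustrates only this loss combination; all tensor names, shapes, and the value of `lmbda` are hypothetical stand-ins, not the paper's code (the actual model interleaves both modalities in one sequence).

```python
import torch
import torch.nn.functional as F

def transfusion_loss(text_logits, text_targets, pred_noise, true_noise, lmbda=1.0):
    """Sketch of a Transfusion-style combined objective:
    next-token cross-entropy on discrete text tokens plus a
    DDPM-style noise-prediction MSE on continuous image patches,
    summed as L = L_LM + lambda * L_DDPM."""
    # Standard language-modeling loss over all text positions.
    lm_loss = F.cross_entropy(
        text_logits.reshape(-1, text_logits.size(-1)),
        text_targets.reshape(-1),
    )
    # Diffusion loss: the model predicts the Gaussian noise that was
    # added to the image latents; MSE against the true noise.
    ddpm_loss = F.mse_loss(pred_noise, true_noise)
    return lm_loss + lmbda * ddpm_loss

# Toy usage with random tensors standing in for one transformer's outputs.
B, T, V, P, D = 2, 16, 1000, 64, 8           # batch, text len, vocab, patches, patch dim
text_logits = torch.randn(B, T, V)           # LM head output on text positions
text_targets = torch.randint(0, V, (B, T))   # next-token labels
true_noise = torch.randn(B, P, D)            # noise added to image latents
pred_noise = torch.randn(B, P, D)            # diffusion head output on image positions
loss = transfusion_loss(text_logits, text_targets, pred_noise, true_noise)
```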