Researcher Sander Dieleman explores the close relationship between diffusion models and autoregressive models in image generation, arguing that diffusion models of images perform approximate autoregression in the frequency domain: they generate low-frequency, coarse structure first and add progressively higher-frequency detail as denoising proceeds. The article also touches on the implications for audio and language, and speculates on the future of generative models for multimodal data.
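
The frequency-domain claim can be illustrated with a small numerical sketch. It is not code from the article: the function names (`power_law_image`, `radial_power_spectrum`, `visible_band`) are hypothetical, and a random image with a 1/f amplitude spectrum is assumed as a rough stand-in for natural images. White Gaussian noise with standard deviation sigma has a flat expected power spectrum at level sigma squared, so the sketch counts how many radial frequency bands of the signal still rise above that level as the noise grows.

```python
import numpy as np


def power_law_image(size, exponent, rng):
    """Random image whose amplitude spectrum falls off as 1 / f**exponent.

    Assumption: this is a crude stand-in for the spectrum of natural images.
    """
    fy = np.fft.fftfreq(size)[:, None]
    fx = np.fft.fftfreq(size)[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    radius[0, 0] = 1.0  # placeholder to avoid dividing by zero at DC
    amplitude = radius ** (-exponent)
    amplitude[0, 0] = 0.0  # remove the DC component (zero-mean image)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(size, size))
    image = np.fft.ifft2(amplitude * np.exp(1j * phase)).real
    return image / image.std()  # normalise to unit variance


def radial_power_spectrum(image):
    """Power averaged over rings of equal integer spatial frequency."""
    size = image.shape[0]
    # Dividing by the pixel count makes white noise of variance s**2
    # have an expected power of s**2 in every frequency bin.
    power = np.abs(np.fft.fft2(image)) ** 2 / image.size
    fy = np.fft.fftfreq(size)[:, None] * size
    fx = np.fft.fftfreq(size)[None, :] * size
    ring = np.rint(np.sqrt(fx**2 + fy**2)).astype(int).ravel()
    sums = np.bincount(ring, weights=power.ravel())
    counts = np.maximum(np.bincount(ring), 1)
    # Skip DC (ring 0) and stop at the Nyquist frequency.
    return (sums / counts)[1 : size // 2]


def visible_band(image, sigma):
    """Number of frequency rings where the signal exceeds the noise floor."""
    return int(np.sum(radial_power_spectrum(image) > sigma**2))


rng = np.random.default_rng(0)
image = power_law_image(64, exponent=1.0, rng=rng)

sigmas = [0.1, 1.0, 10.0]
cutoffs = [visible_band(image, sigma) for sigma in sigmas]
for sigma, cutoff in zip(sigmas, cutoffs):
    print(f"sigma={sigma:5.1f}  rings above the noise floor: {cutoff}")
```

Under these assumptions, increasing the noise level shrinks the band of frequencies that remain distinguishable, starting from the highest ones. Read in reverse, each denoising step recovers a slightly higher band conditioned on the lower ones already present, which is the sense in which the process resembles autoregression over frequencies.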