Dragonfly, a new vision-language model, uses a breakthrough multi-resolution zoom-and-select architecture to enhance visual understanding. It achieves improved performance on commonsense visual QA and image captioning benchmarks.