Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine

A study uncovers hidden flaws behind the expert-level diagnostic accuracy of GPT-4 with vision (GPT-4V) in medicine, showing that more in-depth evaluation is needed before clinical integration.

GPT-4V outperforms human physicians in accuracy on medical challenge tasks.
GPT-4V gives flawed rationales in 35.5% of the cases it answers correctly.
GPT-4V's error rate in image comprehension exceeds 25%.
Human physicians achieve higher accuracy with open-book resources.
The study calls for thorough assessment of AI rationales, not only final-answer accuracy.
