Study uncovers hidden flaws behind the expert-level diagnostic accuracy of GPT-4 with vision (GPT-4V) in medicine, showing that more in-depth evaluation is needed before clinical integration.
GPT-4V outperforms human physicians in accuracy on medical challenge tasks.
GPT-4V gives flawed rationales in 35.5% of the cases it answers correctly.
GPT-4V's error rate in image comprehension exceeds 25%.
Human physicians achieve higher accuracy with open-book resources.
The research calls for thorough assessment of the rationales behind AI diagnoses, not just their final answers.