🧀 BigCheese.ai

Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine

A study uncovers hidden flaws behind the expert-level diagnostic accuracy of GPT-4 with vision (GPT-4V) in medicine and argues that more in-depth evaluation is needed before clinical integration.

  • GPT-4V outperforms human physicians in accuracy on medical challenge tasks.
  • GPT-4V gives a flawed rationale in 35.5% of the cases where its final answer is correct.
  • Image comprehension error rate for GPT-4V exceeds 25%.
  • Human physicians achieve higher accuracy with open-book resources.
  • Research calls for thorough rationale assessments of AI diagnostics.