OpenAI’s HealthBench reveals how well AI answers medical questions

May 13, 2025 12:29 PM IST

OpenAI has launched HealthBench, a new dataset designed to test how accurately AI models respond to real-world health care questions.

OpenAI has introduced HealthBench, a comprehensive dataset designed to assess how well AI models respond to health care-related questions. This release aims to enhance the evaluation of AI's performance in providing accurate, reliable responses to health inquiries. The open-source dataset is supported by detailed evaluation rubrics, and experts recognise its scale and depth as a significant advancement in AI health care applications.

OpenAI has launched HealthBench to test how accurately AI models respond to health care-related questions. (Pexels)

HealthBench was developed in collaboration with 262 physicians from 60 countries and includes 5,000 simulated health conversations. The dataset focuses on determining whether AI systems can deliver optimal responses to health-related queries. Each response is analysed based on a rubric written by physicians, with criteria weighted according to medical judgment. GPT-4.1 is used to score these responses.
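
The weighted-rubric idea described above can be sketched in a few lines of Python. This is a minimal illustration, not OpenAI's implementation: the `Criterion` class, the `rubric_score` function, the point values, and the criterion texts are all hypothetical, and the normalisation used here (points earned divided by the maximum achievable positive points, clipped to the range 0 to 1) is an assumption about how weighted criteria could be combined into a single score.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric item (hypothetical structure, for illustration only)."""
    description: str
    points: int  # weight; negative values penalise harmful advice
    met: bool    # verdict from the grader (a model such as GPT-4.1 in the article)


def rubric_score(criteria: list[Criterion]) -> float:
    """Return earned points over maximum positive points, clipped to [0, 1]."""
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric needs at least one positively weighted criterion")
    earned = sum(c.points for c in criteria if c.met)
    return max(0.0, min(1.0, earned / max_points))


# Invented rubric for an emergency scenario; not taken from HealthBench.
example = [
    Criterion("Advises calling emergency services", 10, True),
    Criterion("Advises checking for breathing", 6, True),
    Criterion("Explains how to start CPR if needed", 4, False),
    Criterion("Recommends giving food or drink", -5, False),
]

score = rubric_score(example)
print(f"Score: {score:.0%}")
```

In this sketch a response earns credit for each criterion it meets and loses credit for each negatively weighted criterion it triggers, so a reply that covers the essential steps but omits a secondary one still lands well below 100 percent.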

Performance Scores of AI Models

On HealthBench, OpenAI's o3 reasoning model performs best with a score of 60 percent, followed by Grok, from Elon Musk's xAI, at 54 percent and Google's Gemini 2.5 Pro at 52 percent. The dataset spans 49 languages, including Amharic and Nepali, and covers 26 medical specialities, such as neurology and ophthalmology.

In one example shared by OpenAI, the dataset poses a scenario in which a 70-year-old neighbour is found unresponsive on the floor, and the AI model is asked what steps should be taken. The model provides instructions such as calling emergency services, checking for breathing, and ensuring the airway is clear. HealthBench then evaluates the response against the rubric, highlighting correct actions and areas for improvement, and gives a final score of 77 percent in this instance.

This launch marks OpenAI's first significant venture into AI applications in health care, beyond external partnerships. HealthBench is poised to be a valuable tool for understanding how well AI models can support medical decision-making.

ChatGPT’s Expanded Shopping Capabilities

In addition to the health care dataset, OpenAI recently updated ChatGPT's web search feature to offer personalised product recommendations. The search tool provides tailored suggestions across various categories and is available to all users worldwide, regardless of subscription tier. The update strengthens OpenAI's position in the competitive search landscape and could challenge established players like Google.