OpenAI’s HealthBench reveals how well AI answers medical questions

OpenAI has launched HealthBench, a new dataset designed to test how accurately AI models respond to real-world health care questions.

Published on: May 13, 2025 12:29 PM IST


OpenAI has introduced HealthBench, a comprehensive dataset designed to assess how well AI models respond to health care-related questions. The release is meant to improve how AI performance on health inquiries is evaluated, with a focus on accurate and reliable responses. The open-source dataset is supported by detailed evaluation rubrics, and experts recognise its scale and depth as a significant advancement in AI health care applications.

OpenAI has launched HealthBench to test how accurately AI models respond to health care-related questions. (Pexels)

By MD Ijaj Khan

Ijaj Khan is a technology journalist and Senior Content Producer at Hindustan Times, with over three years of experience covering the consumer technology industry. His work spans smartphones, laptops, wearables, gaming, appliances and AI - from hands-on reviews, comparison and buying guides to breaking news and in-depth features that help readers cut through the noise and make informed decisions. Before joining HT Tech, he worked with Jagran New Media, where he sharpened his instincts for fast-paced digital reporting. He holds a Post Graduate Diploma in English Journalism and Mass Communication from the Indian Institute of Mass Communication (IIMC), Delhi. Whether he's testing the latest flagship smartphone, tracking a major AI announcement, or putting a gaming laptop through its paces, Ijaj approaches every story with the same goal - making technology feel relevant and easy to understand for everyday users, not just enthusiasts. When he's not in front of a screen for work, he's usually travelling to a new city, hunting for great food, or keeping tabs on what's next in tech before everyone else catches on.

Performance Scores of AI Models

According to HealthBench, OpenAI's o3 reasoning model performs best with a score of 60 percent, followed by Grok, from Elon Musk's xAI, at 54 percent and Google's Gemini 2.5 Pro at 52 percent. The dataset spans 49 languages, including Amharic and Nepali, and covers 26 medical specialities, such as neurology and ophthalmology.


In one example shared by OpenAI, the dataset poses a scenario in which a 70-year-old neighbour is found unresponsive on the floor, and the AI model is asked what steps should be taken. The model responds with instructions such as calling emergency services, checking for breathing, and making sure the airway is clear. HealthBench then evaluates the response, highlighting the correct actions and the areas for improvement, and assigns a final score of 77 percent in this instance.
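The article does not spell out how a per-response score is computed from the rubric. As a rough illustration only, the sketch below assumes a weighted rubric in which each criterion carries points (negative for harmful advice) and the score is the points earned divided by the maximum positive points, clipped to the range 0 to 1. The `Criterion` class, the `rubric_score` function, the criteria and their point values are all hypothetical and are not taken from HealthBench or from the 77 percent example above.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One hypothetical rubric item (not an actual HealthBench criterion)."""
    description: str
    points: int  # positive = desirable behaviour, negative = harmful behaviour
    met: bool    # whether a grader judged the response to meet this item


def rubric_score(criteria):
    """Return earned points over maximum positive points, clipped to [0, 1].

    This formula is an assumption made for illustration.
    """
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric needs at least one positively weighted criterion")
    earned = sum(c.points for c in criteria if c.met)
    return min(1.0, max(0.0, earned / max_points))


# Made-up rubric for the "unresponsive neighbour" scenario.
example = [
    Criterion("Advises calling emergency services", 10, True),
    Criterion("Advises checking for breathing", 8, True),
    Criterion("Advises making sure the airway is clear", 6, True),
    Criterion("Explains how to start CPR if needed", 6, False),
    Criterion("Recommends giving food or drink", -8, False),
]

score = rubric_score(example)
print(f"{score:.0%}")  # prints "80%" (24 of 30 possible points)
```

Under these assumptions, a response that misses a desirable step loses the points for that step, while a response that includes harmful advice has points subtracted.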

This launch marks OpenAI's first significant venture into AI applications in health care, beyond external partnerships. HealthBench is poised to be a valuable tool for understanding how well AI models can support medical decision-making.


ChatGPT’s Expanded Shopping Capabilities

In addition to the health care dataset, OpenAI recently updated ChatGPT's web search feature to offer personalised product recommendations. The search tool provides tailored suggestions across various categories and is available to all users worldwide, regardless of subscription tier. The update strengthens OpenAI's position in the competitive search landscape and puts it in more direct competition with established players like Google.


