    Benjamin Franklin Institute
    AI chatbots miss urgent issues in queries about women’s health

    By Team_Benjamin Franklin Institute
    January 7, 2026

    Many women are using AI for health information, but the answers aren’t always up to scratch

    Oscar Wong/Getty Images

    Commonly used AI models fail to accurately diagnose or offer advice for many queries relating to women’s health that require urgent attention.

    Thirteen large language models, produced by the likes of OpenAI, Google, Anthropic, Mistral AI and xAI, were given 345 medical queries across five specialities, including emergency medicine, gynaecology and neurology. The queries were written by 17 women’s health researchers, pharmacists and clinicians from the US and Europe.

    The answers were reviewed by the same experts. Any questions that the models failed at were collated into a benchmarking test of AI models’ medical expertise that included 96 queries.

    Across all the models, some 60 per cent of questions were answered in a way that the human experts had previously said wasn’t sufficient for medical advice. GPT-5 was the best-performing model, failing on 47 per cent of queries, while Ministral 8B had the highest failure rate, at 73 per cent.

    “I saw more and more women in my own circle turning to AI tools for health questions and decision support,” says team member Victoria-Elisabeth Gruber at Lumos AI, a firm that helps companies evaluate and improve their own AI models. She and her colleagues recognised the risks of relying on a technology that inherits and amplifies existing gender gaps in medical knowledge. “That is what motivated us to build a first benchmark in this field,” she says.

    The rate of failure surprised Gruber. “We expected some gaps, but what stood out was the degree of variation across models,” she says.

    The findings are unsurprising because of the way AI models are trained, based on human-generated historical data that has built-in biases, says Cara Tannenbaum at the University of Montreal, Canada. They point to “a clear need for online health sources, as well as healthcare professional societies, to update their web content with more explicit sex and gender-related evidence-based information that AI can use to more accurately support women’s health”, she says.

    Jonathan H. Chen at Stanford University in California says the 60 per cent failure rate touted by the researchers behind the analysis is somewhat misleading. “I wouldn’t hang on the 60 per cent number, since it was a limited and expert-designed sample,” he says. “[It] wasn’t designed to be a broad sample or representative of what patients or doctors regularly would ask.”

    Chen also points out that some of the scenarios that the benchmark tests for are overly conservative, with high potential failure rates. For example, if postpartum women complain of a headache, the benchmark counts AI models as failing if pre-eclampsia isn’t immediately suspected.

    Gruber acknowledges those criticisms. “Our goal was not to claim that models are broadly unsafe, but to define a clear, clinically grounded standard for evaluation,” she says. “The benchmark is intentionally conservative and on the stricter side in how it defines failures, because in healthcare, even seemingly minor omissions can matter depending on context.”

    A spokesperson for OpenAI said: “ChatGPT is designed to support, not replace, medical care. We work closely with clinicians around the world to improve our models and run ongoing evaluations to reduce harmful or misleading responses. Our latest GPT 5.2 model is our strongest yet at considering important user context such as gender. We take the accuracy of model outputs seriously and while ChatGPT can provide helpful information, users should always rely on qualified clinicians for care and treatment decisions.” The other companies whose AIs were tested did not respond to New Scientist’s request for comment.
