    Benjamin Franklin Institute

    All major AI models risk encouraging dangerous science experiments

    By Team_Benjamin Franklin Institute, January 15, 2026


    Image: Scientific laboratories can be dangerous places (PeopleImages/Shutterstock)

    The use of AI models in scientific laboratories risks enabling dangerous experiments that could cause fires or explosions, researchers have warned. Such models offer a convincing illusion of understanding but are susceptible to missing basic and vital safety precautions. In tests of 19 cutting-edge AI models, every single one made potentially deadly mistakes.

    Serious accidents in university labs are rare but certainly not unheard of. In 1997, chemist Karen Wetterhahn was killed by dimethylmercury that seeped through her protective gloves; in 2014, a scientist was partially blinded; and in 2016, an explosion cost one researcher her arm.

    Now, AI models are being pressed into service in a variety of industries and fields, including research laboratories, where they can be used to design experiments and procedures. AI models designed for niche tasks have been used successfully in a number of scientific fields, such as biology, meteorology and mathematics. But large general-purpose models are prone to making things up and to answering questions even when they lack the data needed to form a correct response. This can be a nuisance when someone is researching holiday destinations or recipes, but potentially fatal when they are designing a chemistry experiment.

    To investigate the risks, Xiangliang Zhang at the University of Notre Dame in Indiana and her colleagues created a test called LabSafety Bench that can measure whether an AI model identifies potential hazards and harmful consequences. It includes 765 multiple-choice questions and 404 pictorial laboratory scenarios that may include safety problems.

    The team tested 19 cutting-edge large language models (LLMs) and vision language models on LabSafety Bench and found that none scored more than 70 per cent accuracy overall. On the multiple-choice questions, some AI models, such as Vicuna, scored almost as low as random guessing would, while GPT-4o reached as high as 86.55 per cent accuracy and DeepSeek-R1 as high as 84.49 per cent. When tested with images, some models, such as InstructBlip-7B, scored below 30 per cent accuracy.

    Zhang is optimistic about the future of AI in science, even in so-called self-driving laboratories where robots work alone, but says models are not yet ready to design experiments. “Now? In a lab? I don’t think so. They were very often trained for general-purpose tasks: rewriting an email, polishing some paper or summarising a paper. They do very well for these kinds of tasks. [But] they don’t have the domain knowledge about these [laboratory] hazards.”

    “We welcome research that helps make AI in science safe and reliable, especially in high-stakes laboratory settings,” says an OpenAI spokesperson, pointing out that the researchers did not test its leading model. “GPT-5.2 is our most capable science model to date, with significantly stronger reasoning, planning, and error-detection than the model discussed in this paper to better support researchers. It’s designed to accelerate scientific work while humans and existing safety systems remain responsible for safety-critical decisions.”

    Google, DeepSeek, Meta, Mistral and Anthropic did not respond to a request for comment.

    Allan Tucker at Brunel University of London says AI models can be invaluable when used to assist humans in designing novel experiments, but that there are risks and humans must remain in the loop. “The behaviour of these [LLMs] are certainly not well understood in any typical scientific sense,” he says. “I think that the new class of LLMs that mimic language – and not much else – are clearly being used in inappropriate settings because people trust them too much. There is already evidence that humans start to sit back and switch off, letting AI do the hard work but without proper scrutiny.”

    Craig Merlic at the University of California, Los Angeles, says he has run a simple test in recent years, asking AI models what to do if you spill sulphuric acid on yourself. The correct answer is to rinse with water, but Merlic says he has found AIs always warn against this, incorrectly adopting unrelated advice about not adding water to acid in experiments because of heat build-up. However, he says, in recent months models have begun to give the correct answer.

    Merlic says that instilling good safety practices in universities is vital, because there is a constant stream of new students with little experience. But he’s less pessimistic about the place of AI in designing experiments than other researchers.

    “Is it worse than humans? It’s one thing to criticise all these large language models, but they haven’t tested it against a representative group of humans,” says Merlic. “There are humans that are very careful and there are humans that are not. It’s possible that large language models are going to be better than some percentage of beginning graduates, or even experienced researchers. Another factor is that the large language models are improving every month, so the numbers within this paper are probably going to be completely invalid in another six months.”
