A study by the BBC has found that a staggering nine out of ten responses from AI chatbots on current events contain errors, with issues ranging from factual inaccuracies to faulty source attribution and bias. The findings raise significant concerns about the reliability of these tools for disseminating news and about their potential impact on public trust.
Generative artificial intelligence has revolutionized the way we access information, but its accuracy remains a challenge. Models like ChatGPT, Gemini, and Copilot draw on vast amounts of data to generate responses based on learned patterns, but they often mix facts with opinions and current information with outdated archive material, and they make attribution mistakes. The BBC’s study analyzed the responses of four platforms – Google Gemini, Microsoft Copilot, OpenAI’s ChatGPT, and Perplexity – and found that more than half presented significant errors.
The study’s objective was to evaluate the accuracy of these tools and to analyze how they cite news sources, including BBC articles. The BBC temporarily lifted the block that normally prevents these chatbots from accessing its content, then reinstated the restriction once the evaluation was complete. The results showed that Gemini produced the most serious errors, with over 60% of its answers containing significant mistakes, followed by Copilot, ChatGPT, and Perplexity, with error rates around 40%.
Precision and Misinformation Errors
One of the most concerning findings was the inaccuracy of many answers. For instance, Gemini claimed that the UK’s National Health Service (NHS) does not recommend vaping as a method to quit smoking, when in fact the NHS does recommend it as an option for people trying to give up tobacco. Another striking example was Copilot’s erroneous account of the case of an abuse victim in France, which gave incorrect details about how the woman discovered the crimes committed against her.
Errors in figures and dates were also detected. Copilot gave incorrect figures for Google Chrome’s market share, misstated the year of death of One Direction singer Liam Payne, and miscounted the number of prisoners released under an early release program in the UK. Perplexity, for its part, reported incorrectly on the death of British presenter Michael Mosley, attributing statements to his wife that she never made.
Attribution Problems and Editorial Bias
The study also revealed that the chatbots often cited incorrect sources or relied on outdated information. Gemini’s responses contained significant source-attribution errors 45% of the time, and in 26% of cases they failed to include any reference at all. In eight separate cases, the chatbots altered phrases taken from BBC articles or attributed them incorrectly; ChatGPT was the only model that did not exhibit this problem.
Furthermore, the study found that the chatbots tend to introduce opinions into their responses without specifying whether those opinions come from a reliable source or are inferences generated by the AI. Copilot and Gemini editorialized in at least 10% of their answers, while Perplexity did so in 7% of cases and ChatGPT in 3%. For example, Copilot stated that UK Prime Minister Keir Starmer had “a comprehensive plan to address the most urgent problems in the UK,” phrasing that could give the impression the conclusion came from the BBC, when in reality it was an unattributed inference.
Regulation and Transparency: A Pending Challenge
The report concludes that companies developing generative AI models must improve their systems to ensure more precise and reliable responses, especially on current events. The BBC urged these companies to collaborate with news organizations to rectify the problems detected and to work on long-term solutions. Pete Archer, the BBC’s Programme Director for Generative AI, emphasized the need for regulations that ensure the veracity of these models’ responses and suggested the creation of an independent research institute to evaluate the accuracy of AI-generated information on news platforms.
Deborah Turness, CEO of BBC News, warned that companies developing these AI tools are “playing with fire”: the tools do not distinguish well between facts and opinions, mix current reporting with old archive material, and tend to generate biased answers. The result, she said, is a confusing cocktail far removed from the verified facts that audiences expect and deserve. The report reinforces the debate over the role of artificial intelligence in journalism and underscores the need to improve its accuracy to prevent the spread of misinformation.