AI chatbots have increasingly been promoted as the future of healthcare, with some systems performing well on standardized medical exams and offering symptom-based advice to users. However, a new study published in Nature Medicine suggests that these tools are far from being able to replace physicians and may even pose risks when used for personal medical guidance.
The research, led by teams from Oxford University, identified a significant gap in the practical capabilities of large language models (LLMs). While the systems demonstrated strong technical knowledge and performed well on structured medical assessments, they struggled when asked to help users with real-world health concerns. According to the researchers, translating theoretical knowledge into safe and practical medical advice remains a major challenge.
Dr. Rebecca Payne, the lead medical practitioner involved in the study, stated that despite the growing enthusiasm around AI in healthcare, the technology is not ready to take on the responsibilities of a physician. She warned that relying on large language models for symptom analysis can be dangerous, as they may provide incorrect diagnoses or fail to recognize situations that require urgent medical attention.
Large-Scale Testing Reveals Key Weaknesses
The study involved 1,300 participants who used AI models developed by OpenAI, Meta, and Cohere. Participants were presented with medical scenarios created by doctors and then asked the AI systems what steps they should take to address the described conditions.
Researchers found that the AI-generated advice was no more reliable than traditional self-diagnosis methods, such as online searches or personal judgment. In many cases, users received a mix of accurate and misleading guidance, making it difficult to determine appropriate next steps. Communication posed a further challenge: participants often struggled to work out what information the AI needed in order to generate accurate recommendations.
Dr. Payne emphasized that medical diagnosis involves more than recalling facts. She explained that effective care requires listening carefully, asking clarifying questions, probing for relevant symptoms, and guiding patients through a dynamic conversation. Patients frequently do not know which details are medically significant, meaning physicians must actively extract critical information. The study concluded that current LLMs are not yet capable of reliably managing this complex interaction with non-experts.
A Support Role, Not a Clinical One
While the researchers cautioned against using AI chatbots as medical advisors, they did not dismiss the technology entirely. Instead, they suggested that AI can play a supportive role in healthcare settings. Dr. Payne noted that LLMs are particularly useful for summarizing and organizing information. In clinical environments, they are already being used to transcribe consultations and convert them into specialist referral letters, patient information sheets, or medical records.
The team concluded that although AI has potential in healthcare, it is not currently fit to provide direct medical advice. They argue that improved evaluation frameworks and safety standards are necessary before such systems can be responsibly integrated into patient-facing roles. Their goal, they said, is not to reject AI in medicine but to ensure it develops in a way that prioritizes patient safety and clinical accuracy.