Google’s AI-powered medical chatbot, Med-PaLM, has achieved a passing grade on the challenging US Medical Licensing Examination (USMLE), according to a peer-reviewed study published on July 12. However, the study found that the chatbot’s responses still lack the accuracy of human doctors.
Last year, the release of ChatGPT, developed by OpenAI (backed by Google’s rival Microsoft), sparked a race among tech giants in the field of AI. While AI has generated both excitement and concern about its future possibilities, the healthcare sector has already seen tangible progress, with AI algorithms now able to interpret certain medical scans.
In December, Google introduced Med-PaLM, an AI tool designed to answer medical queries. Unlike ChatGPT, Med-PaLM has not been made available to the public. Google claims that Med-PaLM is the first large language model, a type of AI model trained on vast amounts of human-produced text, to pass the USMLE.
The passing threshold for the USMLE, taken by medical students and physicians-in-training in the US, is approximately 60 percent. In February, a separate study reported that ChatGPT achieved passing or near-passing scores on the exam. In the peer-reviewed study published in the journal Nature, Google researchers reported that Med-PaLM scored 67.6% on USMLE-style multiple-choice questions.
While the study described Med-PaLM’s performance as encouraging, it acknowledged that the chatbot still falls short of human clinicians. To address the issue of “hallucinations,” the term for false information generated by AI models, Google developed a new evaluation benchmark. Karan Singhal, a Google researcher and lead author of the study, said the team has tested a newer version of the model against this benchmark, with “super exciting” results.
In a preprint study released in May (not yet peer-reviewed), Google claimed that Med-PaLM 2 achieved a score of 86.5% on the USMLE, surpassing the previous version’s result by nearly 19 percentage points.
“There is an elephant in the room” when it comes to AI-powered medical chatbots, noted James Davenport, a computer scientist at the University of Bath who was not involved in the research. Davenport emphasized the distinction between answering medical questions and the full practice of medicine, which includes diagnosing and treating actual health conditions. Anthony Cohn, an AI expert at the University of Leeds, suggested that hallucinations may always be a challenge for large language models because of their statistical nature. These models should therefore be viewed as assistants rather than final decision-makers, Cohn said.
Singhal highlighted the potential of Med-PaLM to support doctors by suggesting alternatives that may not have been considered otherwise. The Wall Street Journal reported that Med-PaLM 2 has been undergoing testing at the prestigious Mayo Clinic research hospital since April. However, Singhal declined to comment on specific partnerships, clarifying that any testing would be limited to administrative tasks without direct patient impact. – AFP Relaxnews