Natural language processing is an emerging, complex form of analysis. But a study from Indonesia showed computers can make sense of doctors’ chats.
An Indonesian analysis of doctors’ chat records in online health forums has revealed that some kinds of artificial intelligence tools are better than others at deriving meaning. The findings could help researchers around the world process complex text and draw conclusions.
Telemedicine is one of the fastest-growing health services today, following a surge precipitated by the COVID-19 pandemic. It has proven invaluable in diverting patients from long waits in the emergency room and has changed the practices of thousands of healthcare providers. One type of telemedicine is online health consulting, a text-based chat appointment between patient and doctor. Such chats take place on social media, health websites or mobile applications. Patients send images, questions or both, and doctors assess the severity of the disease and make an early diagnosis.
However, it is unknown whether doctors are providing appropriate precautionary advice, especially for high-risk diseases. The researchers used artificial intelligence and natural language processing to explore doctors’ messages from 2014 to 2021. Natural language processing refers to the use of AI to help computers understand text and spoken language at a level of complexity comparable to humans.
Topic modelling is an approach where computers scan a body of text, find word and phrase patterns within it, and automatically create clusters of related expressions. Topic modelling algorithms can be applied to many different kinds of text, and hold promise for summarising and understanding the ever-evolving archive of digital information.
The study in Indonesia used one of the most common topic modelling techniques, Latent Dirichlet Allocation (LDA). LDA assigns topics based on the probability that words or sections of text are related, judged by how often they occur together. For example, the computer will not know that dog, pooch and puppy are related words in English, but given the frequency of their occurrence in similar settings, it will assume they are somehow related and assign them to the same topic. For comparison, the same test was run using a hybrid LDA, which combines the LDA method with an inference engine. The inference engine helps solve the LDA algorithm’s accuracy problems by estimating whether words are related to two topics.
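The idea behind LDA can be sketched in a few lines of Python. This is a minimal toy illustration, not the code used in the study: it fits plain LDA with collapsed Gibbs sampling, a common textbook method that the article does not confirm the researchers used. The function name `fit_lda`, the parameter values and the four short example documents are all assumptions made for illustration, and the hybrid method's inference engine is not included.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Toy LDA fitted by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    # Count tables: topics per document, words per topic, total words per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start by giving every word occurrence a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        row = []
        for word in doc:
            k = rng.randrange(num_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(row)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Take the current occurrence out of the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # A topic is more likely if this document already uses it
                # and if it already contains this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return topic_word, doc_topic


# Invented mini-corpus for illustration; not data from the study.
docs = [
    "dog pooch puppy walk dog".split(),
    "puppy dog pooch bark".split(),
    "kidney urine stones kidney pain".split(),
    "urine kidney pain stones".split(),
]
topic_word, doc_topic = fit_lda(docs)
for k, counts in enumerate(topic_word):
    top = [word for word, n in counts.most_common(4) if n > 0]
    print(f"topic {k}: {top}")
```

The sampler never sees what the words mean. It only tracks which words share documents, which is why words such as dog, pooch and puppy can end up in the same topic. Results on a corpus this small depend on the random seed, so the printed topics should be read as an illustration rather than a benchmark.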
The dataset of 18,737 entries was sourced from three online health consultation sites: www.sehatq.com, www.klikdokter.com, and www.alodokter.com. All data was cleaned up using methods from previous work on Indonesian text.
The results showed that topic modelling using hybrid LDA performed better than LDA alone. The number of times a word was assigned to a topic also provided insight. For example, “drink” might be mentioned in a doctor’s answer to a question about kidney disease. However, topic modelling with hybrid LDA might assign “drink” to diarrheal disease because it occurs there more frequently. Analysis can also be tripped up by the same words being used in both precautionary advice and symptom descriptions.
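The “drink” example can be made concrete with a short sketch. It assigns a word to the disease topic where it occurs most often and flags any other topic where the word is also common. The counts, the function name `assign_word` and the `shared_ratio` threshold are hypothetical and chosen only to illustrate the point; the article does not describe the study's inference engine in enough detail to reproduce it.

```python
from collections import Counter

# Invented counts for illustration only; they are not figures from the study.
word_counts_by_disease = {
    "kidney disease": Counter({"kidney": 40, "urine": 25, "drink": 6}),
    "diarrheal disease": Counter({"diarrhea": 35, "fluid": 20, "drink": 18}),
}


def assign_word(word, counts_by_topic, shared_ratio=0.3):
    """Return (most frequent topic, other topics where the word is also common)."""
    counts = {topic: c[word] for topic, c in counts_by_topic.items()}
    best_topic = max(counts, key=counts.get)
    if counts[best_topic] == 0:
        return None, []  # the word never appears in any topic
    # Hypothetical rule: a word is "also common" in another topic if it occurs
    # there at least shared_ratio times as often as in its best topic.
    also_common = [
        topic
        for topic, n in counts.items()
        if topic != best_topic and n >= shared_ratio * counts[best_topic]
    ]
    return best_topic, also_common


print(assign_word("drink", word_counts_by_disease))
```

With these made-up counts, “drink” lands in diarrheal disease because it appears there more often, while the second value records that it is also reasonably common in kidney disease. A word like this, shared between two topics, is the kind of case where a frequency-only assignment can mislead.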
The hybrid LDA grouped the text into four domains: symptoms/diagnosis, treatment, precautionary measures and general text, which contained words unrelated to disease. Medical experts then checked the accuracy of the AI. For example, for kidney disease, symptoms/diagnosis words included: kidney, stones, large, channel, urinary, failure, function, infection, urine, pain, waist, urinate, liquid, sick, chronic, disorder, cysts, and veins. Treatment words included: discard, bladder, water, USG (ultrasound), measure, and wash. Precautionary words included: measure, condition, channel, and urinary.
The work showed that hybrid LDA is the better tool for this kind of analysis and that it reached sensible conclusions about how to group doctors’ terms. The analysis also managed to find precautionary terms that were relevant to the disease among the doctors’ responses. The results may inform other kinds of natural language processing tests elsewhere in the world.
Safitri Juanita is a doctoral candidate at the Department of Electrical Engineering, Faculty of Intelligent Electrical and Informatics Technology, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia and a senior lecturer at the Faculty of Information Technology, Universitas Budi Luhur, Jakarta, Indonesia.
Mauridhi Hery Purnomo is a senior researcher at the Department of Computer Engineering, Faculty of Intelligent Electrical and Informatics Technology, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia. He is the Chair of the Laboratory of Multimedia Computing and Machine Intelligence.
Diana Purwitasari is an associate professor in the Department of Informatics, Faculty of Intelligent Electrical and Informatics Technology, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia.