The first trial of a therapy bot built with generative AI suggests it was as effective for people with anxiety or depression as working with a human therapist. It does not, however, give a green light to the many companies hyping these technologies in an unregulated gray area.
The bot, called Therabot, was developed by researchers at Dartmouth College's Geisel School of Medicine, and the results of the study were published on March 27 in the New England Journal of Medicine AI. Many companies are building AI tools for therapy, promising that people can talk with a bot more often and more cheaply than with a trained therapist, and that the approach is both safe and effective.
Psychologists and psychiatrists share this vision, noting that fewer than half of people with a mental disorder receive therapy, and those who do may get only 45 minutes per week. Researchers have long wanted to build technology that would let more people access therapy, but two things have held them back.
One is that a therapy bot that says the wrong thing can cause real harm. That is why many researchers have built bots with explicit programming: the software draws on a finite bank of pre-approved responses. But this makes bots less engaging to chat with, and people lose interest. The second issue is that the hallmarks of good therapeutic relationships, shared goals and collaboration, are hard to replicate in software.
In 2019, as early large language models like OpenAI's GPT were beginning to take shape, researchers at Dartmouth thought generative AI might help them overcome these obstacles. They set out to build an AI model trained to give evidence-based responses. They first tried building it from general mental-health conversations pulled from internet forums, then turned to thousands of hours of transcripts of real sessions with psychotherapists.
In an interview, Michael Heinz said, "We got a lot of 'hmm-hmms' and 'go ons,' and then, 'Your problems stem from your relationship with your mother.'" These responses read more like tropes of psychotherapy than what a patient would actually want.
Dissatisfied, the researchers set about assembling their own data sets based on evidence-based practices, which is what ultimately went into the final model. Many AI therapy bots on the market, in contrast, may be little more than slight variations of foundation models like Meta's Llama, trained mostly on internet conversations. That is a serious problem for certain topics, such as disordered eating.
"If you were to say that you want to lose weight, they will readily support you in doing that, even if you often have a low weight to start with," Heinz says. A human therapist wouldn't do that.
To test the bot, the researchers ran an eight-week clinical trial with 210 participants who had symptoms of depression or generalized anxiety disorder, or who were at high risk for eating disorders. About half had access to Therabot; a control group did not. Participants responded to prompts from the AI and initiated conversations themselves, averaging about 10 messages per day.
Participants with depression saw the best results, a 51% reduction in their symptoms. Those with anxiety saw their symptoms drop by 31%, and those at high risk for eating disorders experienced a 19% reduction in concerns about body image and weight. These measurements are based on self-reporting through surveys, a method that is imperfect but remains one of the best tools researchers have.
These results, Heinz says, are comparable to what is found in trials of psychotherapy involving 16 hours of human-delivered treatment, but the Therabot trial accomplished it in about half the time. "I've been working in digital therapeutics for a long time, and I've never seen levels of engagement that are prolonged and sustained at this level," he says.
Jean-Christophe Belisle-Pipon, an assistant professor at Simon Fraser University who has written about AI therapy bots but was not involved in the research, says the results are impressive. He notes, though, that as with any clinical trial, they don't necessarily reflect how the treatment would work in the real world.
"We remain far from a 'greenlight' for widespread clinical deployment," he wrote in an email.
One issue is the supervision that wider deployment might require. At the beginning of the trial, Heinz says, he personally monitored all the messages coming in from participants (who consented to the arrangement) to watch for problematic responses. If therapy bots required this kind of human oversight, they wouldn't be able to reach as many people.
Heinz was asked whether the findings validate the growing industry of AI therapy sites.
"Quite the contrary," he says, cautioning that many don't appear to train their models on evidence-based practices like cognitive behavioral therapy, and they likely don't employ trained researchers to monitor interactions. "I have a lot of concerns about how fast the industry is moving without really evaluating this," he adds.
When AI sites advertise themselves as offering therapy in a legitimate clinical context, Heinz says, they fall under the regulatory purview of the Food and Drug Administration. So far the agency has not gone after many of these sites. If it did, Heinz suspects, few if any would be able to get a "claim clearance," meaning a ruling backing up their claims about the benefits they provide.
Belisle-Pipon argues that unless these digital therapies are approved and integrated into health-care and insurance systems, their reach will be severely limited. People who could benefit most from them may instead seek emotional bonds and therapy from AI that is not designed for those purposes (indeed, new research from OpenAI suggests that interactions with its AI models can affect well-being).
"It is highly likely that people will continue to rely on more affordable chatbots, such as ChatGPT and Character.AI, for everyday tasks, from managing their mental health to generating recipe ideas," he wrote.