Large Language Models Perform on Par with Experts Identifying Mental Health Factors in Adolescent Online Forums
arXiv (2024)
Abstract
Mental health in children and adolescents has been steadily deteriorating
over the past few years. The recent advent of Large Language Models (LLMs)
offers much hope for cost and time efficient scaling of monitoring and
intervention. Yet despite the prevalence of issues such as school bullying
and eating disorders, previous studies have not investigated performance in
this domain or for open information extraction where the set of answers is not
predetermined. We create a new dataset of Reddit posts from adolescents aged
12-19 annotated by expert psychiatrists for the following categories: TRAUMA,
PRECARITY, CONDITION, SYMPTOMS, SUICIDALITY and TREATMENT, and compare expert
labels to annotations from two top performing LLMs (GPT3.5 and GPT4). In
addition, we create two synthetic datasets to assess whether LLMs perform
better when annotating data as they generate it. We find GPT4 to be on par with
human inter-annotator agreement, and performance on synthetic data to be
substantially higher. However, the model still occasionally errs on
issues of negation and factuality, and the higher performance on synthetic data is
driven by the greater complexity of real data rather than an inherent advantage.