When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection
CoRR (2024)
Abstract
Depression is a critical concern in global mental health, prompting extensive
research into AI-based detection methods. Among various AI technologies, Large
Language Models (LLMs) stand out for their versatility in mental healthcare
applications. However, their primary limitation arises from their exclusive
dependence on textual input, which constrains their overall capabilities.
Furthermore, the utilization of LLMs in identifying and analyzing depressive
states is still relatively untapped. In this paper, we present an efficient
approach to integrating acoustic speech information into the LLM framework for
multimodal depression detection, using acoustic landmarks to bring speech
signals into LLMs. Because acoustic landmarks are tied to the pronunciation of
spoken words, they add critical dimensions to text transcripts and offer
insight into individuals' unique speech patterns, revealing their potential
mental states.
Evaluations of the proposed approach on the DAIC-WOZ dataset show
state-of-the-art results compared with existing audio-text baselines. Beyond
depression detection, this approach also offers a new perspective on enhancing
the ability of LLMs to comprehend and process speech signals.