Longitudinal reliability of Twitter sentiment for measuring mental health and well-being in a UK birth cohort

International Journal for Population Data Science(2023)

Abstract
Introduction & Background
Social media data is increasingly recognised as an important source of behavioural data. It can provide insights into patterns of life and into how individuals and groups are feeling. However, many studies of social media's relationship to mental health and well-being have suffered from poorly developed ground truth: they rely on assumed labels and on data from single timepoints, so the accuracy of their models at future timepoints cannot be assessed. Collecting Twitter data from cohorts addresses this issue, because cohorts hold many years of high-quality data that can be used as ground truth. Cohorts can in turn benefit from the higher-resolution data provided by social media, which can supplement their traditional data collection methods.

Objectives & Approach
We used Twitter data collected with consent from two generations of the Avon Longitudinal Study of Parents and Children (ALSPAC) (N=656). The data is linked to two surveys, completed in April-May 2020 and May-July 2020, that provide validated outcome measures of anxiety, depression, and general well-being. Using the LIWC and VADER sentiment algorithms, we identified the sentiment categories most highly associated with each outcome and used them to develop a multiple regression model for each of anxiety, depression, and general well-being at the first survey timepoint. The error of these models in predicting the second timepoint allowed us to assess how well each outcome is predicted across demographic groups.

Relevance to Digital Footprints
Digital footprint data can complement traditional data sources to provide a more nuanced view of health inequalities. These data are typically more timely to collect than data from traditional collection methods (census, survey), allowing a more reactive response to emergent issues such as the cost-of-living crisis.

Results
This study illustrates how the collection of digital footprint data can be integrated into existing long-term studies, which can then provide multiple points of ground-truth data.

Conclusions & Implications
This study has shown that the collection and integration of Twitter data into cohort studies is feasible, and that cohort data provides multiple ground-truth options. This time series data is important for assessing the feasibility of mental health inference from online behavioural data, which this study shows may vary across personal characteristics. In future research we plan to link subsequent surveys from ALSPAC to provide more ground-truth timepoints and to explore the temporal stability of predictions and the impact of model drift on performance.
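
The evaluation design described under Objectives & Approach (fit a multiple regression at the first survey timepoint, then examine prediction error at the second timepoint by demographic group) can be illustrated with a minimal sketch. This is not the authors' code. The feature values, outcome scores, group labels "A" and "B", the helper names, and the choice of mean absolute error as the error measure are all assumptions made for illustration, and the sentiment features are assumed to have been computed already (for example by VADER or LIWC).

```python
from collections import defaultdict

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]          # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for i in range(k):
            if i != c:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] for i in range(k)]

def predict(beta, X):
    """Apply fitted coefficients to new feature rows."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]

def mae_by_group(y_true, y_pred, groups):
    """Mean absolute prediction error within each group."""
    errs = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        errs[g].append(abs(t - p))
    return {g: sum(v) / len(v) for g, v in errs.items()}

# Hypothetical toy data: two sentiment features per participant.
def toy_outcome(row):
    return 2.0 + 3.0 * row[0] - 1.0 * row[1]

# Timepoint 1: features and outcome scores used to fit the model.
X_t1 = [[0.1, 0.5], [0.4, 0.2], [0.3, 0.9], [0.8, 0.1], [0.6, 0.7], [0.2, 0.3]]
y_t1 = [toy_outcome(r) for r in X_t1]

# Timepoint 2: in this toy setup, group "B" scores shift by 1.0, so the
# timepoint-1 model fits group "A" well and group "B" poorly.
X_t2 = [[0.5, 0.5], [0.7, 0.2], [0.2, 0.8], [0.9, 0.4]]
groups_t2 = ["A", "A", "B", "B"]
shift = {"A": 0.0, "B": 1.0}
y_t2 = [toy_outcome(r) + shift[g] for r, g in zip(X_t2, groups_t2)]

beta = fit_ols(X_t1, y_t1)
group_mae = mae_by_group(y_t2, predict(beta, X_t2), groups_t2)
print(beta)
print(group_mae)
```

In this toy setup the per-group error separates the group whose outcomes changed between surveys from the group whose outcomes did not, which is the kind of comparison the second survey timepoint makes possible.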
Keywords
twitter sentiment,mental health,uk birth cohort,longitudinal reliability,well-being