Effects of Different Prompts on the Quality of GPT-4 Responses to Dementia Care Questions
arXiv (2024)
Abstract
Evidence suggests that different prompts lead large language models (LLMs) to
generate responses with varying quality. Yet, little is known about prompts'
effects on response quality in healthcare domains. In this exploratory study,
we address this gap, focusing on a specific healthcare domain: dementia
caregiving. We first developed an innovative prompt template with three
components: (1) system prompts (SPs) featuring 4 different roles; (2) an
initialization prompt; and (3) task prompts (TPs) specifying different levels
of detail, totaling 12 prompt combinations. Next, we selected 3 social media
posts containing complicated, real-world questions about dementia caregivers'
challenges in 3 areas: memory loss and confusion, aggression, and driving. We
then entered these posts into GPT-4, with our 12 prompts, to generate 12
responses per post, totaling 36 responses. We compared the word count of the 36
responses to explore potential differences in response length. Two experienced
dementia care clinicians on our team assessed the response quality using a
rating scale with 5 quality indicators: factual, interpretation, application,
synthesis, and comprehensiveness (scoring range: 0-5; higher scores indicate
higher quality).
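The study design is a full factorial crossing of system prompts, task prompts, and posts. A minimal sketch of how the 12 prompt combinations and 36 responses arise (the prompt and post strings here are hypothetical placeholders, not the paper's actual wording):

```python
from itertools import product

# Hypothetical placeholders standing in for the paper's actual prompts/posts.
system_prompts = [f"SP role {i}" for i in range(1, 5)]        # 4 system-prompt roles
task_prompts = [f"TP detail level {i}" for i in range(1, 4)]  # 3 levels of detail
posts = [f"post {i}" for i in range(1, 4)]                    # 3 social media posts

# 4 SPs x 3 TPs = 12 prompt combinations
combos = list(product(system_prompts, task_prompts))

# Each of the 3 posts is run with all 12 combinations = 36 responses
runs = [(sp, tp, post) for (sp, tp) in combos for post in posts]

print(len(combos), len(runs))  # 12 36
```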