Generalizable and Automated Classification of TNM Stage from Pathology Reports with External Validation.

Jenna Kefeli, Nicholas Tatonetti

medRxiv: the preprint server for health sciences (2023)

Abstract
Cancer staging is an essential clinical attribute informing patient prognosis and clinical trial eligibility. However, it is not routinely recorded in structured electronic health records. Here, we present a generalizable method for the automated classification of TNM stage directly from pathology report text. We train a BERT-based model using publicly available pathology reports across approximately 7,000 patients and 23 cancer types. We explore the use of different model types, with differing input sizes, parameters, and model architectures. Our final model goes beyond term extraction, inferring TNM stage from context when it is not explicitly included in the report text. As external validation, we test our model on almost 8,000 pathology reports from Columbia University Medical Center, where it achieves an AU-ROC of 0.815-0.942. This suggests that our model can be applied broadly to other institutions without additional institution-specific fine-tuning.
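The abstract reports performance as AU-ROC. As a minimal sketch of what that metric measures (not the authors' evaluation code), the snippet below computes AU-ROC for a binary task using the pairwise-ranking identity: the fraction of positive/negative pairs in which the positive example gets the higher score. The function name `auroc` and the toy labels and scores are assumptions made up for illustration; how the paper aggregates the metric across T, N, and M targets is not stated here and is not modeled.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking (Mann-Whitney U) identity.

    labels: sequence of 0/1 ground-truth classes.
    scores: sequence of model scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AU-ROC needs at least one positive and one negative example")

    # Count positive/negative pairs that are ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not taken from the paper).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly -> 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the reported 0.815-0.942 range. The quadratic pair loop is fine for a sketch; for thousands of reports a sort-based implementation or a library routine would be the more practical choice.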
Keywords
TNM stage, pathology reports, automated classification