Assessment of the completeness, accuracy, and scalability of commercial abstraction for a large prostate cancer biospecimen repository

Cancer Research (2023)

Abstract

Background: Large repositories of biospecimens collected from patients at cancer centers can be a valuable resource for developing and validating new biomarker assays to guide therapeutic decision-making. To use such repositories for biomarker studies, biospecimens must be annotated with the clinical context of each sample. The main source of clinical data is typically an unstructured electronic medical record, which can require significant time and resources to curate manually.

Methods: We developed a database of disease-specific clinical data elements for a large repository of prostate cancer blood samples collected between 2006 and 2022 at a comprehensive cancer center. To provide clinical context for these samples, we contracted a data abstraction company and trained its staff on entry practices, requiring strict adherence to our standardized data dictionary and source hierarchy. After data abstraction and review of an initial training set, we formally evaluated data quality (completeness and accuracy) through a blinded 100-patient comparison with gold-standard abstraction by a medical oncologist using all available data sources. Subsequently, data entry was completed for an additional 2500 patients, and these abstractions were included in a longitudinal analysis.

Results: Comparison with the medical oncologist reference showed that the commercial annotations had similar completeness for most data elements. For some elements, such as stage at diagnosis (M1 vs. M0), commercial abstraction achieved lower completeness (80%) than the medical oncologist (100%). Overall, the accuracy of the commercial annotations varied by element but was suitable for identifying samples for use in context-specific biomarker studies. Data on disease-related events showed a small median difference in the timing of first metastasis (0 months) and castration resistance (-2.1 months), with substantial observed variability that trended toward earlier event calling. Longitudinal analysis of 2500 abstractions showed relatively stable completeness of staging data over time, suggesting that missing data are at least partially attributable to restrictions imposed by the data source hierarchy rather than to abstractor inexperience. Targeted retraining midway, after 1300 annotations, considerably increased the speed of data entry without noticeable changes in data completeness.

Conclusions: Commercial data abstraction can be used effectively to perform clinical data annotation for large biospecimen repositories with acceptable levels of completeness and accuracy. With appropriate training and direct oversight by an experienced on-site research team, this represents a scalable method for extracting valuable clinical data from largely unstructured patient medical records.

Citation Format: Emily A. Carbone, Ethan S. Barnett, Niamh M. Keegan, Samantha E. Vasselman, Barbara Nweji, Ria N. Gajar, Karen A. Autio, Wassim Abida, Howard I. Scher, Konrad H. Stopsack. Assessment of the completeness, accuracy, and scalability of commercial abstraction for a large prostate cancer biospecimen repository [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 941.
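
The abstract does not define how completeness, accuracy, or event-timing differences were computed. As an illustration only, the sketch below shows one plausible way to compute such metrics when comparing commercial abstraction with a reference abstraction. All names (`completeness`, `accuracy`, `median_timing_difference`, `m_stage`, `months_to_crpc`), the five toy patients, and the sign convention (commercial minus reference, so negative values mean an earlier event call) are assumptions made for this example, not the authors' definitions or data.

```python
from statistics import median

# Hypothetical toy records keyed by patient ID; None marks a missing value.
reference = {
    "P1": {"m_stage": "M0", "months_to_crpc": 12.0},
    "P2": {"m_stage": "M1", "months_to_crpc": 30.0},
    "P3": {"m_stage": "M0", "months_to_crpc": 8.0},
    "P4": {"m_stage": "M1", "months_to_crpc": None},
    "P5": {"m_stage": "M0", "months_to_crpc": 20.0},
}
commercial = {
    "P1": {"m_stage": "M0", "months_to_crpc": 10.0},
    "P2": {"m_stage": "M1", "months_to_crpc": 30.0},
    "P3": {"m_stage": None, "months_to_crpc": 5.0},
    "P4": {"m_stage": "M1", "months_to_crpc": None},
    "P5": {"m_stage": "M1", "months_to_crpc": 20.0},
}


def paired_values(abstracted, ref, element):
    """Return (abstracted, reference) value pairs where both are non-missing."""
    pairs = []
    for pid, ref_record in ref.items():
        a = abstracted.get(pid, {}).get(element)
        b = ref_record.get(element)
        if a is not None and b is not None:
            pairs.append((a, b))
    return pairs


def completeness(records, element):
    """Fraction of patients with a non-missing value for the element."""
    values = [record.get(element) for record in records.values()]
    return sum(v is not None for v in values) / len(values)


def accuracy(abstracted, ref, element):
    """Fraction of agreeing values among patients where both sources have one."""
    pairs = paired_values(abstracted, ref, element)
    if not pairs:
        return None
    return sum(a == b for a, b in pairs) / len(pairs)


def median_timing_difference(abstracted, ref, element):
    """Median of (abstracted - reference) in months; negative = earlier call."""
    pairs = paired_values(abstracted, ref, element)
    if not pairs:
        return None
    return median(a - b for a, b in pairs)


print("completeness (commercial):", completeness(commercial, "m_stage"))
print("completeness (reference):", completeness(reference, "m_stage"))
print("accuracy:", accuracy(commercial, reference, "m_stage"))
print("median timing difference:",
      median_timing_difference(commercial, reference, "months_to_crpc"))
```

In this sketch, accuracy is computed only over patients for whom both sources recorded a value, so missing data affect completeness but not accuracy. Other conventions, such as counting a missing value as a disagreement, are equally defensible and would give different numbers.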
Keywords
large prostate cancer biospecimen,prostate cancer,commercial abstraction