Imbalanced ensemble learning in determining Parkinson's disease using Keystroke dynamics

Expert Systems with Applications (2023)

Abstract
Purpose: The main objective of this study is to propose a Keystroke dynamics (KD) based Parkinson's disease (PD) indicator that uses both fixed-text and free-text typing habits on conventional keyboards, together with homogeneous ensemble learning over several fully balanced bootstrapped training sets, for more accurate and robust eHealth applications. Furthermore, this study addresses six key hypotheses, related to this objective, about risks in the screening process that were previously unknown but are important for further improvement.

Methods: For a predetermined window length, a series of key hold times and the time gaps between two consecutive presses and releases were extracted as a feature set from the Colin Bannard and neuroQWERTY MIT-CSXPD datasets. A wide range of statistical tools was then used to turn these continuously generated patterns into a machine-learning-ready feature arrangement. Next, a homogeneous ensemble learning approach was developed using bootstrapping and under-sampling while retaining any rare samples. The model was validated using fifteen fixed-text and free-text inputs obtained from early-stage PD patients, De-novo PD patients, and healthy controls.

Results: In leave-one-user-out cross-validation (LOUOCV), the maximum observed area under the curve (AUC) was 85.99% ± 0.41 for fixed-text input. For free-text inputs, the AUC was 78.3% ± 0.86, with a sensitivity/specificity of 74.46%/82.13%. The AUC for detecting De-novo patients was 79.83% ± 1.26, somewhat lower than the 83.81% ± 0.83 obtained for determining the early stage of the disease.

Conclusion: The proposed model is more robust, usable (data are acquired covertly), and fast, and it integrates easily into conventional desktops and laptops. This makes it suitable for real-life eHealth, where it could support better diagnosis, early detection in a home environment, future reference, and treatment or therapy management.
However, the number of subjects, the disease severity levels, typing duration, feature composition, typing errors, and the choice of machine learning (ML) method all influence the performance of this model. Careful attention is therefore needed when designing PD indicators with the proposed approach.
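The Methods paragraph outlines a pipeline (windowed keystroke timings, statistical summaries, a balanced bootstrapped ensemble, leave-one-user-out AUC) without implementation detail. The Python sketch below illustrates those steps under explicit assumptions and is not the authors' code. The `(key, press_time, release_time)` record layout, every function name, the reading of "time gaps between two consecutive presses and releases" as press-to-press and release-to-release latencies, the mean/standard deviation/median summaries, the 15-member ensemble size, and the nearest-centroid base learner are all hypothetical choices made for illustration. Each member is trained on a bootstrap sample that draws equally many windows from each class, which under-samples the majority class. The retention of rare samples is not modelled, and features are not scaled.

```python
import math
import random
import statistics
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (key, press_time_ms, release_time_ms), sorted by press time.
Keystroke = Tuple[str, float, float]


def timing_rows(events: Sequence[Keystroke]) -> List[List[float]]:
    """One row per consecutive keystroke pair: [hold, press-to-press, release-to-release].
    Assumption: this is how the 'gaps between consecutive presses and releases' are read."""
    rows = []
    for prev, cur in zip(events, events[1:]):
        hold = cur[2] - cur[1]          # how long the current key was held down
        press_gap = cur[1] - prev[1]    # gap between two consecutive presses
        release_gap = cur[2] - prev[2]  # gap between two consecutive releases
        rows.append([hold, press_gap, release_gap])
    return rows


def windowed_features(events: Sequence[Keystroke], window: int) -> List[List[float]]:
    """Split the timing series into non-overlapping windows and summarise each
    column with mean, population standard deviation and median (illustrative choice)."""
    rows = timing_rows(events)
    vectors = []
    for start in range(0, len(rows) - window + 1, window):
        vec: List[float] = []
        for column in zip(*rows[start:start + window]):
            vec += [statistics.mean(column), statistics.pstdev(column),
                    statistics.median(column)]
        vectors.append(vec)
    return vectors


def centroid(rows: List[List[float]]) -> List[float]:
    return [statistics.mean(column) for column in zip(*rows)]


def fit_balanced_ensemble(X, y, n_models=15, seed=0):
    """Homogeneous ensemble: every member is a nearest-centroid classifier (a stand-in
    base learner) fitted on a bootstrap sample with k windows per class, where k is
    the minority-class size, so the majority class is under-sampled."""
    rng = random.Random(seed)
    pos = [x for x, label in zip(X, y) if label == 1]  # PD windows
    neg = [x for x, label in zip(X, y) if label == 0]  # control windows
    k = min(len(pos), len(neg))
    members = []
    for _ in range(n_models):
        boot_pos = [rng.choice(pos) for _ in range(k)]  # sampling with replacement
        boot_neg = [rng.choice(neg) for _ in range(k)]
        members.append((centroid(boot_pos), centroid(boot_neg)))
    return members


def score(members, x) -> float:
    """Average margin over members; positive means x lies closer to the PD centroids."""
    return statistics.mean(math.dist(x, c_neg) - math.dist(x, c_pos)
                           for c_pos, c_neg in members)


def auc(scores: List[float], labels: List[int]) -> float:
    """Rank-based AUC: probability that a PD score exceeds a control score (ties count 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_user_out_auc(users: Dict[str, Tuple[List[List[float]], int]]) -> float:
    """users maps a user id to (window vectors, label). Each user is held out once,
    scored by an ensemble trained on all other users, and contributes one mean score."""
    scores, labels = [], []
    for held_out, (vectors, label) in users.items():
        X, y = [], []
        for uid, (other_vectors, other_label) in users.items():
            if uid != held_out:
                X += other_vectors
                y += [other_label] * len(other_vectors)
        members = fit_balanced_ensemble(X, y)
        scores.append(statistics.mean(score(members, v) for v in vectors))
        labels.append(label)
    return auc(scores, labels)
```

To use it, one would build the `users` dictionary by calling `windowed_features` on each participant's keystroke log and then call `leave_one_user_out_auc`. The nearest-centroid member keeps the sketch dependency-free so that the class balancing and the per-user hold-out logic stay visible; a stronger base learner and feature standardisation would be natural substitutions.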
Keywords
Parkinson's disease, Keystroke dynamics, Ensemble learning, Bootstrapping, eHealth