Predictive Modeling of Acute Graft-versus-Host-Disease Using Machine Learning on Immune Cell and Cytokine Profiles at Engraftment
bioRxiv (2025)
All India Institute of Medical Sciences | Indraprastha Institute of Information Technology | All India Institute of Medical Sciences
Abstract
Background: Acute Graft-versus-Host Disease (aGvHD) is a major immune complication following allogeneic hematopoietic stem cell transplantation (Allo-HSCT), initiated by conditioning regimen-associated tissue damage and driven by a complex interplay of immune cells and cytokines. Our study aims to leverage machine learning (ML) algorithms on the immune and cytokine profiles of Allo-HSCT recipients to develop biomarker-based classification models that predict the onset of aGvHD at the time of engraftment. Materials and Methods: Seventy patients diagnosed with hematological disorders who had undergone their first Allo-HSCT were recruited from the All India Institute of Medical Sciences, New Delhi, India. Peripheral blood (PB) was collected from the patients at the time of engraftment, and immune cell subtypes and cytokine profiles were analyzed using flow cytometry and ELISA, respectively. The individual cell counts were then processed using basic ML models, including a support vector classifier (SVC) with an RBF kernel, a Decision Tree, and a Random Forest, chosen for their mathematical simplicity and, in the case of Decision Trees and Random Forests, their built-in feature importance estimates. Several data configurations were evaluated: combined immune and cytokine counts, immune cell counts only, cytokine counts only, T-cell counts only, NK-cell counts only, dendritic cell counts only, and B-cell counts only. These configurations were selected to investigate how different feature sets affect the prediction of aGvHD before its onset. Results: At engraftment, flow cytometric analysis of reconstituted lymphocytes in patients who developed aGvHD revealed a marked decrease in the CD4+/CD8+ T-cell ratio and in Tregs, with an increase in cytotoxic regulatory NK cells, dendritic cells, and B cells.
The levels of pro-inflammatory cytokines (IFN-γ, IL-1β, IP-10, TNF-α, IL-17A, IL-12p70, MIP-1α, MIP-1β, RANTES) and of Th17 and Th1 cells were elevated, with a consequent decline in the anti-inflammatory cytokines IL-10, IL-2, and IL-4 and in Th2 and Th9 cells. Machine learning models were trained on 48 parameters [all immune cell subsets (n=34) and all cytokines (n=14)]. The correlation heat map showed a stronger correlation of aGvHD with the cytokine profile, with or without immune cells (accuracy: 1.00), than with T cells alone (accuracy: 0.96), NK cells alone (accuracy: 0.93), dendritic cells alone (accuracy: 0.90), or B cells alone (accuracy: 0.86). Conclusion: The current models classify perfectly, indicating the potential of ML algorithms for predicting the onset of aGvHD. However, a study with a larger sample size is required to validate these classification models and to mitigate the risk of overfitting suggested by the consistently high performance. The study also highlights the potential of cytokine profiles as a viable alternative to T-cell counts, as evidenced by the correlation heat map and classifier models. These findings provide valuable insights into dataset requirements and future directions for integrating ML models into aGvHD prediction.

Competing Interest Statement
The authors have declared no competing interest.
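The classifier comparison described in the abstract (SVC with RBF kernel, Decision Tree, and Random Forest, each trained on different feature configurations) can be sketched as follows. The patient data are not public, so a synthetic stand-in cohort is generated here; the cohort size (70) and parameter counts (34 immune cell subsets, 14 cytokines) follow the abstract, but the features themselves are random placeholders, not the study's measurements.

```python
# Hedged sketch of the model/feature-set comparison, using scikit-learn.
# Synthetic data only: real immune and cytokine values are not available.
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_patients = 70                                 # cohort size reported in the study
X_immune = rng.normal(size=(n_patients, 34))    # 34 immune-cell parameters (placeholder)
X_cytokine = rng.normal(size=(n_patients, 14))  # 14 cytokine parameters (placeholder)
y = rng.integers(0, 2, size=n_patients)         # aGvHD onset label (placeholder)

feature_sets = {
    "immune + cytokine": np.hstack([X_immune, X_cytokine]),  # all 48 parameters
    "immune only": X_immune,
    "cytokine only": X_cytokine,
}
models = {
    "SVC (RBF)": SVC(kernel="rbf"),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Cross-validated accuracy for every (feature set, model) pair
for fs_name, X in feature_sets.items():
    for m_name, model in models.items():
        acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
        print(f"{fs_name:18s} {m_name:14s} accuracy={acc:.2f}")
```

On random labels such as these, accuracies hover near chance; the near-perfect accuracies reported in the abstract are a property of the study's cohort, which is why the authors flag the risk of overfitting at this sample size.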
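The correlation heat map mentioned in the Results, relating each measured parameter to aGvHD onset, can be sketched as a per-feature correlation ranking. Again, placeholder random data stand in for the patient measurements, and the column names below are illustrative examples rather than the study's exact parameter labels.

```python
# Hedged sketch of the parameter-vs-outcome correlation analysis.
# Column names are hypothetical; values are random placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "IFN_gamma": rng.normal(size=70),       # example pro-inflammatory cytokine
    "TNF_alpha": rng.normal(size=70),
    "IL_10": rng.normal(size=70),           # example anti-inflammatory cytokine
    "CD4_CD8_ratio": rng.normal(size=70),   # example immune-cell parameter
    "aGvHD": rng.integers(0, 2, size=70),   # outcome label (placeholder)
})

# Pearson correlation of every feature with the aGvHD label,
# sorted by absolute magnitude (the ranking a heat map would visualize)
corr = df.corr()["aGvHD"].drop("aGvHD").sort_values(key=np.abs, ascending=False)
print(corr)
```

In practice the resulting vector (or the full correlation matrix) would be rendered as a heat map, e.g. with seaborn's `heatmap`, to compare how strongly cytokine versus immune-cell parameters track aGvHD onset.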