Integration of Diverse Transcriptomics Datasets using Random Forest to Predict Universal Functional Pathways in Tfr Cells

bioRxiv (2021)

Abstract
Motivation: T follicular regulatory (Tfr) cells are a specialized cell subset that controls humoral immunity. Despite a number of individual transcriptomic studies on these cells, core functional pathways have been difficult to uncover, both because Tfr cells overlap substantially in their transcription with other effector cell types and because their transcription changes across disease settings. A core transcriptional module for Tfr cells that integrates multiple cell type comparisons and diverse disease settings will allow more accurate prediction of functional pathways. Researchers studying allergic reactions, immune responses to vaccines, autoimmunity and cancer could use this gene set to better understand the roles of Tfr cells in controlling disease progression. Other cell types with similar transcriptomic complexity across diverse disease settings may also be studied using similar approaches. High-throughput sequencing technologies generate large datasets that require specific tools for interpretation. A core transcriptional module for Tfr cells will allow investigators with little knowledge of Tfr biology to determine whether Tfr cells may have functional roles within their biological systems. With this work, we address the need for core gene modules that define specific subsets of immune cells.

Results: We introduce an integrated “core Tfr cell gene module” that can be incorporated into GSEA using various input sizes. The integrated core Tfr gene module was built using transcriptomic studies of Tfr cells from several different tissues, disease settings, and cell type comparisons. Random forest was used to integrate the transcriptomic studies and generate the core gene module. A GSEA gene set was formulated from the integrated core Tfr gene module for incorporation into end-user-friendly GSEA. The gene sets are presented as GMT files, along with random genes taken from the GTEx data set. The user can upload the gene set to the GSEA website or to any gene set tool that accepts GMT files. We also present the full results of the model, including p-values calculated by random forest. This gives users more flexibility in choosing the p-value cutoff that is most appropriate for their experimental setting.

Availability: The core Tfr gene sets are freely available at: . We have also included all of the code and data used in developing these gene sets. The code and results are released under an MIT license.

Supplementary information: Supplementary data are available at Bioinformatics online.

### Competing Interest Statement
The authors have declared no competing interest.
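As a rough illustration of how a user-chosen p-value cutoff could turn a scored gene list into a GMT record, the following minimal sketch filters genes by p-value and emits one tab-separated GMT line (set name, description, then gene identifiers). The function name, set name, gene identifiers, and p-values are all hypothetical placeholders, not values or code from this work.

```python
def build_gmt_line(set_name, description, scored_genes, p_cutoff):
    """Return one GMT record: set name, description, then the selected genes.

    scored_genes is an iterable of (gene_id, p_value) pairs; genes with
    p_value <= p_cutoff are kept, in their original order.
    """
    selected = [gene for gene, p_value in scored_genes if p_value <= p_cutoff]
    return "\t".join([set_name, description, *selected])


# Placeholder identifiers and p-values for illustration only
# (assumed values, not results from the paper).
scored_genes = [("GENE_A", 0.001), ("GENE_B", 0.03), ("GENE_C", 0.20)]

line = build_gmt_line(
    "CORE_TFR_MODULE_EXAMPLE",       # hypothetical gene set name
    "illustrative placeholder set",  # free-text description field
    scored_genes,
    p_cutoff=0.05,
)
print(line)
```

A stricter or looser cutoff simply changes how many genes end up in the record, which is the flexibility the full model results are meant to provide.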