Systems Biology in ELIXIR: modelling in the spotlight [version 2; peer review: 1 approved, 2 approved with reservations]
- Pretraining has recently driven rapid progress in natural language processing (NLP)
- We show that M6 outperforms the baselines in multimodal downstream tasks, and that the large M6 with 10 billion parameters reaches even better performance
- We propose M6, a method that processes information from multiple modalities and performs both single-modal and cross-modal understanding and generation (see the sketch after this list)
- The model is scaled up to 10 billion parameters with sophisticated deployment, and the 10-billion-parameter M6-large is the largest pretrained model in Chinese
- Experimental results show that our proposed M6 outperforms the baselines in a number of downstream tasks involving both single and multiple modalities. We will continue pretraining extremely large models on more data to explore the limits of their performance
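The bullets above describe the model only at a high level. As a minimal sketch of the general pattern, assuming a PyTorch-style unified transformer in which projected image-patch features and text-token embeddings share a single encoder, the toy model below scores text tokens conditioned on both modalities. All class names, dimensions, and the masked-token prediction head are illustrative assumptions, not the actual M6 implementation.

```python
import torch
import torch.nn as nn

class MiniMultimodalTransformer(nn.Module):
    """Toy unified transformer: image patches and text tokens share one encoder.

    Illustrative only; hyperparameters and the prediction head are assumptions.
    """
    def __init__(self, vocab_size=1000, patch_dim=768, d_model=256,
                 n_layers=2, n_heads=4):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        # Project precomputed visual patch features into the shared model space.
        self.patch_proj = nn.Linear(patch_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Head that predicts text tokens, e.g. for masked-token objectives.
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, patches, token_ids):
        # patches: (batch, num_patches, patch_dim); token_ids: (batch, seq_len)
        x = torch.cat([self.patch_proj(patches),
                       self.text_embed(token_ids)], dim=1)
        h = self.encoder(x)  # joint attention over both modalities
        # Keep only the text positions, so the output aligns with token_ids.
        return self.lm_head(h[:, patches.size(1):, :])

model = MiniMultimodalTransformer()
logits = model(torch.randn(2, 16, 768), torch.randint(0, 1000, (2, 8)))
print(logits.shape)  # torch.Size([2, 8, 1000])
```

Concatenating both modalities into one sequence is what lets a single encoder handle both single-modal and cross-modal tasks: dropping the patch inputs yields a text-only model, while the joint sequence supports image-conditioned understanding and generation.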
