Applications of machine learning in analyzing higher-order mass spectrometry metabolomics data
Liquid chromatography coupled with mass spectrometry (LC-MS) is a widely used technique for simultaneously analyzing thousands of proteins or metabolites in biological samples. The latest LC-MS instruments acquire higher-dimensional data that include orthogonal properties such as chromatographic retention time (RT), mass-to-charge ratio (m/z), tandem mass spectra (MS2) and collision cross section (CCS) values. This high dimensionality makes data analysis cumbersome. In addition, the measured signals must be assigned to known compounds by matching them against compound libraries. However, experimentally determined MS2 spectra and CCS values are available for only a small number of compounds relative to the available chemical space. Given this scarcity of labelled data, data-efficient machine learning tools, including quantitative structure-property relationship (QSPR) models, need to be developed for metabolomics.

The following articles describe related problems in this domain: https://www.nature.com/articles/s41467-020-18171-8 and https://www.nature.com/articles/s41467-021-21352-8

The candidate must have a strong foundation in AI/ML as well as a willingness to work in cross-disciplinary areas.