Multitask machine learning and feature selection for biomarker discovery from big biological data.
Proteins and metabolites, the new class of biomarkers, are expected to bring about a paradigm shift in the diagnosis, monitoring and treatment of human disease and to make personalized medicine a reality in the near future. Moreover, next-generation biomarkers are likely to be based on inferences drawn from multiple metabolite or protein molecules rather than on single measurements such as the blood glucose level currently used to diagnose diabetes.

The present project focuses on the discovery of biomarkers from big biological data involving genomics, proteomics or metabolomics. Our primary objective is to select the best combination of biomarkers, that is, the minimum subset of features that predicts class labels such as disease vs. healthy or one category of disease vs. another. The challenges include: (i) a small amount of labelled data; (ii) imbalanced data with too many features and too few samples; (iii) the need to distinguish classes that may differ only subtly; and (iv) high inherent biological variability (unrelated to the class label) and instrument noise. We will combine a number of feature selection methods and machine learning tools with the concepts of multitask learning to discover biomarkers. Many public-domain databases are available and will be used as test cases.

See a recently published paper for the broad objectives of the proposed project (https://www.frontiersin.org/articles/10.3389/fgene.2019.00452/full). Applicants may also consult the literature on multitask machine learning.

The co-supervisor is from the CSE department and is an expert in machine learning, while the supervisor is active in the fields of metabolomics, systems biology and big biological data analysis. The candidate must have a strong foundation in AI/ML as well as a willingness to work in cross-disciplinary areas.
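As a rough illustration of how multitask learning and feature selection can be combined, the sketch below fits one class-weighted logistic regression per task (for example, one per cohort or dataset) under a shared l2,1 (group-lasso) penalty, so each feature is kept or discarded for all tasks jointly. This is one well-known multitask feature selection technique, not necessarily the approach the project will adopt. The function names, the synthetic data and the parameters lam, lr and n_iter are hypothetical. Class weights address the class imbalance in challenge (ii), and plain proximal gradient descent is used for readability rather than speed.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def balanced_weights(y):
    # Each class contributes half of the total loss, so the minority
    # class is not swamped by the majority class.
    w = np.empty(len(y))
    for c in (0, 1):
        mask = y == c
        w[mask] = 0.5 / mask.sum()
    return w


def multitask_l21_logreg(Xs, ys, lam=0.3, lr=0.1, n_iter=2000):
    """Hypothetical helper: jointly fit one logistic model per task.

    Xs: list of T arrays of shape (n_t, p); ys: list of T 0/1 label arrays.
    Minimizes sum_t weighted_logloss_t + lam * sum_j ||W[j, :]||_2.
    Returns W of shape (p, T) and intercepts b of shape (T,).
    A zero row W[j, :] means feature j is discarded for every task.
    """
    p, T = Xs[0].shape[1], len(Xs)
    W, b = np.zeros((p, T)), np.zeros(T)
    ws = [balanced_weights(y) for y in ys]
    for _ in range(n_iter):
        # Gradient step on the smooth class-weighted logistic loss
        # (tasks are independent here; only the penalty couples them).
        for t in range(T):
            r = ws[t] * (sigmoid(Xs[t] @ W[:, t] + b[t]) - ys[t])
            W[:, t] -= lr * (Xs[t].T @ r)
            b[t] -= lr * r.sum()
        # Proximal step for the l2,1 penalty: shrink each feature's row
        # norm by lr * lam, zeroing rows whose norm falls below it.
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        W *= np.maximum(0.0, 1.0 - lr * lam / np.maximum(norms, 1e-12))
    return W, b


# Hypothetical toy data: 3 tasks, 50 samples and 100 features each,
# 30% positives, with the same 5 features (0-4) shifted in positives.
rng = np.random.default_rng(0)
n, p, T, n_pos, delta = 50, 100, 3, 15, 2.0
Xs, ys = [], []
for _ in range(T):
    y = np.zeros(n)
    y[:n_pos] = 1
    rng.shuffle(y)
    X = rng.normal(size=(n, p))
    X[:, :5] += delta * y[:, None]
    Xs.append(X)
    ys.append(y)

W, b = multitask_l21_logreg(Xs, ys)
selected = np.flatnonzero(np.linalg.norm(W, axis=1) > 0)
print("selected features:", selected)
```

On data like this the rows of W for the five shifted features should dominate, while most noise features are zeroed out. On real omics data, lam would typically be tuned by cross-validation, and with so few samples it is worth checking how stable the selected subset is across resamples.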