Creation of a pollution sources database using NLP-based machine learning approaches

Pollution source apportionment is the task of attributing measured pollutants to known or unknown pollution sources, such as vehicular emissions, biomass burning, and emissions from specific industries. In the absence of a well-defined repository of pollution sources, this is a difficult problem. While such repositories (or databases) are available in developed countries, they are lacking for the Indian subcontinent. The aim of the current project is to use machine-learning-based natural language processing (NLP) to extract information about known pollution sources from existing literature, such as research papers and reports from regulatory agencies, and to build a database from it. In particular, pre-trained NLP models such as SciBERT will be used for this purpose.
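To give a rough idea of the intended output, the sketch below shows one way extracted source mentions could be stored in a database. It is only an illustration under several assumptions: the table layout, the function names, the seed vocabulary and the sample report text are all hypothetical, and the simple dictionary lookup is a stand-in for the extraction step that a pre-trained model such as SciBERT would perform in the actual project.

```python
import re
import sqlite3

# Hypothetical seed vocabulary (an assumption for illustration only).
# In the project itself, a fine-tuned model such as SciBERT would take
# the place of this lookup.
SOURCE_TERMS = {
    "vehicular emissions": "transport",
    "biomass burning": "combustion",
    "coal combustion": "combustion",
    "road dust": "dust",
}


def extract_sources(text, doc_id):
    """Return (doc_id, source, category, evidence) rows for each term found."""
    rows = []
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        for term, category in SOURCE_TERMS.items():
            if term in lowered:
                rows.append((doc_id, term, category, sentence))
    return rows


def build_database(documents, path=":memory:"):
    """Create the (hypothetical) pollution_sources table and fill it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pollution_sources ("
        "doc_id TEXT, source TEXT, category TEXT, evidence TEXT)"
    )
    for doc_id, text in documents.items():
        conn.executemany(
            "INSERT INTO pollution_sources VALUES (?, ?, ?, ?)",
            extract_sources(text, doc_id),
        )
    conn.commit()
    return conn


# Made-up sample text, not taken from any real report.
docs = {
    "report-1": "PM2.5 in winter was dominated by biomass burning. "
                "Vehicular emissions contributed throughout the year.",
}
conn = build_database(docs)
found = conn.execute(
    "SELECT source, category FROM pollution_sources ORDER BY source"
).fetchall()
print(found)
```

Keeping the evidence sentence alongside each entry lets a reader trace every database row back to the passage it came from, which may be useful when the extraction step is later replaced by a learned model.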

This project is a follow-up to an earlier project in which a lot of the groundwork has already been laid.

UG Project Type

BTP

SLP

Name of Faculty

Mani Bhushan