Code-Mixed & Low-Resource Language Processing

Code-Mixed Machine Translation

Building machine translation (MT) models for translating code-mixed data. Exploring different MT models. Training semi-supervised neural machine translation (NMT) models on code-mixed Indian languages.

Multilingual speakers can switch seamlessly between languages within a single sentence, a practice known as code-mixing that shows a high degree of linguistic flexibility. However, current state-of-the-art (SOTA) translation systems struggle to translate such mixed-language sentences effectively. We aim to develop Neural Machine Translation (NMT) models that accurately interpret code-mixed sentences and produce meaningful translations of them, closing this gap and improving multilingual communication.

Opinion Detection in News

We aim to build a model that detects opinions expressed by anchors in code-mixed Indian news debates.

People like to express their own opinions and seek out the opinions of others. Mining and detecting opinions from a variety of sources benefits individuals, organizations, and even governments. The media is one such organization, and a general norm in news media is that outlets do not present their own opinions. Since anchors are the face of a media outlet, they are expected to remain unopinionated, and ensuring objective news delivery is crucial for maintaining public trust. We are therefore building a model that detects opinions in news delivery. The model is intended to help anchors and media outlets reflect on their work and improve their delivery. By improving their ability to identify and reduce subjective content, media organizations can maintain impartiality and credibility in their broadcasts.

Fake News Detection in Hindi 

We are creating our own fake news detection dataset in Hindi and fine-tuning various machine learning and deep learning models specifically for fake news detection in Hindi.

The majority of fake news spread in India nowadays is in regional languages. Many fake news detection models already exist, but most of them target English only. Very little work has been done on fake news detection for low-resource languages such as Hindi and Marathi. For this project, we are manually creating a fake news/misinformation detection dataset in Hindi. We will also build several fake news detection models specifically for the Devanagari script, which other researchers can use as baselines to advance fake news detection in low-resource languages.