Multimodality
Multimodal Sarcasm Analysis
Given a sarcastic post (image and text), we aim to recover the user's intended meaning by generating a textual explanation of the sarcasm in the post.
Most of the time, users employ sarcasm on social media as a means of mocking or poking fun. The sarcasm is not always apparent, however, and it is even harder to detect from the text alone. Taking the image associated with the text into account makes the task easier, since the image can provide additional context about the situation being discussed. Our aim is therefore to use both modalities (text and image) to generate the non-sarcastic utterance that corresponds to the sarcastic text. Leveraging visual and textual data together improves the interpretation of sarcastic remarks and should lead to more accurate non-sarcastic translations.
Multimodal Summarization
We are working to generate text summaries of conference paper presentation videos using the visual, acoustic, and textual modalities.
Summarizing only the transcript of a video, or only its visual content, often yields a one-sided summary, because it misses the crucial parts that the presenter emphasizes most. Here, we combine three modalities (text, video, and audio) to construct the vocabulary, extract video features, and pinpoint important audio segments. By integrating these feature sets, we aim to produce a more robust and informative summary that captures the content more comprehensively and includes the key points highlighted by the presenter. This multimodal approach leverages the strengths of each modality to give a richer and more accurate representation of the original material.
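As a rough illustration of why combining modalities can change which parts of a talk end up in the summary, the sketch below ranks transcript segments by a weighted sum of per-modality salience scores and keeps the top few. This is not the project's actual method: the `Segment` fields, the fusion weights, the example scores, and the extractive top-k selection are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Segment:
    """One transcript segment with hypothetical per-modality salience scores in [0, 1]."""
    text: str
    text_score: float    # assumed: salience of the transcript sentence
    visual_score: float  # assumed: salience of the accompanying frames or slides
    audio_score: float   # assumed: vocal emphasis in the audio


# Assumed fusion weights (text, visual, audio); a real system would tune or learn them.
DEFAULT_WEIGHTS: Tuple[float, float, float] = (0.5, 0.25, 0.25)


def fused_score(seg: Segment, weights: Sequence[float] = DEFAULT_WEIGHTS) -> float:
    """Late fusion: weighted sum of the three modality scores."""
    w_text, w_visual, w_audio = weights
    return (w_text * seg.text_score
            + w_visual * seg.visual_score
            + w_audio * seg.audio_score)


def summarize(segments: List[Segment], k: int = 2,
              weights: Sequence[float] = DEFAULT_WEIGHTS) -> List[str]:
    """Return the k highest-scoring segments, kept in their original order."""
    ranked = sorted(range(len(segments)),
                    key=lambda i: fused_score(segments[i], weights),
                    reverse=True)
    return [segments[i].text for i in sorted(ranked[:k])]


# Made-up example data, not taken from any real presentation.
example_segments = [
    Segment("Welcome everyone.", 0.1, 0.1, 0.2),
    Segment("Our key contribution is a new fusion model.", 0.6, 0.8, 0.9),
    Segment("Here is some related work.", 0.65, 0.2, 0.1),
    Segment("The model improves results on the benchmark.", 0.7, 0.6, 0.8),
]

multimodal_summary = summarize(example_segments, k=2)
text_only_summary = summarize(example_segments, k=2, weights=(1.0, 0.0, 0.0))
```

With these made-up scores, the text-only ranking keeps the related-work sentence, while the fused ranking replaces it with the sentence the speaker emphasized visually and vocally. That is the kind of one-sidedness the multimodal approach is meant to avoid.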
Multimodal Meme Analysis
Given an Internet meme, we aim to understand the emotion and sentiment it expresses, using the meme text and the corresponding image.
Information on social media comprises various modalities, such as textual, visual, and audio. The NLP and Computer Vision communities often study social media through a single prominent modality in isolation. However, the computational processing of Internet memes needs a hybrid approach. The growing ubiquity of Internet memes on social media platforms such as Facebook, Instagram, and Twitter further suggests that we cannot ignore such multimodal content anymore. The objective of this research is the automatic processing of Internet memes. We primarily focus on three subtasks: sentiment analysis of memes (positive, negative, and neutral), overall emotion classification of memes (humour, sarcasm, offensive, and motivational), and classification of the intensity of the meme's emotion.
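To make the three subtasks concrete, the sketch below shows one possible way to represent a meme's annotation. The sentiment and emotion labels come from the description above; everything else is an assumption made for illustration, including the `MemeAnnotation` class, the 0 to 3 intensity scale (the text does not specify intensity levels), and the choice to let a meme carry more than one emotion.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class Sentiment(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


class Emotion(Enum):
    HUMOUR = "humour"
    SARCASM = "sarcasm"
    OFFENSIVE = "offensive"
    MOTIVATIONAL = "motivational"


# Assumption: intensity is an integer from 0 (absent) to 3 (strongest).
MAX_INTENSITY = 3


@dataclass
class MemeAnnotation:
    """Hypothetical container for the labels of one meme (text plus image)."""
    text: str
    image_path: str
    sentiment: Sentiment                                   # subtask 1
    emotion_intensity: Dict[Emotion, int] = field(default_factory=dict)  # subtasks 2 and 3

    def __post_init__(self) -> None:
        # Reject intensity values outside the assumed scale.
        for emotion, level in self.emotion_intensity.items():
            if not 0 <= level <= MAX_INTENSITY:
                raise ValueError(
                    f"intensity for {emotion.value} must be in 0..{MAX_INTENSITY}, got {level}"
                )

    def emotions(self) -> Set[Emotion]:
        """Emotions considered present, i.e. those with intensity above zero."""
        return {e for e, level in self.emotion_intensity.items() if level > 0}


# Made-up example; the file name and text are placeholders.
example = MemeAnnotation(
    text="Oh great, another Monday.",
    image_path="meme_001.jpg",
    sentiment=Sentiment.NEGATIVE,
    emotion_intensity={Emotion.SARCASM: 2, Emotion.HUMOUR: 1, Emotion.OFFENSIVE: 0},
)
```

Storing the intensity per emotion keeps the second and third subtasks in one field: the set of emotions present is derived from the non-zero intensities, so the two sets of labels cannot drift out of sync.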