Computer Science
Pattern
Architecture
Artificial Intelligence
Natural Language Processing
Speech Recognition
History
Social Science
Sociology
Archaeology
Authors
Satwanti Kapoor,Shubham Gulati,Sumit Verma,Ananya Pandey,Dinesh Kumar Vishwakarma
Identifier
DOI:10.1109/incacct61598.2024.10551131
Abstract
In the past, a considerable amount of research has been done on text-driven sentiment analysis using the benchmark multimodal combined Twitter-15 and Twitter-17 dataset. A small number of related studies have also used visual analysis to predict the sentiment of images. However, the majority of this research has examined only one data modality: text, images, or GIF videos. Lately, with photos, memes, and GIFs dominating social media feeds, typographic and infographic visual material has become a significant component of social media. This paper proposes a multi-modal sentiment analysis model that determines the sentiment polarity and score of each incoming tweet, taking into account both its text and image features. Text-based sentiment scoring is done with BERT, RoBERTa, and XLNet; vision-based sentiment scoring is done with ResNet-50, RegNet, and ResNeXt. The study develops a multi-modal sentiment recognition model that processes the text and image separately and then combines the two sentiment scores. On the Twitter-1517 benchmark multi-modal Twitter dataset, the model using RegNet for visual scoring and RoBERTa for text scoring achieves an accuracy of 77.24%. The study further shows that integrating text and image features outperforms independent models that rely on text analysis or images alone.
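For illustration, here is a minimal Python sketch of the late-fusion idea the abstract describes: a RoBERTa classifier scores the tweet text, a RegNet head scores the attached image, and the two per-modality sentiment score vectors are combined. The Hugging Face checkpoint name, the 3-class label set, and the averaging fusion rule are assumptions made for this sketch, not details taken from the paper.

```python
import torch
from PIL import Image
from torchvision import models, transforms
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Text branch: a RoBERTa sentiment classifier. The checkpoint below
# (3 labels: negative/neutral/positive) is an assumed stand-in; the
# paper does not name the weights it fine-tuned.
tok = AutoTokenizer.from_pretrained("cardiffnlp/twitter-roberta-base-sentiment")
text_model = AutoModelForSequenceClassification.from_pretrained(
    "cardiffnlp/twitter-roberta-base-sentiment"
)

# Vision branch: a torchvision RegNet whose ImageNet classifier is replaced
# by a 3-way sentiment head. In practice this head would be fine-tuned on
# the Twitter-1517 images; here it is left untrained purely for illustration.
vision_model = models.regnet_y_400mf(weights="IMAGENET1K_V1")
vision_model.fc = torch.nn.Linear(vision_model.fc.in_features, 3)

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def fused_sentiment(text: str, image: Image.Image) -> torch.Tensor:
    """Return fused (negative, neutral, positive) probabilities for a tweet."""
    text_model.eval()
    vision_model.eval()
    # Text sentiment score.
    enc = tok(text, return_tensors="pt", truncation=True)
    p_text = text_model(**enc).logits.softmax(dim=-1)
    # Image sentiment score.
    p_image = vision_model(preprocess(image).unsqueeze(0)).softmax(dim=-1)
    # Late fusion: average the two modality scores. A simple average is one
    # plausible rule; the abstract says only that the scores are combined.
    return 0.5 * (p_text + p_image)
```

A call such as fused_sentiment("great day at the beach!", Image.open("tweet.jpg").convert("RGB")) would then return a single 3-way probability vector, with the argmax giving the fused sentiment polarity.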