Abstract

Software sentiment analysis has applications in numerous software engineering tasks, ranging from code suggestions to evaluating app reviews, which help development teams save valuable time and increase productivity. In recent years, sentiment analysis has also been used to study the emotional state of developers through sources such as commit messages. State-of-the-art sentiment analysis techniques have been applied to these tasks with varying results. The goal of this paper is to compare the performance of various models across possible applications of sentiment analysis in software engineering. To cover these applications, we use three different datasets: JIRA, AppReviews, and StackOverflow. Six word embedding techniques are applied to these datasets to represent the text as n-dimensional vectors. To handle the skewed class distribution in the data, we employ two class balancing techniques, SMOTE and Borderline-SMOTE. The resulting data is subjected to six feature selection techniques, and finally the sentiment of the text is classified using 14 different classifiers. The experimental results suggest that some models classify the sentiment of the text very accurately, whereas choosing the wrong combination of ML techniques can lead to disappointing performance.

Keywords: Sentiment analysis, Word embedding, SMOTE
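To illustrate the class balancing step named above, the following is a minimal sketch of the core idea behind plain SMOTE: each synthetic minority sample is an interpolation between an existing minority sample and one of its k nearest minority neighbours. It is written in standard-library Python for illustration only and is not the implementation used in this work. The function name `smote`, the parameters `k` and `seed`, and the toy data are all assumptions. Borderline-SMOTE, which restricts the base samples to minority points near the class boundary, is not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; `minority` is a list of
    equal-length numeric vectors (for example, embedded texts).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class: four 2-dimensional vectors (assumed data).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay within the region already occupied by the minority class instead of duplicating existing samples.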