Abstract
With advances in information and communication technology, social networking and microblogging sites have become an important source of information. Through microblogging platforms, people express their views, complaints, feelings, and attitudes about current issues and products. Sentiment analysis is an important research direction in natural language processing that seeks to determine the sentiment orientation of source material. Twitter is one of the most widely used microblogging sites in the world, where millions of people publish more than one hundred million text messages (referred to as tweets) every day. For short text messages, identifying an appropriate term representation scheme is a crucial task. In the vector space model, term weighting schemes determine how text documents are represented. In this paper, we present a comprehensive study of sentiment analysis in Turkish using nine supervised and unsupervised term weighting schemes. Four supervised learning algorithms (i.e., Naïve Bayes, support vector machines, k-nearest neighbor and logistic regression) and three ensemble learning methods (i.e., AdaBoost, Bagging and Random Subspace) are used to evaluate the predictive performance of the term weighting schemes. The experimental results indicate that supervised term weighting models can outperform unsupervised term weighting models.
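To make the distinction between unsupervised and supervised term weighting concrete, the following minimal sketch contrasts one scheme of each kind. The abstract does not name the nine schemes evaluated, so the choice of TF-IDF (unsupervised) and TF-RF, i.e., term frequency times relevance frequency (supervised), is an assumption made purely for illustration. The function names and the toy tokenized tweets are hypothetical and are not taken from the paper's data set.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Unsupervised weighting: raw term frequency times log(N / df)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
        for doc in docs
    ]


def tf_rf(docs, labels, positive):
    """Supervised weighting: raw term frequency times relevance frequency.

    rf(t) = log2(2 + a / max(1, c)), where a and c are the numbers of
    positive-class and negative-class documents that contain term t.
    """
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        (pos if label == positive else neg).update(set(doc))

    def rf(term):
        return math.log2(2 + pos[term] / max(1, neg[term]))

    return [
        {t: tf * rf(t) for t, tf in Counter(doc).items()}
        for doc in docs
    ]


# Hypothetical toy corpus of tokenized tweets (illustrative only).
docs = [
    ["harika", "film"],
    ["berbat", "film"],
    ["harika", "oyun"],
    ["berbat", "oyun"],
]
labels = ["pos", "neg", "pos", "neg"]

unsupervised = tf_idf(docs)
supervised = tf_rf(docs, labels, positive="pos")
```

In this toy corpus every term occurs in exactly two of the four documents, so TF-IDF assigns all terms the same weight. TF-RF uses the class labels and therefore weights "harika", which appears only in positive tweets, more heavily than the class-neutral "film". This is the kind of label information that supervised schemes can exploit and unsupervised schemes cannot.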