Sentiment Analysis of Apple products using SVM and Maximum Entropy Model

Excerpt

Sentiment Analysis of Apple products using SVM and Maximum Entropy Model*
Extended Abstract†

Mohammad Al Amin Ashik Ali Mohammad Tarif
International Islamic University Malaysia, P.O. Box 53100
[email protected] [email protected]

S M Raju Md. Shariful Islam
International Islamic University Malaysia, P.O. Box 53100
[email protected] [email protected]

ABSTRACT
Nowadays, the Internet has become an online platform for exchanging ideas and sharing opinions. Twitter is rapidly gaining popularity because it lets people all over the world share their views on topics and post messages. There are many tweets about Apple products worldwide, as Apple is one of the leading technology companies in the world today. It is difficult to know how people feel about Apple on a large scale, and it is not possible to extract the sentiment manually by considering each tweet. We propose sentiment polarity detection approaches. SVM and Maximum Entropy can both be used to analyze sentiments. Our experiment will evaluate the algorithms using accuracy, precision, recall and F-measure as metrics. Thus, we can reach a conclusion about which is more effective for mining the sentiments of tweets.

KEYWORDS
Twitter, Apple products, SVM, Maximum Entropy, sentiment, tweets.

1.0 INTRODUCTION
Millions of websites and social media platforms generate millions of texts every day, in forms such as stories, discussions, decisions, blogs, feedback, tweets and postings. These text data have been reshaping how corporations analyze their services, the impact of public sentiment, public emotions and so on. These data are a great opportunity to influence our social and political views and methods. But analysing these vast datasets is not easy. Much research has been carried out and many algorithms have been developed for this purpose, under the names sentiment analysis and opinion mining. For our project, the subject texts are collected from Twitter. These tweets will be extracted and filtered into plain text, which feeds the sentiment analysis process. During the process, the texts are classified into polarity classes: positive, negative and neutral. In this experiment, the SVM and Maximum Entropy algorithms will be used for the sentiment analysis in order to test their accuracy.

1.1 PROJECT OBJECTIVES
1. To classify the Twitter texts into positive, negative and neutral sentiments. To mark all the collected tweets with their polarity.
2. To evaluate which algorithm does this classification with more accuracy.

1.2 PROJECT SCOPE
Using Twitter, we will mine the sentiments of people around the world about the company Apple and analyse those sentiments to compare the two algorithms.

1.3 CHOSEN ALGORITHM
Our chosen algorithms are the Maximum Entropy Model and SVM; we would like to find out which algorithm is better at classifying sentiment.

• SVM: Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression problems, although it is mostly used for classification. In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a coordinate. Then, we perform classification by finding the hyperplane that separates the two classes well (see Figure 1).

Figure 1: Support vectors are simply the coordinates of individual observations. The Support Vector Machine is a frontier (hyperplane/line) that best segregates the two classes.

• Maximum Entropy Model: The Maximum Entropy (MaxEnt) classifier is a probabilistic classifier that belongs to the class of exponential models. MaxEnt is based on the Principle of Maximum Entropy: from all the models that fit the training data, it selects the one with the largest entropy. The MaxEnt classifier can be used to solve a large variety of text classification problems, such as language detection, topic classification, sentiment analysis and more.

1.4 DATASET
Data was extracted from Twitter. A Twitter developer account was created, and tweets were extracted with keys generated by Twitter. We collected 6,214 real-time tweets from around the world. Tweets containing the word '@apple' were extracted.

1.5 SENTIMENT ANALYSIS
Sentiment analysis (SA) is the process of extracting the polarity of individuals' subjective opinions from plain natural-language text. It involves classifying the opinions in a text into categories such as "positive", "negative" or "neutral". For example, a review on a website might be broadly positive about a digital camera but specifically negative about how heavy it is. Being able to identify this kind of information in a systematic way gives the vendor a much clearer picture of public opinion than surveys or focus groups do, because the data is created by the customer.

1.6 WORK FLOW DIAGRAM
Tweets cleaning flow: see section 3.5

2.0 LITERATURE REVIEW
Pak and Paroubek (2010) [1] proposed a model to classify tweets as objective, positive or negative. They created a Twitter corpus by collecting tweets with the Twitter API and automatically annotating those tweets using emoticons. Using that corpus, they developed a sentiment classifier based on the multinomial Naive Bayes method that uses features such as N-grams and POS tags. The training set they used was less efficient, since it contained only tweets with emoticons.
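
The emoticon-based annotation described above can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the cited authors' actual procedure: the two emoticon sets and the function names label_by_emoticon and build_noisy_corpus are hypothetical and were chosen only for this example.

```python
# Hypothetical emoticon sets, assumed for illustration only.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", "=)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}


def label_by_emoticon(tweet):
    """Return 'positive', 'negative', or None when there is no usable signal."""
    tokens = tweet.split()
    has_pos = any(tok in POSITIVE_EMOTICONS for tok in tokens)
    has_neg = any(tok in NEGATIVE_EMOTICONS for tok in tokens)
    if has_pos == has_neg:
        # Either no emoticon at all or conflicting emoticons: leave unlabeled.
        return None
    return "positive" if has_pos else "negative"


def build_noisy_corpus(tweets):
    """Keep only the tweets that receive a label, paired with that label."""
    labeled = []
    for tweet in tweets:
        label = label_by_emoticon(tweet)
        if label is not None:
            labeled.append((tweet, label))
    return labeled
```

Because build_noisy_corpus drops every tweet without an emoticon, the resulting training set contains only tweets with emoticons, which is the limitation noted above.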

Parikh and Movassate (2009) [2] implemented two models, a Naive Bayes bigram model and a Maximum Entropy model, to classify tweets. They found that the Naive Bayes classifiers worked much better than the Maximum Entropy model.

Go and L. Huang (2009) [3] proposed a solution for sentiment analysis of Twitter data using distant supervision, in which the training data consisted of tweets with emoticons that served as noisy labels. They built models using Naive Bayes, MaxEnt and Support Vector Machines (SVM). Their feature space consisted of unigrams, bigrams and POS tags.

Xia et al. [4] used an ensemble framework for sentiment classification, obtained by combining various feature sets and classification techniques. In their work, they used two types of feature sets (part-of-speech information and word relations) and three base classifiers (Naive Bayes, Maximum Entropy and Support Vector Machines). They applied ensemble approaches such as fixed combination, weighted combination and meta-classifier combination for sentiment classification and obtained better accuracy.

Luo et al. [5] highlighted the challenges of mining opinions from tweets and an efficient technique for doing so. Spam and wildly varying language make opinion retrieval within Twitter a challenging task.

Analysis of Twitter data has been the focus of much recent research within the domain of sentiment analysis. Document-level classification is the most promising subject in sentiment classification. Reference [6] showed that there is a correlation between sentiment measures computed using word frequencies in tweets and both consumer confidence polls and political polls. Accordingly, they illustrated that the inclination of the public towards particular entities can be examined through the analysis of tweets. Reference [7] measured presidential performance over a specific time interval by extracting public sentiment from Twitter.
For this purpose they used the SentiStrength lexicon [8]. As already noted, [9] adopts a suite of sentiment features as well as some non-sentiment features to process and analyze a manually annotated data set of tw

 

Document type

Research Paper


Languages

English.


Categories

Artificial Intelligence, Computer Science and Applications.


Country

Malaysia.

