Weather Sentiment Analysis with Tweets: A Machine Learning Approach
Analyzing weather-related tweets for sentiment, time reference, and weather type using support vector regression (SVR) and ridge regression. Data preprocessing involves normalization and tokenization. Results are reported as RMSE values for each model.
Presentation Transcript
Weather and Tweets UCML 2013 Members: Vinh Dang, Wai I Iong, Matthew Dudley, Jiyuan Li
Background • Analyzing tweets related to the weather to determine: • whether the tweet has a positive, negative, or neutral sentiment • whether the weather occurred in the past, present, or future • what kind of weather the tweet references
The data • Training set (http://www.kaggle.com/c/crowdflower-weather-twitter): • contains tweets, locations, and a confidence score for each of 24 possible labels • about 78000 tweets
The data • Labels: • sentiment: s1 + s2 + s3 + s4 + s5 = 1 • when: w1 + w2 + w3 + w4 = 1 • kind: k1 + k2 + … + k15 may be greater than 1
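The label constraints on this slide can be checked mechanically. Below is a minimal sketch, assuming each tweet's labels arrive as three plain lists of confidence scores in the range 0 to 1. The function name `check_label_sums`, the range check, and the tolerance are illustrative assumptions, not part of the competition's data format.

```python
def check_label_sums(s, w, k, tol=1e-6):
    """Return True if the sentiment and "when" scores each sum to 1.

    s: 5 sentiment scores, w: 4 "when" scores, k: 15 "kind" scores.
    The "kind" scores are not required to sum to 1, because a tweet can
    mention several kinds of weather at once.
    Assumption: every score is a confidence value between 0 and 1.
    """
    if len(s) != 5 or len(w) != 4 or len(k) != 15:
        raise ValueError("expected 5 sentiment, 4 when, and 15 kind scores")
    sums_ok = abs(sum(s) - 1.0) <= tol and abs(sum(w) - 1.0) <= tol
    range_ok = all(0.0 <= x <= 1.0 for x in list(s) + list(w) + list(k))
    return sums_ok and range_ok
```

If the scores in a file are rounded to a few decimals, a looser `tol` may be needed.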
The data • Testing set: • contains the id, tweet, state, and location • no “sentiment”, “when”, or “kind” labels; these are what we need to predict • about 42000 tweets
Data Preprocessing • Data “normalizing” • convert HTML character codes into characters (Ex: &gt; → >) • convert all the hyperlinks in the testing set into “{link}” • Tokenizing, for example: “What a bright sunny!” → “[what, a, bright, sunny, !]” • SQLite (for storing data)
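The normalizing and tokenizing steps on this slide could look roughly like the sketch below. It uses only the Python standard library. The function names and both regular expressions are assumptions made for illustration; the slides do not show the team's actual code.

```python
import html
import re

# Assumption: a hyperlink is anything starting with http:// or https://
URL_RE = re.compile(r"https?://\S+")
# Keep the "{link}" placeholder whole, then words, then single punctuation marks
TOKEN_RE = re.compile(r"\{link\}|\w+|[^\w\s]")

def normalize(text):
    """Decode HTML character codes and replace hyperlinks with "{link}"."""
    text = html.unescape(text)          # e.g. "&gt;" becomes ">"
    return URL_RE.sub("{link}", text)

def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return TOKEN_RE.findall(text.lower())
```

With these definitions, `tokenize("What a bright sunny!")` gives `['what', 'a', 'bright', 'sunny', '!']`, matching the example on the slide.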
Methodology • Bag of words • tf-idf • Approach: 1) Support vector regression (SVR) 2) Ridge regression
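To make the bag-of-words and tf-idf step concrete, here is a small standard-library sketch that turns tokenized tweets into tf-idf vectors. It assumes one common weighting variant, term frequency as count divided by tweet length and idf as log(N / df). The slides do not say which variant the team used, and a library such as scikit-learn applies a slightly different smoothing, so the numbers here are illustrative only. The function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build tf-idf vectors from a list of token lists.

    Returns (vocab, vectors), where vocab is the sorted list of distinct
    tokens and vectors[i][j] is the weight of vocab[j] in docs[i].
    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))    # count each token once per document
    vocab = sorted(doc_freq)
    idf = {t: math.log(n_docs / doc_freq[t]) for t in vocab}

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)        # the bag of words for this document
        total = len(tokens)
        vectors.append(
            [(counts[t] / total) * idf[t] if total else 0.0 for t in vocab]
        )
    return vocab, vectors
```

A token that appears in every tweet gets weight 0 under this variant, which is the intended effect of the idf term: words that do not distinguish tweets contribute nothing to the regression features.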
Result • Our results: • SVR RMSE = 0.26149 • Ridge RMSE = 0.16997 • Others: • The winner: 0.14314 • Baseline (all zeros): 0.31957
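For reference, the RMSE figures above compare predicted confidence scores against the true ones. The sketch below shows one way to compute it, assuming the squared error is pooled over every tweet and every one of the 24 labels before taking the square root. That pooling and the function name `rmse` are assumptions; the slides do not spell out the exact formula.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error over all rows and all label columns.

    predicted, actual: lists of equal-length rows of numbers,
    one row per tweet and one column per label.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual need the same number of rows")
    total, count = 0.0, 0
    for p_row, a_row in zip(predicted, actual):
        if len(p_row) != len(a_row):
            raise ValueError("rows need the same number of labels")
        for p, a in zip(p_row, a_row):
            total += (p - a) ** 2
            count += 1
    if count == 0:
        raise ValueError("no values to compare")
    return math.sqrt(total / count)
```

An all-zeros baseline like the one listed on the slide corresponds to calling `rmse` with a `predicted` matrix filled with 0.0, which gives a floor that any trained model should beat.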
Result • A better approach (testing data vs. actual results) • Review of labels
References • CrowdFlower (2013). “Partly Sunny with a Chance of Hashtags.” Kaggle. Retrieved from http://www.kaggle.com/c/crowdflower-weather-twitter • Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1--27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm • Pedregosa et al. (2011). Scikit-learn: Machine Learning in Python. JMLR 12, pp. 2825-2830.
Questions? The End