
Exploring in the Weblog Space by Detecting Informative and Affective Articles

Xiaochuan Ni, Gui-Rong Xue, Xiao Ling, Yong Yu (Shanghai Jiao-Tong University); Qiang Yang (Hong Kong University of Science and Technology). WWW2007.





Presentation Transcript


  1. Exploring in the Weblog Space by Detecting Informative and Affective Articles • Xiaochuan Ni, Gui-Rong Xue, Xiao Ling, Yong Yu, Shanghai Jiao-Tong University • Qiang Yang, Hong Kong University of Science and Technology • WWW2007

  2. Introduction • Unique characteristics of blogs • Mostly maintained by individuals, so their content is generally personal • Links between blogs generally form localized communities • Ongoing research on blogs • Content-based analysis • Evolution of blog communities • Tools that help users retrieve, organize, and analyze blogs

  3. Introduction – Genres of Blog Content • Affective • Online diaries in which people publicly share their daily lives and express their feelings, thoughts, or emotions • Informative • Topic-oriented; the topic may relate to a hobby or to the author’s profession or business

  4. Introduction – the Problem and the Approach • The problem • Separating informative articles from affective articles in blogs • The approach • Treating the problem as binary classification • Challenges • Defining informative and affective articles • Building a training corpus for both categories • Choosing the machine learning algorithm

  5. Introduction – Studies in the Weblog Space • Emotion and topic classification of blog articles • Improving the effectiveness of emotion classification by filtering out informative articles • Blog search • An intent-driven blog search engine is proposed that re-ranks search results by their informativeness scores • Automatic detection of high-quality blogs • Measuring the quality of a blog by the percentage of its articles that are informative

  6. Definition of Informative and Affective Articles • A survey was conducted among users who regularly take part in blog activities • Informative articles contain: • News similar to that on traditional news websites • Technical descriptions, e.g. programming techniques • Commonsense knowledge • Objective comments on world events • Affective articles contain: • Diaries about personal affairs • Descriptions of the author’s own feelings or emotions

  7. Algorithms • Classification algorithms • Naïve Bayes Classifier (NB) • Support Vector Machine (SVM) • Rocchio Classifier • Feature selection algorithms • Information Gain (IG) • χ² statistic (CHI)

  8. Classification Algorithm – Naïve Bayes Classifier • Laplace smoothing is applied to overcome the zero-frequency problem
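The slide's formula did not survive extraction, so the following is only a sketch of the standard multinomial Naïve Bayes classifier with Laplace (add-one) smoothing, assumed to be close to what the authors used. It takes pre-tokenised articles; the function names `train_nb` and `predict_nb` and the label strings are hypothetical.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Train multinomial Naive Bayes with Laplace (add-one) smoothing.

    docs: list of token lists; labels: parallel list of class names.
    Returns (log_priors, log_likelihoods, vocabulary).
    """
    vocab = {t for doc in docs for t in doc}
    log_priors, log_likelihoods = {}, {}
    for c in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_priors[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for d in class_docs for t in d)
        total = sum(counts.values())
        # Add-one smoothing: every vocabulary term gets a pseudo-count of 1,
        # so a term never seen in class c still has non-zero probability.
        log_likelihoods[c] = {
            t: math.log((counts[t] + 1) / (total + len(vocab))) for t in vocab
        }
    return log_priors, log_likelihoods, vocab


def predict_nb(model, doc):
    """Return the class with the highest log posterior for a token list."""
    log_priors, log_likelihoods, vocab = model
    scores = {
        c: log_priors[c] + sum(log_likelihoods[c][t] for t in doc if t in vocab)
        for c in log_priors
    }
    return max(scores, key=scores.get)
```

Working in log space avoids underflow on long articles; terms never seen in training are simply skipped at prediction time.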

  9. Classification Algorithm – Rocchio Classifier • Category-profile-based classifier, where |cj| is the number of documents in category cj and each document is represented as a vector of TF-IDF term weights
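The profile formula itself is missing from the transcript. As a hedged sketch, the following builds each category profile as the plain mean of its |cj| member vectors (L2-normalised TF-IDF with idf = log(N/df)) and assigns a document to the most cosine-similar profile. It omits the negative-example term that some Rocchio variants subtract, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def idf_table(docs):
    """idf(t) = log(N / df(t)) over the training documents."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n / df[t]) for t in df}


def tfidf(doc, idf):
    """L2-normalised TF-IDF vector (term -> weight); unknown terms are dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: tf[t] * idf[t] for t in tf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def train_rocchio(docs, labels):
    """One profile per category: the mean of its members' TF-IDF vectors."""
    idf = idf_table(docs)
    sizes = Counter(labels)  # |c_j| for each category
    sums = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for t, w in tfidf(doc, idf).items():
            sums[label][t] += w
    profiles = {c: {t: w / sizes[c] for t, w in vec.items()} for c, vec in sums.items()}
    return idf, profiles


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def predict_rocchio(model, doc):
    """Assign the category whose profile is most cosine-similar to the document."""
    idf, profiles = model
    vec = tfidf(doc, idf)
    return max(profiles, key=lambda c: cosine(vec, profiles[c]))
```

If none of a document's terms appear in training, every similarity is 0 and the returned category is arbitrary; a real system would need a fallback for that case.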

  10. Feature Selection Algorithms • Information Gain (IG) • χ² statistic (CHI)
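The slide only names the two criteria. The sketch below uses their common textbook forms, computed from a 2×2 table of document counts for a term in a two-class setting, and keeps the top-k terms. The cell names `a`–`d`, the helper `rank_terms`, and the alphabetical tie-break are illustrative assumptions, not details from the paper.

```python
import math


def _entropy(counts):
    """Shannon entropy in bits of non-negative counts (zero counts are skipped)."""
    total = sum(counts)
    return -sum((k / total) * math.log2(k / total) for k in counts if k)


def information_gain(a, b, c, d):
    """IG of a term from document counts.

    a: positive-class docs containing the term   b: other-class docs containing it
    c: positive-class docs without the term      d: other-class docs without it
    """
    n = a + b + c + d
    h_class = _entropy([a + c, b + d])
    h_cond = ((a + b) / n) * _entropy([a, b]) + ((c + d) / n) * _entropy([c, d])
    return h_class - h_cond


def chi_square(a, b, c, d):
    """Chi-square statistic for the same 2x2 table (0 if any margin is empty)."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def rank_terms(docs, labels, positive, score, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    doc_sets = [set(doc) for doc in docs]
    vocab = set().union(*doc_sets)

    def table(term):
        a = b = c = d = 0
        for terms, label in zip(doc_sets, labels):
            if term in terms:
                if label == positive:
                    a += 1
                else:
                    b += 1
            elif label == positive:
                c += 1
            else:
                d += 1
        return a, b, c, d

    return sorted(vocab, key=lambda t: (-score(*table(t)), t))[:k]
```

For example, a term that occurs in every affective article and in no informative one scores the maximum IG of 1 bit on a balanced corpus, while a term spread evenly over both classes scores 0 under both criteria.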

  11. Experiment Data • 5,000 articles crawled from MSN Space • 3,547 of them were labeled affective and 1,109 informative; the remaining 344 were discarded because of encoding problems • 2,200 articles from the Sohu.com Directory used as informative articles • News, commonsense knowledge, or objective comments on 22 different topics Table 1. Statistics of Data Set

  12. Experiment – Comparing Classification Algorithms Table 2. Performance of the three classification algorithms

  13. Comparing Feature Selection Algorithms Table 3. Performance on different feature sets

  14. Representative Features Table 4. Top 20 representative features of each category

  15. Study on Emotion and Topic Classification • Assumption: informative articles do not express personal emotions • Extracting affective articles therefore helps build a corpus of purely emotional articles Figure 1. Two-step approach for topic and emotion classification
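Figure 1 is not reproduced in the transcript, so the routing below is an assumption: step 1 applies the information-affectiveness classifier, and step 2 sends affective articles to an emotion classifier and informative ones to a topic classifier. The classifiers are passed in as plain functions (any of the algorithms on slide 7 could fill these roles); `two_step_classify` and `affective_corpus` are hypothetical names.

```python
from typing import Callable, List, Tuple

# A classifier maps a tokenised article to a label string.
Classifier = Callable[[List[str]], str]


def two_step_classify(
    doc: List[str],
    genre_clf: Classifier,
    emotion_clf: Classifier,
    topic_clf: Classifier,
) -> Tuple[str, str]:
    """Step 1: decide the genre; step 2: hand the article to the matching classifier."""
    genre = genre_clf(doc)
    if genre == "affective":
        # Only affective articles reach the emotion classifier, following the
        # assumption that informative articles carry no personal emotion.
        return genre, emotion_clf(doc)
    return genre, topic_clf(doc)


def affective_corpus(docs: List[List[str]], genre_clf: Classifier) -> List[List[str]]:
    """Keep only the articles labeled affective, e.g. to build an emotion training set."""
    return [d for d in docs if genre_clf(d) == "affective"]
```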

  16. Experiment on Emotion Classification • Data • Training: 2,494 blog articles manually labeled with one of two emotion tendencies, positive or negative • Testing: 1,303 articles from 75 blogs in MSN Space Table 5. Data set used for emotion classification

  17. Experiment Results on Emotion Classification • The binary emotion classifier is either preceded by information-affectiveness classification (I-Approach) or applied directly (II-Approach) Table 6. Comparison results for two emotion classification approaches

  18. Study on Intent-driven Weblog Search Engine • Blog search currently works the same way as Web search • Intent-driven search (re-ranking): $S_{\text{mixed}} = \lambda \cdot S_{\text{if}} + (1 - |\lambda|) \cdot S_{\text{origin}}$, where $S_{\text{if}}$ is a confidence value between -1 (strong affective intent) and 1 (strong informative intent), and $S_{\text{origin}}$ is the original relevance score
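The re-ranking formula can be sketched directly. The transcript does not state the range of λ; the |λ| term suggests λ may be negative, so this sketch assumes λ ∈ [-1, 1], with positive values favouring informative articles and negative values favouring affective ones. It also assumes $S_{\text{origin}}$ has been normalised to a scale comparable with $S_{\text{if}}$. The function name and tuple layout are hypothetical.

```python
def intent_rerank(results, lam):
    """Re-rank results by S_mixed = lam * S_if + (1 - |lam|) * S_origin.

    results: list of (doc_id, s_origin, s_if) tuples, with s_if in [-1, 1]
             (-1 = strongly affective, 1 = strongly informative).
    lam:     assumed to lie in [-1, 1]; lam > 0 favours informative articles,
             lam < 0 favours affective ones, lam = 0 ranks by s_origin alone.
    """
    if not -1.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [-1, 1]")
    mixed = [
        (doc_id, lam * s_if + (1 - abs(lam)) * s_origin)
        for doc_id, s_origin, s_if in results
    ]
    # sorted() is stable, so equal mixed scores keep their input order.
    return sorted(mixed, key=lambda pair: pair[1], reverse=True)
```

With results `[("a", 0.9, -0.8), ("b", 0.7, 0.9), ("c", 0.5, 0.1)]`, λ = 0.5 promotes the informative article `b` to the top, while λ = -0.5 keeps the affective article `a` first and pushes `b` to the bottom.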

  19. Analysis of the Distribution of the Two Genres of Articles Figure 2. Distribution of informative articles and affective articles on 99,059 blog articles

  20. Detecting High-quality Blogs Figure 3. Distribution of blogs with different levels of quality on 6,319 blogs
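Slide 5 defines a blog's quality as the percentage of its articles that are informative. A minimal sketch of that measure over per-article genre labels follows. The thresholds behind the quality levels in Figure 3 are not given, so the sketch returns only the raw fraction; the function names are hypothetical.

```python
from collections import defaultdict


def blog_quality(article_genres):
    """Fraction of a blog's articles labeled informative (0.0 for an empty blog)."""
    if not article_genres:
        return 0.0
    return sum(1 for g in article_genres if g == "informative") / len(article_genres)


def quality_by_blog(labelled_articles):
    """labelled_articles: iterable of (blog_id, genre) pairs -> {blog_id: quality}."""
    genres = defaultdict(list)
    for blog_id, genre in labelled_articles:
        genres[blog_id].append(genre)
    return {blog_id: blog_quality(g) for blog_id, g in genres.items()}
```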

  21. Conclusion and Future Work • The task of separating informative from affective articles is addressed as a binary classification problem. • Applications of this information-affectiveness classification are studied: emotion classification, intent-driven blog search, and high-quality blog detection. • Future work: 1) building a much larger data set using semi-supervised learning techniques 2) applying the existing approach to data in other languages
