Aspect Based Clustering for Turkish News
Seher Acer, Başak Çakar, Elif Demirli, Şadiye Kaptanoğlu
Outline • Introduction • Motivation • Aspect Based Clustering • Modeling Aspects • Aspect Extraction • Framing Cycle-Aware Clustering • User Interface & Demo • Conclusion • References
Introduction • News is produced in multiple stages: • Gathering, writing, editing, etc. • The subjective opinions of producers, owners, and advertisers create a biased environment • Effort is needed to gain a comprehensive and balanced understanding of a news event • A system that guides and encourages readers to read news from different perspectives
Motivation • Current systems provide only a limited presentation of news • News is listed arbitrarily or by date • A system that helps users reach news from different viewpoints via a single portal • Capture the differences in aspects among articles reporting the same news story • Use advanced computational techniques from information retrieval
Keyword Extraction • Aspect: a set of keyword-weight pairs • Keywords are extracted from the head, sub-head, and lead of each article • Named entity extraction with GATE (General Architecture for Text Engineering): • Person, organization, location • Event extraction (using Zemberek): • Frequently used action words/phrases
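To make "aspect = keyword-weight pairs" concrete, here is a minimal sketch of how extracted keywords might be turned into an aspect vector. It assumes the keywords have already been extracted (by GATE and Zemberek, not shown) and grouped by where they occur; the position weights in `POSITION_WEIGHTS` and the function name `build_aspect` are hypothetical choices for illustration, not values from the slides.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical position weights: a keyword in the head is assumed to matter
# more than one that appears only in the lead. Values are illustrative only.
POSITION_WEIGHTS = {"head": 3.0, "sub_head": 2.0, "lead": 1.0}


def build_aspect(keywords_by_position: dict[str, list[str]]) -> dict[str, float]:
    """Turn pre-extracted keywords into an aspect: keyword -> weight.

    `keywords_by_position` maps a position ("head", "sub_head", "lead") to the
    keywords found there (e.g. named entities or action phrases). Keywords are
    assumed to be normalized already; no case folding is done here because
    Turkish dotted/dotless i needs locale-aware handling.
    """
    weights: dict[str, float] = defaultdict(float)
    for position, keywords in keywords_by_position.items():
        for kw in keywords:
            # A keyword seen in several positions accumulates weight.
            weights[kw] += POSITION_WEIGHTS.get(position, 0.0)
    # L2-normalize so aspects of long and short articles are comparable.
    norm = sqrt(sum(w * w for w in weights.values()))
    return {kw: w / norm for kw, w in weights.items()} if norm else {}


aspect = build_aspect({
    "head": ["Ankara", "protesto"],
    "sub_head": ["polis"],
    "lead": ["Ankara", "polis", "öğrenciler"],
})
print(sorted(aspect.items(), key=lambda kv: -kv[1]))
```

Normalizing to unit length is a design choice made here so that the weights can be fed directly into cosine similarity later; a raw-count variant would work equally well with an explicit normalization step.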
Framing Cycle-Aware Clustering • The set of articles on a news event shows head-tail characteristics: • Head: common aspects • Tail: uncommon aspects • Separating the head from the tail enables effective classification • Two steps: • Head-tail partitioning • Tail-side clustering
Head-Tail Partitioning • Generate common and uncommon keyword sets • HgP: head group proportion • Calculate the commonness and uncommonness of each article: • Commonness: high when an article contains many common keywords with high weight values • Uncommonness: high when an article contains many uncommon keywords with high weight values
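The slides do not give exact formulas for this step, so the following is only a sketch under stated assumptions. Here HgP is interpreted as the fraction of distinct keywords, ranked by how many articles contain them, that form the common set (it could instead define the share of articles placed in the head). An article's commonness is taken as the sum of the weights of its common keywords, its uncommonness as the sum over its uncommon keywords, and it joins the head when commonness is at least uncommonness. The function name `partition_head_tail` and the default `hgp=0.2` are hypothetical.

```python
from collections import Counter


def partition_head_tail(aspects: list[dict[str, float]], hgp: float = 0.2):
    """Split articles into head and tail groups (one possible reading).

    aspects: one keyword -> weight dict per article.
    hgp: head group proportion, assumed here to be the fraction of distinct
         keywords (ranked by document frequency) treated as common.
    Returns (head_indices, tail_indices).
    """
    # Document frequency: number of articles containing each keyword.
    df = Counter(kw for aspect in aspects for kw in aspect)
    # Rank by frequency; ties broken alphabetically for determinism.
    ranked = sorted(df, key=lambda kw: (-df[kw], kw))
    n_common = max(1, round(hgp * len(ranked))) if ranked else 0
    common = set(ranked[:n_common])

    head, tail = [], []
    for i, aspect in enumerate(aspects):
        commonness = sum(w for kw, w in aspect.items() if kw in common)
        uncommonness = sum(w for kw, w in aspect.items() if kw not in common)
        (head if commonness >= uncommonness else tail).append(i)
    return head, tail


articles = [
    {"deprem": 0.8, "İzmir": 0.6},
    {"deprem": 0.7, "İzmir": 0.5, "yardım": 0.2},
    {"deprem": 0.3, "hükümet": 0.6, "eleştiri": 0.7},
    {"İzmir": 0.2, "muhalefet": 0.9, "eleştiri": 0.4},
]
print(partition_head_tail(articles, hgp=0.34))
```

With these toy articles, "deprem" and "İzmir" become the common keywords, so the two articles that mostly repeat them land in the head and the two that emphasize criticism land in the tail.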
Tail-Side Clustering • Agglomerative hierarchical clustering • Similarity measure: cosine similarity • During agglomerative clustering: • Each article starts as a singleton cluster of its own • Pairs of clusters are merged iteratively until a stopping criterion is met • When merging, the similarity between two clusters is the similarity of their most similar pair of articles (the single-link approach)
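The merging procedure above can be sketched in a few lines. This is a naive single-link agglomerative clustering over sparse keyword-weight vectors with cosine similarity. The slides do not name the stopping criterion, so this sketch assumes a similarity threshold: merging stops once no pair of clusters reaches `threshold` (a hypothetical parameter). The function names are also illustrative.

```python
from math import sqrt


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse keyword -> weight vectors."""
    dot = sum(w * b[kw] for kw, w in a.items() if kw in b)
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_link_clustering(aspects: list[dict[str, float]], threshold: float = 0.3):
    """Agglomerative clustering with single-link merging.

    Every article starts as a singleton cluster. In each round the two clusters
    whose most similar pair of articles is closest are merged. Merging stops
    when the best cluster similarity falls below `threshold` (an assumed
    stopping criterion) or only one cluster is left.
    """
    clusters = [[i] for i in range(len(aspects))]
    sim = [[cosine(a, b) for b in aspects] for a in aspects]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single link: similarity of the closest cross-cluster pair.
                s = max(sim[i][j] for i in clusters[x] for j in clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


tail = [
    {"deprem": 1.0, "yardım": 1.0},
    {"deprem": 1.0, "yardım": 0.8},
    {"hükümet": 1.0, "eleştiri": 1.0},
    {"eleştiri": 1.0, "muhalefet": 0.5},
]
print(single_link_clustering(tail, threshold=0.3))
```

This version recomputes cluster similarities every round, which is fine for the handful of tail articles per story but scales poorly; library implementations of single linkage use faster update rules.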
User Interface • Simple & user-friendly • Present news from different aspects fairly • Motivate readers to read news from different aspects
Conclusion • Existing systems: Google News, Yahoo News • Limited presentation • News listed arbitrarily • Proposed system: • Gathers the same news as existing systems • Clusters news according to aspects • Simple user interface • Easy to track news stories • The approach is suitable for Turkish news