Aspect Based Clustering for Turkish News

Seher Acer, Başak Çakar, Elif Demirli, Şadiye Kaptanoğlu


Presentation Transcript


  1. Aspect Based Clustering for Turkish News • Seher Acer, Başak Çakar, Elif Demirli, Şadiye Kaptanoğlu

  2. Outline • Introduction • Motivation • Aspect Based Clustering • Modeling Aspects • Aspect Extraction • Framing Cycle-Aware Clustering • User Interface & Demo • Conclusion • References

  3. Introduction • News is produced in multiple stages: • Gathering, writing, editing, etc. • Subjective opinions of producers, owners, and advertisers create a biased environment • Readers need to make an effort to reach a comprehensive and balanced understanding of a news event • Goal: a system that guides and encourages readers to read news from different perspectives

  4. Motivation • Current systems provide a limited presentation of news • News is listed arbitrarily or by date • A system that helps users reach news from different viewpoints via a single portal • Capture the differences in aspects among articles reporting a common news story • Use of advanced information retrieval techniques

  5. Aspect Extraction

  6. Keyword Extraction • Aspect: keyword-weight pairs • Keywords are extracted from • Head, sub-head, lead • GATE (General Architecture for Text Engineering) • Person, organization, location • Event extraction (Zemberek) • Frequently used action words/phrases
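As a rough illustration of an aspect as a set of keyword-weight pairs, the sketch below builds one from keywords that are assumed to have been extracted already (for example by GATE or Zemberek, which are not called here). The slides do not give the weighting formula, so the field weights in `FIELD_WEIGHTS`, the normalization, and the helper name `build_aspect` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical field weights (NOT from the slides): keywords found in the
# head count more than those in the sub-head or lead.
FIELD_WEIGHTS = {"head": 3.0, "sub_head": 2.0, "lead": 1.0}

def build_aspect(keywords_by_field):
    """Turn already-extracted keywords into keyword -> weight pairs.

    keywords_by_field maps a field name ("head", "sub_head", "lead")
    to the list of keywords found in that field.
    """
    raw = defaultdict(float)
    for field, keywords in keywords_by_field.items():
        for keyword in keywords:
            raw[keyword] += FIELD_WEIGHTS[field]
    total = sum(raw.values())
    # Normalize so the weights of one article sum to 1 (an assumption).
    return {k: w / total for k, w in raw.items()} if total else {}

# Made-up keywords for a single article.
aspect = build_aspect({
    "head": ["Ankara", "strike"],
    "sub_head": ["strike"],
    "lead": ["union", "Ankara"],
})
```

The resulting dictionary is a sparse vector, which is a convenient input for the similarity-based clustering described later in the deck.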

  7. Weight Calculation

  8. Framing Cycle-Aware Clustering • The set of articles on a news event shows head-tail characteristics • Head – articles with common aspects • Tail – articles with uncommon aspects • Separating head and tail enables effective classification • Two steps: • Head-tail partitioning • Tail-side clustering

  9. Head-Tail Partitioning • Generate common and uncommon keyword sets • HgP: head group proportion • Calculate commonness and uncommonness for each article • Commonness – high for an article with many common keywords that have high weight values • Uncommonness – high for an article with many uncommon keywords that have high weight values
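The partitioning step can be sketched as follows. The slides name the ingredients (common and uncommon keyword sets, HgP, commonness, uncommonness) but not the formulas, so several choices here are assumptions: a keyword counts as common when it appears in at least `common_df` of the articles, an article is ranked by commonness minus uncommonness, and the top HgP fraction of articles forms the head. The function and parameter names are hypothetical.

```python
from collections import Counter

def partition_head_tail(aspects, hgp=0.5, common_df=0.5):
    """Split articles (keyword -> weight dicts) into head and tail indices."""
    n = len(aspects)
    # Document frequency: in how many articles does each keyword occur?
    df = Counter(k for aspect in aspects for k in aspect)
    # Assumption: "common" means appearing in >= common_df of the articles.
    common = {k for k, count in df.items() if count / n >= common_df}

    def score(aspect):
        commonness = sum(w for k, w in aspect.items() if k in common)
        uncommonness = sum(w for k, w in aspect.items() if k not in common)
        # Assumption: rank by the difference of the two measures.
        return commonness - uncommonness

    ranked = sorted(range(n), key=lambda i: score(aspects[i]), reverse=True)
    head_size = int(hgp * n)  # HgP: proportion of articles in the head group
    return sorted(ranked[:head_size]), sorted(ranked[head_size:])

# Made-up aspects for four articles on the same story.
articles = [
    {"earthquake": 0.9, "rescue": 0.5},
    {"earthquake": 0.8, "rescue": 0.4},
    {"earthquake": 0.3, "contractor": 0.9},
    {"earthquake": 0.2, "insurance": 0.8},
]
head, tail = partition_head_tail(articles, hgp=0.5)
```

Here the first two articles share the widely used keywords and land in the head, while the two articles dominated by rare keywords form the tail that is passed on to clustering.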

  10. Tail-Side Clustering • Agglomerative hierarchical clustering • Similarity measure – cosine similarity • During agglomerative clustering • Each object starts as a singleton cluster of its own • Pairs of clusters are merged iteratively until a stopping criterion is met • In the merging process, the similarity between two clusters is the similarity of the most similar pair of objects belonging to these two clusters (the single-link approach)
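A minimal sketch of single-link agglomerative clustering with cosine similarity, assuming each tail article is represented as a keyword -> weight dictionary. The slides only say "a certain stopping criterion"; the similarity `threshold` used here is an assumption, as are the function names and the sample data.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse keyword -> weight dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def single_link_clusters(aspects, threshold):
    """Merge clusters until no pair reaches the similarity threshold."""
    clusters = [[i] for i in range(len(aspects))]  # start with singletons
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single link: similarity of the most similar cross pair.
                sim = max(cosine(aspects[i], aspects[j])
                          for i in clusters[x] for j in clusters[y])
                if sim > best_sim:
                    best_sim, best_pair = sim, (x, y)
        if best_sim < threshold:  # assumed stopping criterion
            break
        x, y = best_pair
        clusters[x].extend(clusters[y])
        del clusters[y]
    return sorted(sorted(c) for c in clusters)

# Made-up tail aspects: two articles per viewpoint.
tail_aspects = [
    {"strike": 0.9, "union": 0.6},
    {"strike": 0.8, "union": 0.7, "wage": 0.2},
    {"election": 0.9, "poll": 0.5},
    {"election": 0.7, "poll": 0.6},
]
clusters = single_link_clusters(tail_aspects, threshold=0.5)
```

This brute-force version recomputes all pairwise similarities on every merge, which is fine for the handful of articles on one story; a library routine would be preferable for larger collections.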

  11. User Interface • Simple & user-friendly • Present news from different aspects fairly • Motivate reader to read news from different aspects

  12. Conclusion • Existing systems: Google News, Yahoo News • Limited presentation • News listed arbitrarily • Proposed system: • Gathers the same news as existing systems • Clusters news according to aspects • Simple user interface • Easy to track news stories • The approach is suitable for Turkish news

  13. References [1] Park, S., Kang, S., Lee, S., Chung, S., Song, J. Mitigating Media Bias: A Computational Approach. ACM, 2008, pp. 47-51. [2] Park, S., Kang, S., Chung, S., Song, J. NewsCube: Delivering Multiple Aspects of News to Mitigate Media Bias. ACM, 2009. [3] Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics. ACL'02, 2002. [4] Park, S., Lee, S., Song, J. Aspect-level News Browsing: Understanding News Events from Multiple Viewpoints. ACM, 2010, pp. 41-50.

  14. Thank you for listening…
