
Memeta: A Framework for Analytics on the Blogosphere






Presentation Transcript


  1. Memeta: A Framework for Analytics on the Blogosphere
     Pranam Kolari, Tim Finin

     What is memeta?
     • Our framework that puts research into real-world use
     • Features blog identification and splog detection modules
     • Includes language identification modules for more than 10 languages (provided by James Mayfield)
     • memeta has been used on an as-needed basis to analyze the blogosphere

     [Figure: Blogosphere Analytics. Diagram labels: Blog Directories, Ping Servers, Search Engines, Blog Crawler, Language Identifier, Blog Identifier (98% accuracy), Splog Detector (87% accuracy), BLOGS]
     [Figures: Nature of pinging URLs at weblogs.com; Host distribution of pings at weblogs.com]

     1. Welcome to the Splogosphere: 75% of pings are spings (splogs)
     • Monitored a ping server, weblogs.com, for three weeks, from 20 Nov 2005 to 11 Dec 2005
     • Total of 16 million update pings
     • See (1) for the ping distribution of URLs
     • Pings were first classified by language
     • Pings from Italian blogs followed a predictable pattern: higher during the day
     • Pings from English-language blogs followed a similar pattern, though less pronounced than for Italian
     • Splogs followed no pattern, and their pings numbered three times those of authentic English blogs (2, 3)

     [Figures: Ping time-series of Italian blogs, authentic blogs, and spam blogs on a single day; ping time-series of Italian blogs, authentic blogs, and spam blogs over five days]

     2. Characterizing the Splogosphere
     • Blogosphere dump for 21 days of July 2005
     • 1.3 million blogs in total
     • Blogs were run through the splog detector
     • Link distributions of blogs vs. splogs were plotted on a log-log scale
     • Predictably, only authentic blogs follow a power law (4, 5)

     [Figures: Only the in-degree distribution of authentic blogs follows a power law; only the out-degree distribution of authentic blogs follows a power law]

     Continuing Work
     • Inducing new features for splog detection
     • Language-independent and adaptive techniques for splog detection
     • Splog taxonomy and evaluation metrics
     • Multi-relational local models for splog detection
     • Tuning memeta to harvest blogs regularly

     [Figure: Splog Detector. Diagram labels: Blog Identification Heuristics, Language Identifiers, Spam Blog Detectors, IP Blacklists, Authentic Blogs, Spam Blogs]

     Partially supported by NSF awards ITR-IIS-0326460 and ITR-IDM-0219649 and by IBM
     http://ebiquity.umbc.edu
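
The log-log comparison of link distributions in section 2 can be illustrated with a small sketch. This is not the authors' code: the function names `degree_histogram` and `loglog_fit` are hypothetical, and an ordinary least-squares line through the log-log histogram is only a rough check (a straight line with a high R² is consistent with a power law but does not prove one).

```python
import math
from collections import Counter


def degree_histogram(degrees):
    """Count how many nodes have each positive degree k (hypothetical helper)."""
    return Counter(d for d in degrees if d > 0)


def loglog_fit(hist):
    """Fit a least-squares line through the points (log k, log count).

    Returns (slope, r_squared). A slope near -a with r_squared close to 1
    suggests count(k) is roughly proportional to k**(-a).
    """
    xs = [math.log(k) for k in hist]
    ys = [math.log(c) for c in hist.values()]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct degrees to fit a line")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, r_squared


# Synthetic in-degrees (made-up data): the number of nodes with degree k
# is exactly 3600 / k**2 for k = 1..6.
synthetic = [k for k in range(1, 7) for _ in range(3600 // (k * k))]
slope, r_squared = loglog_fit(degree_histogram(synthetic))
print(f"slope={slope:.3f}, R^2={r_squared:.3f}")
```

Running the same fit separately on the in-degrees (or out-degrees) of blogs labeled authentic and of those labeled spam would give one slope and one R² per group, which is one simple way to compare how closely each group tracks a straight line on a log-log plot.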
