
Mapping the Blogosphere in America



Presentation Transcript


  1. Mapping the Blogosphere in America CS406 Assignment – Group Presentation Brian McGee Craig Murray Piers Thorogood Emlyn Whittick

  2. Agenda • Summary of the paper • Paper’s key focuses • Geolocation of blogs • Indexing blogs to city units • Related Work • Geolocation in general • Alternative mapping of the blogosphere • Conclusion • Questions

  3. Summary of the Paper “Mapping the Blogosphere in America” • Presented at the WWW2004 Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics • Dr. Alexander Halavais & Dr. Jia Lin • University at Buffalo School of Informatics, NY

  4. Summary of the Paper • Initial phase of a long-term project • Long-term goals: • Examination of American urban culture • Based on information found in personal blogs • Observe localised political agenda and opinion • Short-term goals: • Extracting geographic information from blogs • Indexing blogs to ‘city units’

  5. Geolocation of Blogs • No single method to calculate the location of a blog... • Self-hosted blogs (dedicated domain name): • Registrant’s address found in domain registry • Hosted using blog-hosting service: • Location perhaps included in user profile • Blog perhaps registered with regional blog-hosting service e.g. ‘NYCblogger.com’

  6. Geolocation of Blogs • What if there is no explicit location information? • Answer: Data Mining... • Links to a CV or biography containing location information • Location found from links to local weather, school, church or other communities

  7. Geolocation of Blogs • Manual pilot run on 1500 US blogs • 60% successful identification for self-hosted blogs • 30% for blogs on blog-hosting sites • Working on an automatic algorithm • Current approach... • GeoURL Metadata, if available • Whois query for unrecognised domains • Profile information, if available • Blogchalking, if available • Text on index page (Bio / resume / regionalised links)
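
The ordered list of sources on slide 7 reads like a fallback chain: try the most explicit source first, then fall back to weaker ones. The sketch below is only an illustration of that idea, not the authors' algorithm, which the slides do not describe. It covers two of the five steps: GeoURL-style metadata, assumed here to appear as `ICBM` or `geo.position` meta tags, and a profile location string. The names `GeoMetaParser` and `locate_blog` are hypothetical, and the Whois, blogchalking and index-page steps are left out.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple, Union


class GeoMetaParser(HTMLParser):
    """Collects the first latitude/longitude pair found in geo meta tags."""

    def __init__(self) -> None:
        super().__init__()
        self.coords: Optional[Tuple[float, float]] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.coords is not None:
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() not in ("icbm", "geo.position"):
            return
        # ICBM is written "lat, lon"; geo.position is written "lat;lon".
        parts = attr.get("content", "").replace(";", ",").split(",")
        if len(parts) != 2:
            return
        try:
            self.coords = (float(parts[0]), float(parts[1]))
        except ValueError:
            pass  # malformed coordinates: treat the tag as absent


def locate_blog(
    html: str, profile_location: Optional[str] = None
) -> Optional[Tuple[str, Union[Tuple[float, float], str]]]:
    """Return (source, location) from the first step that succeeds, else None."""
    parser = GeoMetaParser()
    parser.feed(html)
    if parser.coords is not None:
        return ("geourl", parser.coords)
    # Assumption: the caller has already scraped the profile text, if any.
    if profile_location and profile_location.strip():
        return ("profile", profile_location.strip())
    return None
```

Returning the name of the step that succeeded alongside the location keeps the result auditable, which matters because the slides report very different success rates for self-hosted and hosted blogs.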

  8. Indexing Blogs to City Units • How do we standardise geolocation data? • Varying levels of detail... • Self-hosted blog: Precise • Street address • 9-digit zip code • Blog-hosting site: Can be vague • City, state, or even nation • Local links can provide telephone area codes

  9. Indexing Blogs to City Units • How to convert this to a standard unit? • Labelling by city is vague • Expansion of city limits • Emergence of ‘second cities’ between big cities • Requirement for an urban unit • “Geographic clusters consisting of certain sizes of population sharing physical proximity” [1]

  10. Indexing Blogs to City Units • The 3-digit zip code • Widely used in marketing and political strategies • Represents 4 different types of area: • Metropolitan city • Cluster of suburban cities and towns • Cluster of cities not immediately adjacent to a metropolitan area • Metropolitan cities plus embedded cities and towns
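
A 3-digit zip code is simply the first three digits of a 5-digit or 9-digit (ZIP+4) code, so reducing a precise address to the paper's city unit is a small normalisation step. The following is a minimal sketch of that step; the function name `zip3` and the accepted input formats are assumptions made for illustration.

```python
import re
from typing import Optional

# Accepts "14260", "14260-1020" or "142601020", with optional surrounding spaces.
_ZIP_RE = re.compile(r"^\s*(\d{5})(?:-?(\d{4}))?\s*$")


def zip3(zip_code: str) -> Optional[str]:
    """Return the 3-digit prefix of a US zip code, or None if it is malformed."""
    match = _ZIP_RE.match(zip_code)
    if match is None:
        return None
    return match.group(1)[:3]
```

Keeping the prefix as a string rather than an integer preserves leading zeros, which occur in zip codes for the north-eastern United States.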

  11. Indexing Blogs to City Units • Preliminary examination of blog distribution in the US • Users taken from LiveJournal and Diaryland • Both services include location in user profile • 797 different 3-digit zip codes found • Overall distribution consistent with population distribution and concentrations of high socio-economic status
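
Counting how many blogs fall into each 3-digit unit, and how many distinct units appear, is a simple tally once every profile yields a zip code. The sketch below shows one way to do it; the function name `count_by_zip3` is hypothetical, and the sample values in the usage note are invented rather than taken from the paper's data.

```python
from collections import Counter
from typing import Iterable


def count_by_zip3(zip_codes: Iterable[str]) -> Counter:
    """Tally blogs per 3-digit zip prefix, skipping malformed entries."""
    counts: Counter = Counter()
    for code in zip_codes:
        five = code.strip()[:5]
        if len(five) == 5 and five.isdigit():
            counts[five[:3]] += 1
    return counts
```

Calling `count_by_zip3(["14260", "14201-1234", "10001", "n/a"])` groups the first two entries under `"142"`, and `len()` of the result gives the number of distinct units, the figure the slide reports as 797 for the real sample.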

  12. Indexing Blogs to City Units Figure 1. Distribution of blogs in sample [4]

  13. Limitations of the Paper • Authenticity of quoted geographic information is questionable • 3-digit zip codes • Overstate the number of bloggers in metropolitan cities • Many small cities can be grouped into one unit, despite no evidence of common traits or social cohesion. • Paper suggests dividing units by socio-economic profile

  14. Related Work • Geolocation in general • Non-Geographical Mapping of the Blogosphere: • Hyperlink Maps • Kohonen Self-Organising Maps

  15. Geolocation • Geographic Information Systems (GIS) • Geoparsing • Geocoding • Methods • “Whois” records • Blogging sites requiring registration • Postal addresses and telephone numbers • Geographic feature names • Hyperlinks • Metadata
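
Geoparsing means pulling location hints out of free text before any geocoding happens. As a rough illustration of the "postal addresses and telephone numbers" method, the sketch below uses regular expressions to find US zip codes that follow a two-letter state abbreviation and the area codes of US-style phone numbers. The patterns and the name `extract_location_hints` are assumptions chosen for this example; real pages would need far more tolerant matching.

```python
import re
from typing import Dict, List

# A zip code is only trusted when it follows a two-letter state abbreviation.
ZIP_RE = re.compile(r"\b[A-Z]{2}\s+(\d{5})(?:-\d{4})?\b")
# Matches forms such as "(716) 555-0100" and "716-555-0100".
PHONE_RE = re.compile(r"\(?\b(\d{3})\)?[-.\s]?\d{3}[-.]\d{4}\b")


def extract_location_hints(text: str) -> Dict[str, List[str]]:
    """Return the zip codes and telephone area codes found in the text."""
    return {
        "zip_codes": ZIP_RE.findall(text),
        "area_codes": PHONE_RE.findall(text),
    }
```

The hints still have to be geocoded, that is, mapped to a place, for example by reducing the zip code to its 3-digit prefix or looking up the area code in a table.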

  16. Geolocation • Uses • Information retrieval based on geographic criteria • Tailoring of advertising • Sociological and political trends: mapping the ‘buzz’ of a topic shows which areas are most interested in it • Problems • Increasing number of mobile devices

  17. Geolocation • Trends • Blogging hotspots • More widespread blogging in Eastern US Figure 2. Blogging Hotspots [6]

  18. Geolocation • Trends • Analysing blogs by geography shows where interest lies. • You can see a correlation between blogs and restaurant locations Figure 3. Steak n Shake Restaurants [6]

  19. Mapping the Blogosphere • Other methods of mapping the blogosphere: • Mapping hyperlinks • Self-organising maps • Mapping communities

  20. Mapping Hyperlinks Figure 6. Inbound links [1] • Cybermap showing outbound and inbound links from www.littlegreenfootballs.com in 3D hyperbolic space Figure 5. Outbound links [1]
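
Before a hyperlink map can be drawn, the links have to be collected into inbound and outbound sets per blog. The sketch below shows that bookkeeping only; it says nothing about the 3D hyperbolic layout in the figures, and the function name `link_maps` and the edge-list input are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def link_maps(
    edges: Iterable[Tuple[str, str]]
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build outbound and inbound link sets from (source, target) pairs."""
    outbound: Dict[str, Set[str]] = defaultdict(set)
    inbound: Dict[str, Set[str]] = defaultdict(set)
    for source, target in edges:
        if source == target:
            continue  # ignore a blog linking to itself
        outbound[source].add(target)
        inbound[target].add(source)
    return dict(outbound), dict(inbound)
```

The size of a blog's inbound set is a simple measure of its prominence, which is the kind of quantity an inbound-link map makes visible.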

  21. Self-Organising Maps • Neural networks such as Kohonen SOMs can be used to map the blogosphere • Advantages • Performs clustering of input data • Maps this onto 2D surface for easy visualisation Figure 7. Kohonen Map of Blogs [7]
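
To make the two advantages concrete, here is a deliberately small Kohonen map in plain Python. It is a sketch under stated assumptions, not the method used in [7]: each blog is assumed to be already represented as a numeric feature vector, the grid size, learning rate, radius and linear decay schedule are arbitrary choices, and `TinySOM` is a hypothetical name.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TinySOM:
    def __init__(self, rows: int, cols: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        # One weight vector per grid cell, initialised at random in [0, 1).
        self.weights: List[List[List[float]]] = [
            [[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)
        ]

    def winner(self, x: Vector) -> Tuple[int, int]:
        """Return the grid cell whose weight vector is closest to x."""
        cells = ((r, c) for r in range(self.rows) for c in range(self.cols))
        return min(cells, key=lambda rc: _dist2(self.weights[rc[0]][rc[1]], x))

    def train(self, data: Sequence[Vector], epochs: int = 50,
              lr0: float = 0.5, radius0: float = 1.5) -> None:
        for epoch in range(epochs):
            # Learning rate and neighbourhood radius shrink linearly.
            frac = 1.0 - epoch / epochs
            lr = lr0 * frac
            radius = max(radius0 * frac, 0.5)
            for x in data:
                wr, wc = self.winner(x)
                for r in range(self.rows):
                    for c in range(self.cols):
                        grid_d2 = (r - wr) ** 2 + (c - wc) ** 2
                        # Gaussian neighbourhood: cells near the winner move more.
                        h = math.exp(-grid_d2 / (2 * radius ** 2))
                        w = self.weights[r][c]
                        for i, xi in enumerate(x):
                            w[i] += lr * h * (xi - w[i])
```

After training, `winner(vector)` gives the 2D grid cell for a blog. Because neighbouring cells are pulled together during training, blogs with similar vectors tend to land in the same or nearby cells, which is what produces both the clustering and the 2D layout.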

  22. Mapping Communities • Location, friendships and communities are all interrelated Figure 4. The importance of location, interest and age in forming blogging communities [5]

  23. Conclusion • Summary of the Paper • Geolocation • Methods • Uses • Trends • Alternative Mapping Methods • Hyperlink Mapping • Self-Organising Maps

  24. References
  [1] R. Ackland, “Mapping the U.S. Political Blogosphere: Are Conservative Bloggers More Prominent?”, 2005.
  [2] O. Buyukkokten et al., “Exploiting Geographical Location Information of Web Pages,” In Proceedings of WebDB-99, the 1999 ACM SIGMOD Workshop on the Web and Databases, 1999.
  [3] B. Gueye et al., “Constraint-Based Geolocation of Internet Hosts,” In Proceedings of IMC ’04, Sicily, 2004.
  [4] A. Halavais and J. Lin, “Mapping the Blogosphere in America,” In Proceedings of the Thirteenth International World Wide Web Conference (WWW2004), New York, 2004.
  [5] R. Kumar et al., “Structure and Evolution of Blogspace,” In Communications of the ACM, 47:12, pp. 35-39, 2004.
  [6] L. Lloyd, P. Kaulgud, and S. Skiena, “Newspapers vs. Blogs: Who Gets the Scoop?” In AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs (AAAI-CAAW), California, 2006.
  [7] J. Merelo-Guervos et al., “Mapping Weblog Communities,” Depto. Arquitectura y Tecnología de Computadores, Universidad de Granada, 2006.
