A CyberGIS Approach to Digital Humanities and Social Sciences: The World of Textual Geography and a Case Study of Wikipedia’s History of the World. Kalev Leetaru, Eric Shook, and Shaowen Wang. CyberInfrastructure and Geospatial Information Laboratory (CIGI)
A CyberGIS Approach to Digital Humanities and Social Sciences: The World of Textual Geography and a Case Study of Wikipedia’s History of the World. Kalev Leetaru, Eric Shook, and Shaowen Wang. CyberInfrastructure and Geospatial Information Laboratory (CIGI); Department of Geography and Geographic Information Science; School of Earth, Society, and Environment; National Center for Supercomputing Applications (NCSA); University of Illinois at Urbana-Champaign. CyberGIS ’12, Urbana IL, August 8, 2012
Workflow: Fulltext Geocoding; Sentiment Mining; CyberGIS
Inside the CyberGIS “black box” (diagram labels): Open Service API; Workflow Management Services; GISolve Middleware; Security; Data & Viz; Resource Selection; Domain Decomposition; Task Scheduling; CI; Clouds; XSEDE; OSG; Emotional Heatmap
Data Input for a Topic: A set of locations (latitude, longitude points) with 3 attributes: 1. Number of articles mentioning this location 2. Number of articles mentioning both this location and topic 3. Average tone of articles mentioning both this location and topic
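One possible in-memory shape for a single input location, as a minimal sketch. The class name `TopicLocation`, its field names, and the sample values are hypothetical and chosen only for illustration; the slides do not specify a data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicLocation:
    """One input point for a topic (hypothetical field names)."""
    lat: float              # latitude in decimal degrees
    lon: float              # longitude in decimal degrees
    n_articles: int         # 1. articles mentioning this location
    n_topic_articles: int   # 2. articles mentioning both location and topic
    avg_tone: float         # 3. average tone of the articles counted in (2)

    def __post_init__(self):
        if not (-90.0 <= self.lat <= 90.0 and -180.0 <= self.lon <= 180.0):
            raise ValueError("latitude/longitude out of range")
        # Articles mentioning location AND topic are a subset of those
        # mentioning the location at all.
        if not (0 <= self.n_topic_articles <= self.n_articles):
            raise ValueError("topic articles must not exceed all articles")


# Made-up values, for illustration only.
sample = TopicLocation(lat=40.11, lon=-88.21, n_articles=120,
                       n_topic_articles=15, avg_tone=-12.5)
```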
Spatializing Emotion: 3 important elements: 1. Importance of location 2. Prevalence of topic 3. Emotion toward topic. Goal: Capture all 3 elements on a single map
1) Importance of Location: Every mention of a location increases its importance. Generate a density map of the number of times a location is mentioned in text using Kernel Density Estimation (KDE) based on k-nearest neighbor search
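A minimal sketch of one way to read "KDE based on k-nearest neighbor search". The assumptions here are not from the slides: the bandwidth at each evaluation point is taken as the distance to its k-th nearest input point, the kernel is Gaussian, and coordinates are treated as planar (real latitude/longitude data would need a projection or a great-circle distance). The function name `knn_kde` is hypothetical.

```python
import math


def knn_kde(points, weights, query, k=3):
    """Estimate weighted density at `query` from (x, y) points.

    Assumption: the bandwidth h is the distance from `query` to its
    k-th nearest point, and a 2-D Gaussian kernel is used.
    """
    if not points:
        raise ValueError("at least one point is required")
    # Distance from the query to every point, paired with that point's weight
    # (e.g. the number of articles mentioning the location).
    dists = sorted(
        (math.hypot(px - query[0], py - query[1]), w)
        for (px, py), w in zip(points, weights)
    )
    k = min(k, len(dists))
    h = max(dists[k - 1][0], 1e-9)  # guard against a zero bandwidth
    total = sum(w * math.exp(-0.5 * (d / h) ** 2) for d, w in dists)
    return total / (2.0 * math.pi * h * h)


# Made-up points: a small cluster near the origin and one distant point.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
mentions = [10, 5, 5, 1]
near = knn_kde(pts, mentions, (0.0, 0.0))
far = knn_kde(pts, mentions, (10.0, 10.0))
```

Evaluating `knn_kde` at every cell of a regular grid would give the density map; with this adaptive bandwidth, dense clusters get a sharp peak and sparse regions a smooth, low surface.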
2) Prevalence of Topic: We use the term topic intensity for the prevalence of a topic relative to other topics, and estimate it with a method commonly used in epidemiological studies. Relative risk is the ratio of the KDE of disease case locations to the KDE of control locations
Topic Intensity
Topic Intensity = KDE(articles that mention a topic) / KDE(articles that do not mention the topic)
Relative Risk = KDE(points with disease) / KDE(points without disease)
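The ratio can be sketched cell by cell on two density grids of the same shape. The function name `topic_intensity` and the handling of near-zero denominators (returning 0.0) are assumptions made for this illustration; the slides do not say how empty cells are treated.

```python
def topic_intensity(kde_topic, kde_other, eps=1e-12):
    """Cell-wise ratio of two density grids (lists of rows).

    kde_topic: KDE of articles that mention the topic.
    kde_other: KDE of articles that do not mention the topic.
    Assumption: cells whose denominator is at or below `eps` are set to 0.0.
    """
    return [
        [t / o if o > eps else 0.0 for t, o in zip(row_t, row_o)]
        for row_t, row_o in zip(kde_topic, kde_other)
    ]


# Made-up 2x2 grids, for illustration only.
ratio = topic_intensity([[2.0, 1.0], [0.0, 3.0]],
                        [[1.0, 4.0], [5.0, 0.0]])
```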
3) Emotion Toward a Topic: Challenging question: Is the emotional measure, tone, discrete or continuous? Is tone "countable" like trees, or does it exist as a continuum like air temperature? Tone is a continuum: there cannot be a "number of tones"
3) Emotion Toward a Topic: Because tone is continuous rather than discrete, a different method is used. Inverse distance weighted (IDW) interpolation estimates tone across space, creating a tone map. The tone map captures positive and negative tone toward a particular topic across space
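A minimal sketch of textbook IDW interpolation for a single map cell. The power parameter of 2, the planar distance, and the function name `idw_tone` are assumptions; the slides do not give the settings actually used.

```python
import math


def idw_tone(samples, query, power=2.0):
    """Inverse distance weighted estimate of tone at `query`.

    samples: list of ((x, y), tone) pairs.
    Assumption: planar distances and weights of 1 / d**power.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    num = 0.0
    den = 0.0
    for (x, y), tone in samples:
        d = math.hypot(x - query[0], y - query[1])
        if d == 0.0:
            return tone  # query coincides with a sample point
        w = 1.0 / d ** power
        num += w * tone
        den += w
    return num / den


# Made-up tone samples: one negative location, one positive location.
tones = [((0.0, 0.0), -50.0), ((2.0, 0.0), 50.0)]
midpoint = idw_tone(tones, (1.0, 0.0))
closer_to_negative = idw_tone(tones, (0.5, 0.0))
```

Halfway between the two samples the estimate is neutral, and it moves toward the negative value as the query approaches the negative sample, which is the continuous behavior the slide argues tone should have.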
Overview – 3 layers: Article density (proxy: importance of location); Topic intensity (proxy: prevalence of topic relative to other topics); Tone (proxy: emotion toward a topic)
First two layers represent scaling factors for tone. Value ranges shown on the slide: 0 - 1; 0 - 100; -100 - 100
Emotional Heatmap: Article Density * Topic Intensity * Tone = Emotional Heatmap
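The product can be sketched as a cell-wise multiplication of three grids of equal shape. The function name `emotional_heatmap` is hypothetical, and any rescaling of the layers before multiplication is left out because the slides do not describe it.

```python
def emotional_heatmap(density, intensity, tone):
    """Cell-wise product of the three layers (lists of rows).

    density and intensity act as scaling factors; tone supplies the sign.
    """
    return [
        [d * i * t for d, i, t in zip(row_d, row_i, row_t)]
        for row_d, row_i, row_t in zip(density, intensity, tone)
    ]


# Made-up one-row grids, for illustration only.
heat = emotional_heatmap([[0.5, 1.0]], [[2.0, 0.0]], [[-10.0, 80.0]])
```

In the second cell the topic intensity is zero, so the strongly positive tone there contributes nothing to the final map.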
Summary: First steps, but started the dialogue. Balance: Managing the complexity of cyberinfrastructure access; Simplifying the workflow of chaining spatial analytics; Making sense of what’s involved; Scientific rigor
Ongoing Work: Translate spatial knowledge to domain knowledge by answering a basic question: why is this here and not there? Tackle spatial aggregation issues: Represent locations as areas, not points; Areal interpolation
Acknowledgments • Guofeng Cao, Anand Padmanabhan • National Science Foundation • BCS-0846655 • OCI-1047916 • Open Science Grid • XSEDE SES070004N
Thanks!