Explore extracting local understandings from user-generated reviews on city guide websites, including motivations, corpus analysis, processing challenges, nickname discovery techniques, ongoing experiments like attraction extraction and review classification, and future directions. The study delves into uncovering unique insights for locals and enhancing the exploration of popular urban spots.
Extracting Local Understandings from User-Generated Reviews on City Guide Websites Andrea Moed IS256 Applied Natural Language Processing Professor Marti Hearst December 6, 2006
Overview • Motivations • Corpus • Processing • Nickname discovery • Ongoing experiments • Attraction extraction • Review classification • Future work Andrea Moed | IS256 ANLP
Motivations • Local knowledge of well-known places… for locals • “Nobody goes there anymore, it’s too crowded” • Major draws (views, dishes, people…) • Best times/seasons/modes of transport? • Places to combine in one excursion • “A good place for X” vs. a Great Good Place* • *Ray Oldenburg, The Great Good Place: Cafes, Coffee Shops, Bookstores, Bars, Hair Salons, and Other Hangouts at the Heart of a Community, 1999
Corpus • Yelp San Francisco • Social site organized around cities, launched 2004 • Thousands of SF places, reviews and reviewers • Largely local interest (Mass Media, Pets) • Some areas useful for visitors (Night Life, Shopping) • Writerly culture: high structural and stylistic variation in the text • Categories: Restaurants, Night Life, Shopping, Active Life, Local Flavor • Destinations • Frequently reviewed places: 20+ reviews
Processing • Used Dappit to build page scrapers • Generated XML; parsed in Python • Place objects consisting of location info + reviews • Corpus collects place objects from various categories • Challenges of screen scraping • Tradeoff between more places and places with most reviews (optimization requires exhaustive search) • TripAdvisor proved too difficult • Analysis with Python and NLTK Lite
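The slides do not show the scraper's XML schema or the Place class, so the following is only a minimal sketch of the "XML in, place objects out" step. The element names (`place`, `name`, `address`, `review`), the `Place` fields, the sample data, and the `frequently_reviewed` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical scraper output. The real schema is not given in the slides.
SAMPLE_XML = """
<places>
  <place category="Restaurants">
    <name>Example Cafe</name>
    <address>1 Example St</address>
    <review>Great espresso, tiny tables.</review>
    <review>Go early on weekends.</review>
  </place>
  <place category="Active Life">
    <name>Example Park</name>
    <address>2 Example Ave</address>
    <review>Foggy but beautiful.</review>
  </place>
</places>
"""


@dataclass
class Place:
    """Location info plus the review texts collected for one place."""
    name: str
    category: str
    address: str
    reviews: list = field(default_factory=list)


def parse_places(xml_text):
    """Build one Place per <place> element in the scraped XML."""
    root = ET.fromstring(xml_text.strip())
    places = []
    for node in root.iter("place"):
        places.append(Place(
            name=node.findtext("name", default="").strip(),
            category=node.get("category", ""),
            address=node.findtext("address", default="").strip(),
            reviews=[(r.text or "").strip() for r in node.findall("review")],
        ))
    return places


def frequently_reviewed(places, min_reviews=20):
    """Keep only places with at least min_reviews reviews."""
    return [p for p in places if len(p.reviews) >= min_reviews]


corpus = parse_places(SAMPLE_XML)
```

The default of 20 in `frequently_reviewed` mirrors the "20+ reviews" cutoff mentioned for the corpus; the toy sample above would need a lower threshold to return anything.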
Place Nickname Discovery • Goal: Discover alternate search terms to surface more diverse local results in web search • Method: Regular expression matching
Place Nickname Discovery • Steps • Counted frequency of Yelp-given place name in reviews of that place • Tokenized name on whitespace • Rule-based generation of candidate nicknames: acronym, subsets of tokens • Compared frequencies of given name and each nickname • Potentially useful nicknames are those that occur at least half as often as the given name
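The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the original code: the place name and reviews are invented, matching is case-insensitive on word boundaries, and occurrences of the full name are blanked out before nicknames are counted so that a nickname is not credited every time the full name appears. The slides do not say how any of these details were handled.

```python
import re
from itertools import combinations


def phrase_pattern(phrase):
    """Case-insensitive whole-word regex for a phrase."""
    return re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)


def candidate_nicknames(name):
    """Rule-based candidates: the acronym plus every proper subset of tokens."""
    tokens = name.split()  # tokenize on whitespace
    candidates = set()
    if len(tokens) > 1:
        candidates.add("".join(t[0] for t in tokens).upper())
    for size in range(1, len(tokens)):
        for combo in combinations(tokens, size):
            candidates.add(" ".join(combo))
    candidates.discard(name)
    return candidates


def useful_nicknames(name, reviews, ratio=0.5):
    """Return (given-name count, {nickname: count}) for nicknames that
    occur at least `ratio` times as often as the given name."""
    full = phrase_pattern(name)
    given_count = sum(len(full.findall(text)) for text in reviews)
    # Assumption: remove full-name matches before counting nicknames.
    remainder = [full.sub(" ", text) for text in reviews]
    found = {}
    for cand in candidate_nicknames(name):
        pattern = phrase_pattern(cand)
        count = sum(len(pattern.findall(text)) for text in remainder)
        if count > 0 and count >= ratio * given_count:
            found[cand] = count
    return given_count, found


# Invented example data.
reviews = [
    "Zephyr Hill Diner is great. We go to Zephyr every Sunday.",
    "Love Zephyr! The hill is steep.",
    "ZHD has the best pie. Zephyr Hill Diner forever.",
]
given_count, nicknames = useful_nicknames("Zephyr Hill Diner", reviews)
```

In this toy data "Hill" passes the threshold only because "hill" is an ordinary word in one review, which is the kind of false positive a purely frequency-based rule will produce for names built from common words.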
Place Nickname Discovery • Results • From 61 places (Restaurants, Active Life, Local Flavor), 38 reviews each • 23 of 61 places appeared to have frequently used nicknames • BUT in 9 of those cases this was due to common words in the names • In the remaining 14 cases, the first word was the most commonly used nickname • Hypothesis: Long tail of less predictable nicknames
Ongoing Work • Attraction extraction • TF/IDF calculation to find the concepts most widely associated with a place • Further text analysis to collect understandings of key concepts • Specificity • Sentiment • Temporality
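One way the TF/IDF step might look, as a minimal sketch: it assumes all reviews of a place are concatenated into a single document, uses relative term frequency and an unsmoothed log(N/df) inverse document frequency, and works on single words rather than multi-word concepts. The place names, texts, and function names are invented, and the slides do not specify which TF/IDF variant was used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(docs):
    """docs maps place name -> concatenated review text.
    Returns place name -> {term: tf-idf score}."""
    term_counts = {place: Counter(tokenize(text)) for place, text in docs.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())  # each place counts once per term
    n_docs = len(docs)
    scores = {}
    for place, counts in term_counts.items():
        total = sum(counts.values())
        scores[place] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(scores, place, k=3):
    """The k highest-scoring terms for a place, ties broken alphabetically."""
    ranked = sorted(scores[place], key=lambda t: (-scores[place][t], t))
    return ranked[:k]


# Invented example data.
docs = {
    "Hypothetical Bakery": "the croissants are great and the line is long",
    "Hypothetical Park": "the view is great and the trails are long",
    "Hypothetical Bar": "the cocktails are great",
}
scores = tf_idf(docs)
bakery_terms = top_terms(scores, "Hypothetical Bakery", k=2)
```

Words that appear in every place's reviews ("the", "great") score zero, so the terms that surface are the ones distinctive to a single place.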
Ongoing Work • Classification of reviews: recommendation vs. narrative • Recommendations help people “use” a city • Narrative is associated with memorable and unique locations • Features for classification • Verb tense distribution • Paragraph breaks • Opinion words at beginning and end (recommendation) • Memory and relationship words (narrative)
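The four feature types listed above could be extracted along these lines. Everything here is a hedged sketch: the word lists are tiny invented examples, and the past-tense ratio is a crude suffix-and-lexicon heuristic standing in for a proper part-of-speech tagger, which a real verb tense distribution would need. No classifier is shown, since the slides list a training set as future work.

```python
import re

# Tiny invented lexicons, for illustration only.
OPINION_WORDS = {"great", "best", "love", "awful", "terrible",
                 "recommend", "avoid", "worst"}
MEMORY_WORDS = {"remember", "years", "ago", "first", "childhood",
                "husband", "wife", "mom", "dad", "friend"}
IRREGULAR_PAST = {"was", "were", "went", "had", "took",
                  "came", "met", "ate", "got", "said"}


def words(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def review_features(text):
    """Return a dict of simple features for one review."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tokens = words(text)

    # Crude past-tense guess: known irregular forms or an "-ed" suffix.
    past = sum(1 for t in tokens
               if t in IRREGULAR_PAST or (t.endswith("ed") and len(t) > 3))

    # Words in the first and last sentence, where recommendations tend
    # to state their verdict.
    edge_words = set()
    if sentences:
        edge_words = set(words(sentences[0])) | set(words(sentences[-1]))

    return {
        "paragraph_breaks": max(len(paragraphs) - 1, 0),
        "past_tense_ratio": past / len(tokens) if tokens else 0.0,
        "opinion_at_edges": len(edge_words & OPINION_WORDS),
        "memory_words": sum(1 for t in tokens if t in MEMORY_WORDS),
    }


# Invented example reviews.
recommendation = ("Best burritos in the city. Go before noon to skip "
                  "the line. Highly recommend!")
narrative = ("I first came here years ago with my dad.\n\n"
             "We walked up the hill and watched the fog roll in.")
rec_features = review_features(recommendation)
nar_features = review_features(narrative)
```

On these two examples the recommendation scores on opinion words at its edges, while the narrative scores on past tense, memory words, and paragraph breaks, which is the contrast the feature list is meant to capture.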
Future Work • Relating understanding about location features to external data (geocoding, weather) • Visualization of extracted concepts • Development of a training set for classification