Stanford Events Crawler

Zoe Chu, Michael Tung

Architectural Overview (diagram)
• Sources: web (http), newsgroup (nntp), mailing list (pop)
• Backend: Crawler, "Event?" classifier, Extractor
• Event tuple, DBMS
• Frontend: Presentation layer, User Applications, Notification

Crawling/Classification
• Event pages
• Index and detail pages
• 10 events/sec

Extraction/Normalization
• LR Wrappers
• Segmentation for email – decision tree classifier
• Hand-written rules for field extraction
• Date & Time Normalization
• Building Normalization
  • Edit distance against a lexicon of Stanford building names
• Free text search - Lucene
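The building normalization step above can be illustrated with a minimal sketch. It assumes plain Levenshtein edit distance and a relative distance threshold; the slides do not say which distance variant or cutoff was used. The four-entry lexicon, the function names, and the `max_ratio` parameter are all hypothetical, chosen only for illustration.

```python
from typing import List, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Illustrative sample only; the real lexicon is not given in the slides.
LEXICON = [
    "Gates Computer Science Building",
    "Green Library",
    "Memorial Church",
    "Hoover Tower",
]


def normalize_building(raw: str,
                       lexicon: List[str] = LEXICON,
                       max_ratio: float = 0.3) -> Optional[str]:
    """Return the closest lexicon entry, or None if nothing is close enough.

    max_ratio is an assumed cutoff: the distance must not exceed this
    fraction of the longer of the two strings.
    """
    query = raw.strip().lower()
    best = min(lexicon, key=lambda name: edit_distance(query, name.lower()))
    dist = edit_distance(query, best.lower())
    if dist <= max_ratio * max(len(query), len(best)):
        return best
    return None


print(normalize_building("Gren Libary"))
print(normalize_building("Stadium"))
```

The threshold keeps unrelated strings from being forced onto the nearest lexicon entry; a misspelling such as "Gren Libary" is mapped to a canonical name, while a string far from every entry is left unresolved.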

Which fields?
• Title
• Date
• Time
• Location
• Category
• Sponsor
• Contact info
• Admission/fees
• Speaker
• Food
• Description
• Building
• X,Y physical coordinates
• Picture of building
• Map
• Nearby buildings
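One way to picture the event tuple built from these fields is as a record type. The sketch below is an assumption about representation, not the authors' schema: the attribute names, the types, and the `missing_fields` helper are hypothetical, and every field except the title is treated as optional because extraction may not fill it.

```python
import datetime as dt
from dataclasses import dataclass, field, fields
from typing import List, Optional, Tuple


@dataclass
class Event:
    # One attribute per field listed on the slide; types are assumed.
    title: str
    date: Optional[dt.date] = None
    time: Optional[dt.time] = None
    location: Optional[str] = None
    category: Optional[str] = None
    sponsor: Optional[str] = None
    contact_info: Optional[str] = None
    admission_fees: Optional[str] = None
    speaker: Optional[str] = None
    food: Optional[str] = None
    description: Optional[str] = None
    building: Optional[str] = None
    xy: Optional[Tuple[float, float]] = None   # X,Y physical coordinates
    building_picture: Optional[str] = None     # e.g. a path or URL (assumed)
    map_image: Optional[str] = None            # e.g. a path or URL (assumed)
    nearby_buildings: List[str] = field(default_factory=list)


def missing_fields(event: Event) -> List[str]:
    """Names of fields the extractor has not filled in (None or empty list)."""
    return [f.name for f in fields(event)
            if getattr(event, f.name) in (None, [])]


event = Event(title="Example seminar", building="Green Library")
print(missing_fields(event))
```

A helper like `missing_fields` makes it easy to see how much of the tuple a given page or email actually populated.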
