ChronoSearch

Pete Bohman, Adam Kunk

ChronoSearch • ChronoSearch: A System for Extracting a Chronological Timeline

Motivation • Current search engines do not provide a complete picture • Latest events dominate top results • The user is forced to parse through lots of pages to find a complete list of information • ChronoSearch aims to summarize search results into a concise list of important events related to an entity

Problem Definition • Input: An entity E (most likely a person) • Output: A sorted list of events, L, which are related to E • L = { l_i | l_i is unique and l_i occurred before l_{i+1} }

Problem Statement • Tuple extraction: (Event, Entity, Date) • Difficulties of Extraction • Dates • No standard format, relative dates • Events • Hard due to random input, unstructured data • Entity • Pronouns (“He” / “She”) • Entity Event Association

Our Approach • Baseline Approach – Web Redundancy • Date extraction based on absolute dates • Entity extraction by literal entity • Association based on sentence boundary • Event is implicitly described by the sentence itself • We consider sentences containing the entity being searched as well as an absolute time
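A minimal sketch of the baseline described above, under stated assumptions: sentences are split naively on end punctuation, the entity is matched as a literal string, and an "absolute date" is approximated by a four-digit year. The function name `extract_tuples` and the regular expressions are illustrative and do not come from the slides.

```python
import re

# Assumption: an absolute date is approximated by a four-digit year (1000-2099).
YEAR_RE = re.compile(r"\b(1\d{3}|20\d{2})\b")

def extract_tuples(text, entity):
    """Return (year, entity, sentence) tuples, sorted chronologically.

    A sentence qualifies only if it contains the literal entity string
    and at least one absolute year; the sentence itself stands in for the event.
    """
    # Naive sentence boundary: end punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    tuples = []
    for sentence in sentences:
        if entity.lower() not in sentence.lower():
            continue  # literal entity match only; pronouns are missed
        match = YEAR_RE.search(sentence)
        if match:
            tuples.append((int(match.group(1)), entity, sentence))
    return sorted(tuples, key=lambda t: t[0])

text = ("Steve Jobs introduced the iPhone in 2007. "
        "He co-founded Apple in 1976. "
        "Steve Jobs was born in 1955.")
events = extract_tuples(text, "Steve Jobs")
```

The second sentence is dropped because it refers to the entity only by pronoun, which is the entity-extraction difficulty noted in the problem statement.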

Our Approach • Baseline Approach • Leverages Web Redundancy

Initial Results • Demo time…

Results Analysis • Information Retrieval (IR) performance characteristics: • Precision – fraction of documents retrieved that are relevant to query • Recall – fraction of documents that are relevant to query that are successfully retrieved
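The two measures can be computed directly from sets of retrieved and relevant items. This is a small illustrative sketch; the function name `precision_recall` and the sample identifiers are assumptions, not part of the original system.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for two collections of item identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # items that are both retrieved and relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 events retrieved, 3 events truly relevant, 2 in common.
p, r = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
```

Here precision is 2/4 and recall is 2/3: removing the irrelevant results "c" and "d" would raise precision without changing recall.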

Ultimate Approach • Improving precision: • (Part 1) Eliminating duplicates • (Part 2) Eliminating unimportant results

Eliminating Duplicates • Improving precision: • (Part 1) Eliminating duplicates • Cosine similarity duplicate detection • The probability that s and s' are the same event: • P(s' reports the same event as s) = cosine(s', s) • Term frequency vectors: s and s'
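A sketch of cosine-similarity duplicate detection over term-frequency vectors. The tokenizer, the greedy keep-first strategy, and the 0.8 threshold are assumptions chosen for illustration; the slides do not specify them.

```python
import math
import re
from collections import Counter

def tf_vector(sentence):
    """Term-frequency vector of a sentence as a Counter of lowercase tokens."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))

def cosine(s, s_prime):
    """Cosine similarity between the term-frequency vectors of two sentences."""
    a, b = tf_vector(s), tf_vector(s_prime)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def drop_duplicates(sentences, threshold=0.8):
    """Keep a sentence only if it is not too similar to any sentence already kept."""
    kept = []
    for s in sentences:
        if all(cosine(s, k) < threshold for k in kept):
            kept.append(s)
    return kept

unique = drop_duplicates([
    "Jobs founded Apple in 1976",
    "In 1976 Jobs founded Apple",  # same words, different order
    "Jobs died in 2011",
])
```

Because term-frequency vectors ignore word order, the second sentence scores close to 1 against the first and is discarded, while the third shares only two terms and is kept.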

Eliminating Unimportant Results • Improving precision: • (Part 2) Eliminating unimportant results • Important results occur more frequently • Utilize term frequency to eliminate unimportant events • Option 1: Term frequency calculations based on results returned from initial search query • Results that do not occur frequently in the returned corpus will be eliminated • Option 2: Leverage Google search

Eliminating Unimportant Results Cont. • Eliminate results outside of “-x” standard deviations based on search results returned for the given result
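One possible reading of the standard-deviation filter, sketched under assumptions: each candidate event has a hit count (for example, the number of search results returned for it), and events whose count falls more than x standard deviations below the mean are dropped. The `hit_counts` mapping is hypothetical input; no search API is called here, and the use of the population standard deviation is an assumption.

```python
import statistics

def filter_by_hits(hit_counts, x=1.0):
    """Drop events whose hit count is below mean - x * standard deviation."""
    counts = list(hit_counts.values())
    if len(counts) < 2:
        return dict(hit_counts)  # too few data points to estimate a spread
    mean = statistics.mean(counts)
    std = statistics.pstdev(counts)  # population standard deviation (assumed)
    cutoff = mean - x * std
    return {event: n for event, n in hit_counts.items() if n >= cutoff}

# Hypothetical hit counts for four candidate events.
hits = {"founded company": 100, "launched product": 90,
        "resigned as CEO": 110, "visited a cafe": 5}
important = filter_by_hits(hits, x=1.0)
```

With these sample counts the cutoff is roughly 34.5, so only the rarely reported event is removed.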
