Real-Time Search Engine
Network software system laboratory
Rana Shahout & Ibrahim Baransi
Supervisor: Edward Bortnikov
Winter 2011
Agenda
• The problem & motivation
• Background in search systems
• The architecture
• CIP policies
• Software design
What?
What is the project goal? Serving fresh search results when the data is constantly changing. Nowadays, websites such as Twitter, Facebook and news sites change at a high frequency.
Background in search systems
Search caches
Why is caching a problem? Search engines use caching to make query processing faster and more efficient, but when the data is dynamic, some of the cached information becomes stale. A search engine looks up each query in the cache first and searches the index only on a cache miss. Thus, when the data has changed but an older result for the same query is still in the cache, the search engine returns that outdated, incorrect result.
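To make the problem concrete, here is a minimal Java sketch of a cache-first search with no invalidation at all. The in-memory `index` list, the substring match and the class name `StaleCacheDemo` are hypothetical simplifications, not the project's Lucene-based code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a cache-first search engine with NO invalidation.
// The "index" is a plain in-memory list of documents, not a Lucene index.
public class StaleCacheDemo {
    private final List<String> index = new ArrayList<>();
    private final Map<String, List<String>> cache = new HashMap<>();

    // Look in the cache first; only on a miss, scan the index and fill the cache.
    public List<String> search(String query) {
        List<String> cached = cache.get(query);
        if (cached != null) {
            return cached; // cache hit: the index is never consulted
        }
        List<String> results = new ArrayList<>();
        for (String doc : index) {
            // Simplistic matching rule (assumption): case-insensitive substring.
            if (doc.toLowerCase().contains(query.toLowerCase())) {
                results.add(doc);
            }
        }
        cache.put(query, results);
        return results;
    }

    // New documents are added to the index, but the cache is left untouched.
    public void addDocument(String doc) {
        index.add(doc);
    }

    public static void main(String[] args) {
        StaleCacheDemo engine = new StaleCacheDemo();
        engine.addDocument("Obama visits Cairo"); // hypothetical document
        System.out.println(engine.search("obama").size()); // 1 (miss, computed from the index)
        engine.addDocument("President Barack Obama meets Mubarak in London");
        System.out.println(engine.search("obama").size()); // still 1: stale cached result
    }
}
```

The second search never reaches the index, so the newly added document stays invisible until the cached entry happens to be evicted.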
Data structures required for the implementation
Index: Lucene index directory. Lucene is a free text indexing and searching API written in Java; a typical Lucene index is stored in a single directory in the file system on a hard disk.
Cache: implemented as a linked list combined with a hash table. The replacement policy is LRU.
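One compact way to get "a linked list with a hash table" plus LRU replacement in Java is `java.util.LinkedHashMap` in access order. The slides do not say whether the project used it; the class name `LruCache` and the capacity in `main` are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an LRU result cache. LinkedHashMap is a hash table whose entries are
// also threaded on a doubly linked list. With accessOrder = true, every get/put
// moves the entry to the tail, so the head is always the least recently used.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // true = order entries by access, not by insertion
        this.capacity = capacity;
    }

    // Called after each insertion; returning true evicts the eldest (LRU) entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("obama", "results-1");
        cache.put("london", "results-2");
        cache.get("obama");                 // "obama" becomes most recently used
        cache.put("mubarak", "results-3");  // evicts "london", the LRU entry
        System.out.println(cache.keySet()); // [obama, mubarak]
    }
}
```

Because the eviction check runs inside `put`, both lookup and replacement stay O(1) without any separate bookkeeping.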
CIP: Cache Invalidation Predictors
The CIP consists of two major parts:
• Synopsis generator: prepares synopses of the new documents as they come in.
• Invalidator: interacts with the runtime system and decides which cached entries to invalidate, according to one of two policies.
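The slides do not define the synopsis format. Assuming, for illustration only, that a synopsis is the set of distinct lower-cased terms of the new document, the two CIP parts could be wired together roughly as follows. `CipSketch`, `synopsis`, `Invalidator` and `onNewDocument` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the two CIP parts. The synopsis format is an assumption:
// here it is simply the set of distinct lower-cased terms of the new document.
public class CipSketch {

    // Synopsis generator: turns an incoming document into a compact synopsis.
    static Set<String> synopsis(String document) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : document.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) { // a leading separator yields one empty token
                terms.add(token);
            }
        }
        return terms;
    }

    // Invalidator: given the cached queries and a synopsis, decides which to drop.
    // Concrete policies (Basic, Score) would implement this interface.
    interface Invalidator {
        Set<String> toInvalidate(Set<String> cachedQueries, Set<String> synopsis);
    }

    // Runtime hook: called for every added/updated document.
    static Set<String> onNewDocument(String doc, Set<String> cachedQueries, Invalidator policy) {
        return policy.toInvalidate(cachedQueries, synopsis(doc));
    }

    public static void main(String[] args) {
        Set<String> syn = synopsis("President Barack Obama meets Mubarak in London");
        System.out.println(syn); // [president, barack, obama, meets, mubarak, in, london]
    }
}
```

Keeping the invalidation policy behind an interface lets the runtime switch between policies without touching the synopsis generator.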
Invalidation Policies
• Basic: invalidate every cached query that appears in the synopsis.
• Score: find all cached queries that are contained in the synopsis, compute score(q, d) for each of them, where d is the added or updated document, and invalidate the top K by score.
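Below is a sketch of both policies. Two details are assumptions, not taken from the slides: a query counts as "contained in" the synopsis when all of its terms are, and `score(q, d)` is replaced by a plain term-frequency count (Lucene's real relevance scoring is more elaborate). The cached queries in `main` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class InvalidationPolicies {

    // Assumption: a query "appears in" a synopsis when every one of its terms does.
    static boolean contained(String query, Set<String> synopsis) {
        return Arrays.stream(query.toLowerCase().split("\\s+")).allMatch(synopsis::contains);
    }

    // Basic: invalidate every cached query contained in the synopsis.
    static Set<String> basic(Set<String> cachedQueries, Set<String> synopsis) {
        Set<String> out = new LinkedHashSet<>();
        for (String q : cachedQueries) {
            if (contained(q, synopsis)) out.add(q);
        }
        return out;
    }

    // Stand-in for score(q, d): total occurrences of the query terms in d.
    static int score(String query, String document) {
        List<String> docTerms = Arrays.asList(document.toLowerCase().split("\\W+"));
        int s = 0;
        for (String term : query.toLowerCase().split("\\s+")) {
            s += Collections.frequency(docTerms, term);
        }
        return s;
    }

    // Score: among the contained queries, invalidate only the K highest-scoring ones.
    static List<String> scorePolicy(Set<String> cachedQueries, Set<String> synopsis,
                                    String document, int k) {
        List<String> candidates = new ArrayList<>(basic(cachedQueries, synopsis));
        candidates.sort(Comparator.comparingInt((String q) -> score(q, document)).reversed());
        return candidates.subList(0, Math.min(k, candidates.size()));
    }

    public static void main(String[] args) {
        String d = "President Barack Obama meets Mubarak in London";
        Set<String> synopsis = new LinkedHashSet<>(Arrays.asList(d.toLowerCase().split("\\W+")));
        // Hypothetical cached queries, for illustration only.
        Set<String> cached = new LinkedHashSet<>(Arrays.asList("obama", "obama mubarak", "london weather"));
        System.out.println(basic(cached, synopsis));             // [obama, obama mubarak]
        System.out.println(scorePolicy(cached, synopsis, d, 1)); // [obama mubarak]
    }
}
```

With K = 1, only the query scored as most relevant to the new document is invalidated, while the Basic policy drops both matching entries; "london weather" survives either way because "weather" is not in the synopsis.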
Basic Invalidation
Cache: [figure not preserved]
Added Document • President Barack Obama meets Mubarak in London
CIP will help here!
My work is done.
Score Invalidation (K = 1)
Cache: [figure not preserved]
Added Document d • President Barack Obama meets Mubarak in London
Software Design: UML Diagrams
Search query with a cache miss [diagram not preserved]
Software Design: UML Diagrams
Adding a document to the index with basic invalidation [diagram not preserved]
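The flow of adding a document with basic invalidation can be sketched end to end: index the document, build its synopsis, then remove every cached query the synopsis contains. The in-memory index, the term-set synopsis and the all-terms matching rule are illustrative assumptions; the project itself used Lucene, and `AddWithBasicInvalidation` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical end-to-end sketch of "add a document to the index with basic
// invalidation". Index, synopsis format and matching rule are assumptions.
public class AddWithBasicInvalidation {
    private final List<String> index = new ArrayList<>();
    private final Map<String, List<String>> cache = new HashMap<>();

    // Assumed synopsis/query representation: the set of lower-cased terms.
    private static Set<String> terms(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
    }

    // Cache-first search: the index is scanned only on a cache miss.
    public List<String> search(String query) {
        return cache.computeIfAbsent(query, q -> {
            List<String> results = new ArrayList<>();
            for (String doc : index) {
                if (terms(doc).containsAll(terms(q))) results.add(doc);
            }
            return results;
        });
    }

    // 1) add to the index, 2) build the synopsis, 3) drop every cached query it contains.
    public void addDocument(String doc) {
        index.add(doc);
        Set<String> synopsis = terms(doc);
        cache.keySet().removeIf(q -> synopsis.containsAll(terms(q)));
    }

    public static void main(String[] args) {
        AddWithBasicInvalidation engine = new AddWithBasicInvalidation();
        engine.addDocument("Obama visits Cairo"); // hypothetical document
        System.out.println(engine.search("obama").size()); // 1
        engine.addDocument("President Barack Obama meets Mubarak in London");
        System.out.println(engine.search("obama").size()); // 2: the stale entry was invalidated
    }
}
```

Because the cached "obama" entry is removed when the matching document arrives, the next search misses the cache, reaches the index and returns the fresh result set.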
Skills
We acquired the following skills in this project:
• Reading scientific publications
• Java (including advanced Java topics)
• Working with a web server (Apache)
• Lucene features and how to use them
• Building a software cache
• UML
• XML parsing
• HTML