Entity Search and Matching with NECESSITY
An overview of the NECESSITY framework: providing unique identifiers, entity requests, identified challenges, system setup, and a search process that uses a range of matching modules to return ranked entity matches.
Presentation Transcript
Entity Search with NECESSITY • 12th Workshop on Web and Databases (WebDB) • Ekaterini Ioannou, Saket Sathe, Nicolas Bonvin, Anshul Jain, Srikanth Bondalapati, Gleb Skobeltsyn, Claudia Niederee, Zoltan Miklos • L3S Hannover and EPFL Switzerland
Providing unique identifiers • Okkamization • Entities from webpages and documents (information extraction) • Query to the Entity Store: name=“Einstein” physicist • Response: http://www.okkam.org/ens/idb3016709
Entities and Entity Requests • Entities are collections of attribute-value pairs with an okkam-id • Example entity • name : Albert Einstein • affiliation : Institute of Advanced Study • profession : physicist • okkam-id : http://www.okkam.org/ens/id06b1791f • Examples of entity requests • Q1 -- name=“Einstein” (AND) physicist • Q2 -- Einstein (AND) physicist • Q3 -- name=“Einstein” (AND) profession=“physicist”
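The entity model and the three request forms on this slide can be illustrated with a small sketch. The slides do not show the actual OKKAM data structures or request grammar, so the `Entity` class, the `parse_request` function, and the regular expression below are assumptions made for illustration only.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical entity: an okkam-id plus free-form attribute-value pairs."""
    okkam_id: str
    attributes: dict = field(default_factory=dict)


# Assumed grammar: either attr="value" (straight or curly quotes) or a bare keyword.
_TERM = re.compile(r'(\w+)\s*=\s*[“"]([^”"]*)[”"]|(\w+)')


def parse_request(text):
    """Return (attribute, value) pairs; attribute is None for bare keywords."""
    terms = []
    for m in _TERM.finditer(text):
        if m.group(1):
            terms.append((m.group(1), m.group(2)))
        elif m.group(3).upper() != "AND":  # drop the AND connective
            terms.append((None, m.group(3)))
    return terms


einstein = Entity(
    okkam_id="http://www.okkam.org/ens/id06b1791f",
    attributes={
        "name": "Albert Einstein",
        "affiliation": "Institute of Advanced Study",
        "profession": "physicist",
    },
)

q1 = parse_request('name="Einstein" AND physicist')
q2 = parse_request('Einstein AND physicist')
q3 = parse_request('name="Einstein" AND profession="physicist"')
```

Q1 mixes an attribute-qualified term with a bare keyword, Q2 uses only keywords, and Q3 qualifies every term, which is why the sketch keeps the attribute slot optional.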
Identified Challenges • Challenge: The number of entities could be huge • Store and retrieve using IR-based techniques • Matching on very large datasets • “Narrow” down the result set to a more tractable set of matching candidates • Challenge: A single algorithm for fine-grained entity matching may not exist • Use a range of matching modules • Matching using relationships and without schema information • Explicitly defined by user/application
NECESSITY Setup • Approx. 1 million entities extracted and indexed • People and organizations from Wikipedia • Locations from GeoNames • Proteins from UniProt • Software architecture • Lucene for handling the inverted index • Solr for index distribution and load balancing • HBase (Voldemort) for storing entity profiles
NECESSITY Search Process • OKKAM Match API • Request: name=“Einstein” AND physicist • Receive the entity request • Convert request and select matching module • Module selection: entity type (inferred from attributes, identified from receiver), required response time, … • Matching modules: Product Matching, Group Linkage, Generic Matching
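Module selection could look roughly like the sketch below. The slide names the selection criteria (entity type, required response time) and three modules, but not the actual rules, so the attribute hints, the 100 ms latency cut-off, and the mapping from entity type to module are all hypothetical.

```python
# Hypothetical hints: attribute names that suggest an entity type.
TYPE_HINTS = {
    "price": "product",
    "brand": "product",
    "profession": "person",
    "affiliation": "person",
}


def infer_type(terms):
    """Infer an entity type from the attribute names used in the request."""
    for attr, _value in terms:
        if attr in TYPE_HINTS:
            return TYPE_HINTS[attr]
    return None


def select_module(terms, declared_type=None, max_latency_ms=None):
    """Pick a matching module name; every rule here is an illustrative assumption."""
    # A type identified by the caller takes precedence over an inferred one.
    entity_type = declared_type or infer_type(terms)
    if entity_type == "product":
        return "product_matching"
    # Assumed: a tight response-time budget falls back to the generic module.
    if max_latency_ms is not None and max_latency_ms < 100:
        return "generic_matching"
    if entity_type is not None:
        return "group_linkage"
    return "generic_matching"


module = select_module([("name", "Einstein"), (None, "physicist")])
```

The point of the sketch is only the shape of the decision: a type signal and a latency budget go in, a module name comes out.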
NECESSITY Search Process • OKKAM Store API and OKKAM Store Index • Query: name=“Einstein” AND physicist • Send the query to index • Query the distributed index • Each server processes the query from the index and returns top-k results • boost popular attributes • boost attributes specified by the query • Aggregate top-k results from each server • Top-k matches (IDs + scores) • Return top-k entities with scores • Top-k entities (candidates)
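The two index-side steps on this slide, boosting attributes named in the query and aggregating per-server top-k lists, can be sketched as follows. The `^` boost operator is standard Lucene query syntax, but the boost factor, the function names, and the rule of keeping the highest score for an ID seen on several servers are assumptions, not details taken from the slides. Values are not escaped, so this is not safe for arbitrary input.

```python
import heapq


def build_boosted_query(terms, query_boost=2.0):
    """Build a Lucene-style query string, boosting attributes named in the request."""
    parts = []
    for attr, value in terms:
        if attr is None:
            parts.append(value)  # bare keyword, no boost
        else:
            parts.append(f'{attr}:"{value}"^{query_boost}')
    return " AND ".join(parts)


def aggregate_top_k(per_server_results, k):
    """Merge per-server lists of (entity_id, score) into one global top-k list."""
    best = {}
    for results in per_server_results:
        for entity_id, score in results:
            # Assumed: if an ID appears on several servers, keep its best score.
            if score > best.get(entity_id, float("-inf")):
                best[entity_id] = score
    return heapq.nlargest(k, best.items(), key=lambda item: item[1])


query = build_boosted_query([("name", "Einstein"), (None, "physicist")])
merged = aggregate_top_k(
    [[("a", 0.9), ("b", 0.4)], [("c", 0.7), ("a", 0.5)]],
    k=2,
)
```

Because each server already returns only its own top-k, the aggregation step touches at most k results per server, which keeps the merge cheap regardless of index size.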
NECESSITY Search Process • OKKAM Match API and Matching Module • Request: name=“Einstein” AND physicist • Receive matching candidates • Advanced matching and final entities • Background knowledge • Domain specific information • Analyze inner-relationships • Make another query • …
NECESSITY Search Process • OKKAM Match API and Matching Module • Request: name=“Einstein” AND physicist • Ranked list with matching entities (scores shown: 0.95, 0.89)
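The final step, turning candidates into a ranked list of matching entities, might be sketched as below. The real matching modules use background knowledge, domain-specific information, and relationship analysis, none of which is specified in the slides; the substring-based `match_score` and the `threshold` parameter here are simple stand-ins chosen for illustration and will not reproduce the scores shown on the slide.

```python
def match_score(terms, attributes):
    """Fraction of request terms found in the candidate's attributes (assumed measure)."""
    if not terms:
        return 0.0
    hits = 0
    for attr, value in terms:
        needle = value.lower()
        if attr is None:
            # Bare keyword: look for it in any attribute value.
            hit = any(needle in str(v).lower() for v in attributes.values())
        else:
            hit = needle in str(attributes.get(attr, "")).lower()
        hits += hit
    return hits / len(terms)


def rank_candidates(terms, candidates, threshold=0.5):
    """Score candidates, drop those below the threshold, and sort by score descending."""
    scored = [(cid, match_score(terms, attrs)) for cid, attrs in candidates.items()]
    kept = [(cid, score) for cid, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


candidates = {
    "id1": {"name": "Albert Einstein", "profession": "physicist"},
    "id2": {"name": "Einstein Bagels", "type": "restaurant"},
    "id3": {"name": "Niels Bohr", "profession": "physicist"},
}
request = [("name", "Einstein"), (None, "physicist")]
ranked = rank_candidates(request, candidates, threshold=0.6)
```

Candidates that satisfy only part of the request fall below the threshold and are discarded, while the survivors are returned in score order.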