This project outlines the design and implementation of an enterprise archive and search system utilizing LogicSQL. Discover methodologies for organizing enterprise information and making it accessible through an effective search engine built on a database management system. Key challenges include implementing an inverted index and executing TOP-K queries for ranking results based on relevance. Furthermore, the system extends database functionalities with custom ranking algorithms, security models, and extensive user configuration options, ultimately enhancing information retrieval across internal systems.
LogicSQL-based Enterprise Archive and Search System Li-Yan Yuan How to organize information and make it accessible and useful?
Projects • How to develop an enterprise search engine based on a database management system • Challenge: implementation of the inverted index
Projects • How to implement the Top-K query • Ranking formula • Inverted indexes are created with respect to term frequencies
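The idea of an inverted index built with respect to frequencies can be sketched as follows. This is only an illustration in Python, not LogicSQL's implementation: the function name `build_inverted_index`, the whitespace tokenizer, and the sample documents are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to a postings list of (doc_id, term_frequency).

    Postings are sorted by descending frequency (ties broken by doc_id),
    so the most frequent occurrences of a term are read first.
    """
    index = defaultdict(list)
    for doc_id, text in docs.items():
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        for term, freq in Counter(text.lower().split()).items():
            index[term].append((doc_id, freq))
    for postings in index.values():
        postings.sort(key=lambda p: (-p[1], p[0]))
    return dict(index)


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "search engine search index",
    2: "database index",
    3: "enterprise search",
}
index = build_inverted_index(docs)
```

Keeping each postings list in frequency order is what lets a Top-K evaluator stop reading a list early once the remaining entries can no longer change the result.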
Internet search • Search for relevant web pages • Good answers: • Relevant • Popular • Public-domain knowledge • Search engines are critical to Internet use • Internal workings are secret • Tremendous political, economic, and cultural power
Enterprise search • Search the enterprise information systems for the right information • Enterprise information • Internal web pages • Internal documentation systems • File systems • Databases • Email servers • The Internet and enterprise domains differ fundamentally • Contents • User behavior • Economic motivations
Top-K Query • Objective • How to determine the top K objects that are most likely (approximately) related to the given query • Applications • Information retrieval • Internet and enterprise searches • Multimedia similarity search • Scheduling large-scale on-demand data broadcast • ……
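A Top-K query over an inverted index can be sketched as below. This is a minimal illustration, not the algorithm used in the system: it scores every matching document exhaustively and assumes the score is simply the sum of term frequencies. The function name `top_k` and the sample index are hypothetical.

```python
import heapq
from collections import defaultdict


def top_k(index, query_terms, k):
    """Return the k highest-scoring (doc_id, score) pairs for a query.

    Scoring rule (assumption): sum of the query terms' frequencies.
    """
    scores = defaultdict(int)
    for term in query_terms:
        for doc_id, freq in index.get(term, []):
            scores[doc_id] += freq
    # Highest score first; on equal scores, the smaller doc_id wins.
    return heapq.nlargest(
        k, scores.items(), key=lambda item: (item[1], -item[0])
    )


# Hypothetical index: term -> list of (doc_id, term_frequency).
sample_index = {
    "search": [(1, 2), (3, 1)],
    "index": [(1, 1), (2, 1)],
}
best = top_k(sample_index, ["search", "index"], 2)
```

A production evaluator would normally avoid scoring every document, for instance by reading frequency-sorted postings and stopping once the current top K cannot be overtaken. The exhaustive version is shown only because it makes the objective easy to see.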
LogicSQL Enterprise Information Archive and Search System • LogicSQL: an object-relational database management system • New concurrency control algorithm • Staged database architecture • Developed at the University of Alberta • Commercialized by Shanghai Shifang Software Co.
Enterprise Archive and Search System • To archive all the enterprise information contents • File systems • Web pages • Emails • Internal documents • Database records? • To provide a web-style search engine • To support user-specified ranking algorithms • Focus on the platform for archive and search • Easy implementation and testing of various ranking algorithms
Enterprise Archive and Search System • Extend the database functionalities • Security model • Users, roles + security handle • Security primary key • New database objects • Inverted indexes • CREATE INVERTED INDEX • DROP INVERTED INDEX • Automatic population, similar to that of a regular index • ORDER BY clause • User-specified aggregate functions • CREATE AGGREGATE FUNCTION • Top-K query evaluation • Specified crawlers
Enterprise Archive and Search System • User configuration • Set up crawlers • Create a list of inverted indexes • Create one aggregate function for object ranking • Extend the query languages • Implement the Top-K query algorithm • Web-based query pages
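The configuration steps above (several inverted indexes plus one aggregate function for ranking) can be illustrated with a small Python sketch. It does not reproduce LogicSQL's `CREATE AGGREGATE FUNCTION` syntax, which the slides do not show. The names `rank` and `weighted_sum`, the "title" and "body" indexes, and the weights are all hypothetical.

```python
def rank(indexes, aggregate, term, k):
    """Rank documents for one term across several inverted indexes.

    indexes:   name -> {term -> {doc_id: frequency}}
    aggregate: user-supplied function that turns a dict of
               per-index frequencies into a single score.
    """
    doc_ids = set()
    for idx in indexes.values():
        doc_ids.update(idx.get(term, {}))
    scored = []
    for doc_id in doc_ids:
        per_index = {
            name: idx.get(term, {}).get(doc_id, 0)
            for name, idx in indexes.items()
        }
        scored.append((aggregate(per_index), doc_id))
    # Highest score first; smaller doc_id first on ties.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(doc_id, score) for score, doc_id in scored[:k]]


def weighted_sum(per_index):
    # Example ranking rule (assumption): title matches count three times.
    return 3 * per_index["title"] + per_index["body"]


indexes = {
    "title": {"search": {1: 1}},
    "body": {"search": {1: 2, 2: 4, 3: 1}},
}
result = rank(indexes, weighted_sum, "search", 2)
```

Passing the aggregate as a plain function mirrors the stated goal of making ranking algorithms easy to implement and test: a different rule can be swapped in without touching the indexes or the evaluation loop.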