
DB2 Net Search Extender



  1. DB2 Net Search Extender Presenter: Sudeshna Banerji (CIS 595: Bioinformatics)

  2. Topics to discuss:
     • Information retrieval
     • Text indexing
     • DB2 Text Extenders
     • DB2 Net Search Extender
     • References
     • Questions

  3. A Little Background…
     • Information Retrieval (IR):
       • Extraction of “relevant” information from huge volumes of data scattered across different databases.
       • Examples: textual search, image search, video search, etc.
     • The efficiency (time and speed) of IR depends on the INDEXING technology used.
     • Indexing increases the performance of the system.
     • An example of an indexing technology: text indexing, used for textual search.

  4. A Little Background…
     • Text indexing:
       • The process of deciding what will be used to represent a given document.
       • A text index consists of significant terms extracted from the text documents, each term stored together with information about the documents that contain it.
       • A search is then handled as a query that looks up the index.
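The idea of a text index can be sketched in a few lines of Python. This is only an illustration of the general concept (an inverted index mapping each term to the documents that contain it), not how any DB2 extender builds its index; the document ids and the helper names `build_index` and `lookup` are made up for the example, and the sketch indexes every word rather than only significant terms.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for raw in text.lower().split():
            term = raw.strip(".,;:!?")  # drop surrounding punctuation
            if term:
                index[term].add(doc_id)
    return index


def lookup(index, term):
    """Answer a search as a lookup in the index, not a scan of the documents."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical two-document collection.
docs = {
    1: "A man was running down the street.",
    2: "The cat hunts some mice.",
}
index = build_index(docs)
print(lookup(index, "cat"))
print(lookup(index, "dog"))
```

The lookup cost depends on the size of the index entry for the term, not on the total amount of text, which is why indexing improves search performance.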

  5. A Little Background…
     • Text indexing (continued) involves the following:
       • Parsing the documents to recognize their structure, e.g. title, date and other fields.
       • Scanning for word tokens: handling numbers, special characters, hyphenation, capitalization, etc.
       • Stopword removal: based on a short list of common words such as “the”, “and”, “or”.
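The token scanning and stopword removal steps above might look roughly like this in Python. It is a minimal sketch under simple assumptions: tokens are lowercased runs of letters and digits, hyphenated words are kept as one token, and the stopword list holds only the three words named on the slide. Structural parsing (title, date, other fields) is left out, and the function names are illustrative.

```python
import re

# Only the three examples from the slide; real stopword lists are longer.
STOPWORDS = {"the", "and", "or"}


def tokenize(text):
    """Lowercase the text and extract word tokens, keeping hyphenated words whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def significant_terms(text):
    """Return the tokens that remain after stopword removal."""
    return [token for token in tokenize(text) if token not in STOPWORDS]


print(significant_terms("The cat and the well-fed dog, or 2 mice."))
```

Lowercasing handles capitalization, the regular expression decides how numbers, special characters and hyphens are treated, and the final filter drops the stopwords.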

  6. Indexing only Significant Terms

  7. DB2 Extenders
     • A family of IBM products that provide support for data beyond the traditional character and numeric data types.
     • Extenders are available for images, voice, video, complex documents (full-text search), spatial objects, etc.
     • Trial and beta versions are available for testing.
     • Link for extenders: http://www-3.ibm.com/software/data/db2/extenders/index.html

  8. DB2 Text Extenders
     • To meet the increasing demands of content management, IBM has introduced three full-text retrieval applications for DB2 Universal Database (DB2 UDB):
       • DB2 Net Search Extender
       • DB2 Text Information Extender
       • DB2 Text Extender
     • When to use what?
     • Link for a comparison of the above: http://www-3.ibm.com/software/data/db2/extenders/fulltextcomparison.html

  9. DB2 Net Search Extender
     • Replaces DB2 Text Information Extender Version 7.2.
     • Some important features:
       • Indexing speed of about 1 GB per hour.
       • Different text formats: ASCII plain text, HTML, XML, GPP.
       • Base support for 37 languages, including English, Spanish, French, Japanese and Chinese.
       • Sub-second search response times.
       • No decrease in search performance with up to 1000 concurrent queries per second.

  10. DB2 Net Search Extender
     • Some text-search capabilities:
       • Searches can be performed using SQL (a fourth-generation language whose queries read almost like English).
       • Searches can include:
         • Boolean operations.
         • Proximity search for words in the same sentence or paragraph: for HTML, XML and GPP.
         • “Fuzzy” searches for words spelled similarly to the search term, e.g. Andrew and Andru.
         • Thesaurus-related search.
         • Restricting the search to sections within documents.
       • The user can limit the search results with a “hit count” and can also specify how the results are to be sorted.
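To make the “fuzzy” search idea concrete, here is a small Python sketch using the standard library's `difflib`. It only illustrates matching on similar spelling; how Net Search Extender scores similarity is not described in these slides, and the vocabulary, the 0.7 cutoff and the name `fuzzy_lookup` are assumptions chosen for the example.

```python
from difflib import get_close_matches


def fuzzy_lookup(term, vocabulary, cutoff=0.7):
    """Return indexed words whose spelling is close to the search term."""
    lowered = {word.lower(): word for word in vocabulary}
    hits = get_close_matches(term.lower(), list(lowered), n=5, cutoff=cutoff)
    return [lowered[hit] for hit in hits]


# Hypothetical list of terms taken from a text index.
vocabulary = ["Andrew", "Anderson", "Mike", "John"]
print(fuzzy_lookup("Andru", vocabulary))
```

Raising the cutoff makes the match stricter; lowering it returns more, but less similar, words.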

  11. DB2 Net Search Extender
     • System requirements:
       • DB2 Version 8.1
       • Java Runtime Environment (JRE) Version 1.3.1
     • Windows installation:
       • Administrative rights are required.
       • Call db2text start to start the DB2 Net Search Extender Instance Services.

  12. DB2 Net Search Extender
     • A simple example with SQL queries.
     • The following steps are required for a basic textual search in DB2 Net Search Extender:
       1. Creating a database
       2. Enabling a database for text search
       3. Creating a table
       4. Creating a full-text index
       5. Loading sample data
       6. Synchronizing the text index
       7. Searching with the text index

  13. DB2 Net Search Extender
     1. Creating a database:
          db2 "create database sample"
     2. Enabling a database for text search:
        • To start the Net Search Extender service:
          db2text "START"
        • To prepare the database for use with DB2 Net Search Extender:
          db2text "ENABLE DATABASE FOR TEXT CONNECT TO sample"

  14. DB2 Net Search Extender
     3. Creating a table:
          db2 "CREATE TABLE books (isbn VARCHAR(18) not null PRIMARY KEY, author VARCHAR(30), story LONG VARCHAR, year INTEGER)"
     4. Creating a full-text index:
          db2text "CREATE INDEX db2ext.myTextIndex FOR TEXT ON books (story) CONNECT TO sample"

  15. DB2 Net Search Extender
     5. Loading sample data:
          db2 "INSERT INTO books VALUES ('0-13-086755-1', 'John', 'A man was running down the street.', 2001)"
          db2 "INSERT INTO books VALUES ('0-13-086755-2', 'Mike', 'The cat hunts some mice.', 2000)"
     6. Synchronizing the text index:
          db2text "UPDATE INDEX db2ext.myTextIndex FOR TEXT CONNECT TO sample"

  16. DB2 Net Search Extender
     7. Searching with the text index:
        • Using the CONTAINS scalar search function:
          db2 "SELECT author, story FROM books WHERE CONTAINS (story, '\"cat\"') = 1 AND year >= 2000"
        • The following result table is returned:
          AUTHOR  STORY
          Mike    The cat hunts some mice.
     • NOTE: To create a text index, the text column must be one of the following data types: CHAR, VARCHAR, LONG VARCHAR, CLOB.
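Steps 5 to 7 can be mimicked with a toy model in Python, mainly to show why the synchronization step matters: rows inserted into the table are not searchable until the text index has been updated. This is a sketch under simplifying assumptions (an in-memory list for the table, a dictionary for the index); the `Books` class and its method names are hypothetical and say nothing about how DB2 implements the steps.

```python
class Books:
    """Toy stand-in for the books table and its text index (not DB2's implementation)."""

    def __init__(self):
        self.rows = []        # table rows, in insertion order
        self.text_index = {}  # word -> set of row positions
        self.indexed = 0      # number of rows already covered by the index

    def insert(self, isbn, author, story, year):
        # Step 5: the row is stored, but the text index is not touched yet.
        self.rows.append((isbn, author, story, year))

    def update_index(self):
        # Step 6: index only the rows added since the last synchronization.
        for pos in range(self.indexed, len(self.rows)):
            for raw in self.rows[pos][2].lower().split():
                word = raw.strip(".,;:!?")
                if word:
                    self.text_index.setdefault(word, set()).add(pos)
        self.indexed = len(self.rows)

    def contains(self, word, min_year):
        # Step 7: look the word up in the index, then apply the year filter.
        hits = sorted(self.text_index.get(word.lower(), set()))
        return [(self.rows[p][1], self.rows[p][2])
                for p in hits if self.rows[p][3] >= min_year]


books = Books()
books.insert("0-13-086755-1", "John", "A man was running down the street.", 2001)
books.insert("0-13-086755-2", "Mike", "The cat hunts some mice.", 2000)
before = books.contains("cat", 2000)  # index not synchronized yet
books.update_index()
after = books.contains("cat", 2000)
print(before, after)
```

In this model the search before `update_index()` finds nothing, while the search afterwards returns Mike's row, matching the result table on the slide.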

  17. DB2 Net Search Extender
     • Thesaurus support:
       • A thesaurus is structured like a network of nodes linked together by relations:
         • Associative relations: RELATED_TO
         • Synonym relations: SYNONYM_OF
         • Hierarchical relations: LOWER_THAN, HIGHER_THAN
       • Creating and compiling a thesaurus:
         1. Create a thesaurus definition file (explained below).
         2. Compile the definition file into a thesaurus dictionary using the DB2EXTTH utility.

  18. DB2 Net Search Extender
     • Create a thesaurus definition file:
       • Define its content in a definition file using a text editor.
       • Example of some definition groups:
           :WORDS
             football
             .RELATED_TO goal
             .SYNONYM_OF soccer
           :WORDS
             chapel
             .LOWER_THAN skyscraper
             .HIGHER_THAN house

  19. DB2 Net Search Extender
     • An example of the structure of a thesaurus:
       [Diagram: a network of the terms Game, Ball Game, Soccer, Tennis and Football, linked by HIGHER_THAN and SYNONYM_OF relations.]
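The network-of-relations view of a thesaurus can be sketched in Python using the two definition groups shown earlier (football and chapel). This is an illustrative model only: it assumes that RELATED_TO and SYNONYM_OF hold in both directions and that LOWER_THAN and HIGHER_THAN are each other's inverse, and the names `build_thesaurus` and `expand` are invented for the example rather than taken from the DB2EXTTH utility.

```python
from collections import defaultdict

# Definition groups from the slide, written as (term, relation, other term).
GROUPS = [
    ("football", "RELATED_TO", "goal"),
    ("football", "SYNONYM_OF", "soccer"),
    ("chapel", "LOWER_THAN", "skyscraper"),
    ("chapel", "HIGHER_THAN", "house"),
]

# Assumption: each relation implies this relation in the opposite direction.
INVERSE = {
    "RELATED_TO": "RELATED_TO",
    "SYNONYM_OF": "SYNONYM_OF",
    "LOWER_THAN": "HIGHER_THAN",
    "HIGHER_THAN": "LOWER_THAN",
}


def build_thesaurus(groups):
    """Build a graph: term -> relation -> set of linked terms."""
    graph = defaultdict(lambda: defaultdict(set))
    for term, relation, other in groups:
        graph[term][relation].add(other)
        graph[other][INVERSE[relation]].add(term)
    return graph


def expand(graph, term, relation):
    """Expand a search term with the terms linked to it by one relation."""
    linked = graph.get(term, {}).get(relation, set())
    return {term} | set(linked)


thesaurus = build_thesaurus(GROUPS)
print(sorted(expand(thesaurus, "soccer", "SYNONYM_OF")))
```

A thesaurus-related search would then look up every term in the expanded set instead of only the word the user typed.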

  20. DB2 Net Search Extender
     • References:
       • http://www-3.ibm.com/cgibin/db2www/data/db2/udb/winos2unix/support/document.d2w/report?fn=desu9m03.htm#ToC
       • Information retrieval site containing good lecture slides: http://ciir.cs.umass.edu/cmpsci646/
       • Net Search Extender Administration and User’s Guide, Version 8.1 (can be downloaded with the software)

  21. ANY QUESTIONS????
