Full-Text Search: Efficient and Powerful Solution for Querying Substrings

CS122B: Projects in Databases and Web Applications Winter 2019 Notes 11: Full-Text Search Professor Chen Li Department of Computer Science UC Irvine

Search using “LIKE” • SELECT * FROM movies where title LIKE “%rain%” • Limitations: • “rain” will match “how to train your dragon” • “Chance Meatballs” cannot find “Cloudy with a Chance of Meatballs 2” • Reasons? String is treated a substring not a “word” or “a bag of words”

Solution: full-text search • Most DBs support native full-text search • Example: CREATE TABLE ft ( entryID INT AUTO_INCREMENT, entry text, PRIMARY KEY (entryID), FULLTEXT (entry)); SELECT entryID FROM ft WHERE MATCH (entry) AGAINST ('golf');

Under the hood: Inverted index 2 4 8 16 32 64 128 1 2 3 5 8 13 21 34 Cat Dog Bird 13 16

DB full-text search: good and bad • Good: • “Native” – search happens at the place of the data, no separate copy, and no effort to do sync • Bad: • very limited support (e.g., ranking, stemming, tokenization, stop words, multi-language support) and efficiency

Alternative solutions • Use a separate middleware solution just for search • Lucene, Solr, and Elastic • Bad: • Second data copy; needs to sync with DB data • Good: • Very powerful

Full-Text Search: Efficient and Powerful Solution for Querying Substrings