
Lecture # 32 WWW Search


derron


  1. Lecture #32 WWW Search

  2. Review: Data Organization • Kinds of things to organize • Menu items • Text • Images • Sound • Videos • Records (e.g. a person’s name, address, & phone number, or a car’s year, make, & model)

  3. Review: Data Organization • Three ways to find things: • Lists (in-order search, binary search) • Trees (balance number of branches with time to decide which is correct branch) • Search
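As a reminder of the list-based approach, here is a minimal binary search sketch. It assumes the list is already sorted; the function name `binary_search` and the sample words are illustrative, not from the lecture.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1


# Hypothetical sorted word list.
words = ["cow", "goat", "hen", "pig", "rooster"]
position = binary_search(words, "pig")
```

Each comparison halves the remaining range, which is why a sorted list of tens of thousands of entries needs only a handful of steps.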

  4. WWW Search

  5. Search issues • How do we say what we want? • I want a story about pigs • I want a picture of a rooster • How many televisions were sold in Vietnam during 2000? • Find a movie like this one • How does the computer find what we said?

  6. Things to search for • Records • Text • Images • Audio • Video

  7. Records • Car • Price • Miles • Year • Make • Doors • Queries • Price < 6000 & Miles < 100000 • Make == Toyota & Year > 1993

  8. Queries • Make == Toyota & Year >1993

  9. Queries • Make == Toyota & Year >1993

  10. Queries • Year >1993 or Price < $3,000

  11. Queries • Year >1993 or Price < $3,000
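The record queries on the preceding slides can be sketched as filters over a list of records. The car data and the helper name `query` below are made up for illustration; a real database would use its own query language.

```python
# Hypothetical car records with the fields named on the Records slide.
cars = [
    {"make": "Toyota", "year": 1995, "price": 5500, "miles": 90000, "doors": 4},
    {"make": "Toyota", "year": 1990, "price": 2500, "miles": 150000, "doors": 2},
    {"make": "Ford", "year": 1998, "price": 7000, "miles": 60000, "doors": 4},
    {"make": "Honda", "year": 1992, "price": 4000, "miles": 120000, "doors": 4},
]


def query(records, predicate):
    """Return every record for which predicate(record) is true."""
    return [record for record in records if predicate(record)]


# Make == Toyota & Year > 1993
toyota_after_1993 = query(cars, lambda c: c["make"] == "Toyota" and c["year"] > 1993)

# Year > 1993 or Price < $3,000
newer_or_cheap = query(cars, lambda c: c["year"] > 1993 or c["price"] < 3000)

# Price < 6000 & Miles < 100000
cheap_low_miles = query(cars, lambda c: c["price"] < 6000 and c["miles"] < 100000)
```

Note how "and" narrows the result while "or" widens it: the OR query also returns the 1990 Toyota because its price is under $3,000.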

  12. Databases • Large collections of records • Accessed by queries

  13. Things to search for • Records • Text • Images • Audio • Video

  14. Text searching • How do I say what I want? • Type some phrase • I want a story about pigs • How will the computer match this? • What is text? • An array of characters • What can a computer do with text? • Match characters

  15. Text searching • People think in words not characters • How do I convert an array of characters into an array of words? • Collect together sequences of letters • How do I know if character C is a letter? • (C >= “a” & C <= “z”) | (C >= “A” & C <= “Z”)
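The letter test and the "collect sequences of letters" step can be sketched as follows. The function names `is_letter` and `to_words` are illustrative, and the sketch only handles unaccented English letters, as the slide's test does.

```python
def is_letter(c):
    """The slide's test: c is between a and z, or between A and Z."""
    return ("a" <= c <= "z") or ("A" <= c <= "Z")


def to_words(text):
    """Convert an array of characters into an array of words."""
    words = []
    current = []  # letters of the word being collected
    for c in text:
        if is_letter(c):
            current.append(c)
        elif current:
            # A non-letter ends the current word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


story_words = to_words("I want a story about pigs.")
```

Anything that is not a letter (spaces, punctuation, digits) acts as a separator, so a contraction such as "don't" would be split into two words by this simple rule.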

  16. Convert to words • Because people think in words

  17. Every document is an array of words • I want a story about pigs • How will I find the right documents? • Find all documents that have the word “pigs”

  18. Searching text • How will I find pigs fast? • Create an index of all words • With each word store the name or address of each document that contains that word • Search the index for “pigs” • Return the list of documents • Use a binary search on the word list (50,000 words)
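A minimal sketch of the index described on this slide: each word maps to the documents that contain it, and the sorted word list is searched with a binary search. The document names and helper names are made up for illustration, and Python's standard `bisect` module stands in for a hand-written binary search.

```python
from bisect import bisect_left


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = {}
    for name, words in documents.items():
        for word in words:
            index.setdefault(word, set()).add(name)
    return index


def lookup(word_list, index, word):
    """Binary-search the sorted word list, then return the matching documents."""
    pos = bisect_left(word_list, word)
    if pos < len(word_list) and word_list[pos] == word:
        return sorted(index[word])
    return []


# Hypothetical documents, already converted to arrays of words.
documents = {
    "farm.txt": ["three", "little", "pigs", "built", "houses"],
    "barn.txt": ["the", "rooster", "woke", "the", "pigs"],
    "city.txt": ["televisions", "sold", "in", "the", "city"],
}
index = build_index(documents)
word_list = sorted(index)  # the sorted list of all indexed words
pig_documents = lookup(word_list, index, "pigs")
```

The expensive work (reading every document) happens once when the index is built; each search afterwards touches only the word list.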

  19. Problems • What if a document has the word “Pig” but not “pigs”? • Normalize • Case - make all words lower case • Pig -> pig • Stemming - remove all suffixes and prefixes before putting a word into the index • pigs -> pig • piggy -> pig
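A toy normalizer covering only the examples on this slide. The two suffix rules are assumptions chosen so that "pigs" and "piggy" both become "pig"; it skips prefix removal, and a real stemmer (for example the Porter stemmer) uses far more careful rules.

```python
def normalize(word):
    """Lower-case a word and crudely strip a trailing 's' or 'y'."""
    word = word.lower()
    for suffix in ("s", "y"):
        # Only strip from words longer than three letters, so "pig" stays "pig".
        if word.endswith(suffix) and len(word) > 3:
            word = word[: -len(suffix)]
            break
    # Collapse a doubled final letter left behind, e.g. "pigg" -> "pig".
    if len(word) > 2 and word[-1] == word[-2]:
        word = word[:-1]
    return word


normalized = [normalize(w) for w in ["Pig", "pigs", "piggy"]]
```

The same function must be applied both when adding words to the index and when processing the query, otherwise "Pig" in a document and "pigs" in a query would still fail to match.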

  20. Problems • I want a story about pigs? • How does the computer know to search for pigs? • It doesn’t • How does the computer know what a story is? • It doesn’t

  21. Searching • I want a story about pigs • Pick out the important words and search for them • Which words are important? • D = number of times a word appears in a document • A = average number of times a word appears in all documents • Importance = D/A • Why?
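The Importance = D/A measure can be tried on a few small documents. The documents and the function name `importance` are made up for illustration.

```python
def importance(word, document, all_documents):
    """Importance = D / A, as defined on the slide."""
    d = document.count(word)  # D: times the word appears in this document
    # A: average number of times the word appears per document
    a = sum(doc.count(word) for doc in all_documents) / len(all_documents)
    return d / a if a else 0.0


# Hypothetical documents, already converted to arrays of words.
docs = [
    ["the", "pigs", "ate", "the", "corn"],
    ["the", "rooster", "saw", "the", "barn"],
    ["the", "farmer", "fed", "the", "pigs", "and", "the", "pigs", "slept"],
]
pigs_score = importance("pigs", docs[2], docs)
the_score = importance("the", docs[2], docs)
```

This suggests one answer to the slide's "Why?": "the" occurs more often than "pigs" in the third document, but it is common everywhere, so its average A is large and its score is lower. A word scores high only when a document uses it more than documents usually do.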

  22. How do we create an index of all documents on the Web? • Try = a list of URLs • Seen = all URLs you have seen
      While (Try is not empty) {
          Page = take a URL from Try
          Words = all the “important” words in Page
          add Page to the index using all of Words
          Links = all URLs in Page
          for every Link that is not in Seen
              add Link to Try and to Seen
      }
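The crawl loop on this slide can be sketched in runnable form against a tiny in-memory "web". `FAKE_WEB` and its URLs are invented stand-ins: a real crawler would download each page and extract its words and links, which this sketch does not attempt.

```python
from collections import deque

# Hypothetical web: each URL maps to its important words and outgoing links.
FAKE_WEB = {
    "a.example": {"words": ["pigs", "farm"], "links": ["b.example", "c.example"]},
    "b.example": {"words": ["rooster"], "links": ["a.example"]},
    "c.example": {"words": ["pigs", "story"], "links": ["b.example", "d.example"]},
    "d.example": {"words": ["television"], "links": []},
}


def crawl(start_urls, web):
    """Build a word -> pages index by following links from start_urls."""
    to_try = deque(start_urls)  # Try: URLs still to visit
    seen = set(start_urls)      # Seen: every URL ever added to Try
    index = {}
    while to_try:
        page = to_try.popleft()
        for word in web[page]["words"]:
            index.setdefault(word, set()).add(page)
        for link in web[page]["links"]:
            if link not in seen:
                seen.add(link)
                to_try.append(link)
    return index


web_index = crawl(["a.example"], FAKE_WEB)
```

The Seen set is what keeps the loop from running forever: pages a and b link to each other, but each is added to Try only once.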

  23. Other ways to find important words and important documents • A Document is important if many other documents point to it • A word is important in document D if that word occurs frequently in documents that link to document D.
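The first idea, that a document is important if many other documents point to it, can be sketched by counting inbound links. The link data is invented, and this plain count is a simplification: it is not how any particular search engine actually ranks pages.

```python
from collections import Counter

# Hypothetical link structure: each page maps to the pages it links to.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}


def inbound_counts(link_map):
    """Count how many distinct other pages link to each page."""
    counts = Counter({page: 0 for page in link_map})
    for source, targets in link_map.items():
        for target in set(targets):  # count each source at most once
            if target != source:     # ignore self-links
                counts[target] += 1
    return counts


counts = inbound_counts(links)
most_important = counts.most_common(1)[0]
```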

  24. Images • What will I say when searching for an image? • I want a rooster picture • Draw a picture of a rooster?

  25. Search by picture? Is this possible? If so, how? ?

  26. What’s in a picture? • Computers don’t understand the contents of images • To a computer an image is a bunch of colored pixels

  27. I want a picture of a rooster • Label all of the pictures • How does Google Images do it? • File name of the picture “rooster-crossingSt.jpg” • Words around the picture in the HTML • Use “Safe Search” and set filters appropriately (http://www.youtube.com/watch?v=maWx-ApkBCs)

  28. Audio • Talking • Use speech recognition to convert audio to text • With each recognized word keep track of where in the audio it was recognized. • Build an index using the recognized text • Normalize based on how words sound rather than are spelled.
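Assuming a speech recognizer has already produced (time, word) pairs, the audio index can be sketched as a map from each word to the times at which it was recognized. The `recognized` data is invented; no real speech recognition is performed here, and sound-based normalization is left out.

```python
# Hypothetical recognizer output: (seconds into the audio, recognized word).
recognized = [(0.0, "Play"), (0.4, "it"), (0.9, "Sam"), (2.5, "play"), (3.1, "it")]


def build_audio_index(recognized_words):
    """Map each lower-cased word to the list of times where it was heard."""
    index = {}
    for seconds, word in recognized_words:
        index.setdefault(word.lower(), []).append(seconds)
    return index


audio_index = build_audio_index(recognized)
play_times = audio_index.get("play", [])
```

Keeping the time with each word is what lets a search jump to the right spot in the recording rather than just reporting that the word occurs somewhere.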

  29. Video • Where in “Casablanca” does Bogart say “Play it again Sam”? • He never does; he just says “Play it” • How can the computer find that? • Transcribe the audio • Speech recognition on the audio

  30. Video • Does Woody ever kiss Bo Peep? • Exactly what color is a kiss?

  31. Video • Does Woody ever kiss Bo Peep? • Annotate every frame with who is in the frame and search for frames with both Woody and Bo Peep.
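If every frame has been annotated with the characters in it, the search becomes a set-membership test. The frame numbers and annotations below are invented for illustration.

```python
# Hypothetical annotations: frame number -> characters visible in that frame.
frames = {
    0: {"Woody"},
    1: {"Woody", "Buzz"},
    2: {"Woody", "Bo Peep"},
    3: {"Bo Peep"},
    4: {"Woody", "Bo Peep", "Buzz"},
}


def frames_with(annotations, *characters):
    """Return the frame numbers in which all the given characters appear."""
    wanted = set(characters)
    return sorted(n for n, present in annotations.items() if wanted <= present)


candidate_frames = frames_with(frames, "Woody", "Bo Peep")
```

This only narrows the search to frames where both characters are present; deciding whether a kiss happens in those frames still needs a person or a much richer annotation.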

  32. So what’s with this?

  33. Or this?

  34. Is Woody cheating?

  35. Search • Records • Queries • < > = And Or • Text • Normalized words (case, stemming, thesaurus) • Images • Add words • Audio • Transcribe or recognize as words • Video • Transcribe • Annotate

  36. “Re-Search” Directions in Image Recognition, Search and Retrieval

  37. Face Detection in Commercial Digital Cameras • Face detection (Viola & Jones) • Train on • Thousands of faces • Millions of non-faces

  38. Face Recognition (Eigenfaces [Turk and Pentland 1991]) • Project image into higher-dimensional space • [Figure: grid of pixel values for an N × N image] • “Recognize” by grouping unknown image with closest training example
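The final "closest training example" step can be sketched as a nearest-neighbor search over flattened pixel values. This deliberately omits the eigenface projection itself and compares raw pixels; the 2 × 2 images, labels, and function names are invented for illustration.

```python
import math


def flatten(image):
    """Turn a grid of pixel values into one long vector (one point in space)."""
    return [pixel for row in image for pixel in row]


def closest_label(unknown, training):
    """Return the label of the training image nearest to the unknown image."""
    unknown_vec = flatten(unknown)
    best_label, best_distance = None, math.inf
    for label, image in training:
        distance = math.dist(unknown_vec, flatten(image))  # Euclidean distance
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


# Hypothetical 2 x 2 grayscale "faces".
training = [
    ("alice", [[0, 71], [250, 68]]),
    ("bob", [[210, 44], [128, 53]]),
]
guess = closest_label([[5, 70], [240, 60]], training)
```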

  39. Face Recognition (Picasa - Google) • Image search/organization • Automatically finds, crops and groups images of the same person from a collection of photos • Allows user feedback (trainable) - user can indicate if it found the wrong person.

  40. Face/Object Recognition/Search: Feature-Based Technology • Bag of “words”* • Extract features from the object • Create visual “words” from image features • *Li Fei-Fei (Princeton)

  41. Face/Object Recognition/Search: Feature-Based Technology • Do this for multiple objects • *Li Fei-Fei (Princeton)

  42. Face/Object Recognition/Search: Bag of Words • How to get matching images/documents? • Use “word” frequencies: frequency of word i in document d = $n_{id} / n_d$, where $n_{id}$ = # times word i occurs in document d and $n_d$ = total # words in document d • Then combine word frequency with inverse document frequency weighting to downweight words that occur frequently (D = # of occurrences; A = average # of occurrences)
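A sketch of combining word frequency with inverse document frequency. The slide does not spell out the weighting, so the common form log(N / number of documents containing the word) is assumed here; the visual "words" and documents are invented.

```python
import math


def tf_idf(word, document, all_documents):
    """Word frequency n_id / n_d times an assumed log-based inverse document frequency."""
    tf = document.count(word) / len(document)  # n_id / n_d
    containing = sum(1 for doc in all_documents if word in doc)
    if containing == 0:
        return 0.0
    idf = math.log(len(all_documents) / containing)  # assumed idf form
    return tf * idf


# Hypothetical images, each described as a bag of visual "words".
docs = [
    ["edge", "corner", "edge", "blob"],
    ["edge", "stripe", "stripe", "blob"],
    ["edge", "corner", "corner", "corner"],
]
edge_weight = tf_idf("edge", docs[0], docs)
corner_weight = tf_idf("corner", docs[2], docs)
```

"edge" appears in every image, so its weight drops to zero and it cannot help tell images apart, while "corner" keeps a positive weight in the image where it dominates.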

  43. Face/Object Recognition/Search: Feature-Based Technology • Drop word features through a “vocabulary tree” to classify • *Li Fei-Fei (Princeton)
