Image searching on the Web • Qunyan Mao • SIMS, UC Berkeley
Images on the Web • More and more images are on the Web • Image formats • GIF: CompuServe Graphics Interchange Format, a dominant Web format • JPEG: supports “true color”; ideal for photographs • TIF (TIFF): large file size; not well suited for Web pages
Image formats • BMP: used, for example, for desktop wallpaper images; poorly supported by Web browsers • PNG: Portable Network Graphics, an image format for the future
Image indexing • Two methods are used to index images: • text-based • content-based
Text-based vs. content-based • Text-based: a commonly used image indexing and searching method • descriptive text • controlled vocabulary • Drawbacks: • a descriptor must accompany the image to make it accessible • consistency: the descriptive text varies in type and quality from source to source • requires human intervention
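To make the text-based approach concrete, here is a minimal Python sketch of indexing images by controlled-vocabulary terms. The vocabulary, image ids, and function names are hypothetical and chosen only for illustration; real catalogs rely on established thesauri and much richer records.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: free-text variants -> one preferred term
CONTROLLED_VOCABULARY = {
    "giraffe": "giraffes",
    "giraffes": "giraffes",
    "giraffa camelopardalis": "giraffes",
    "golden gate": "golden gate bridge",
    "golden gate bridge": "golden gate bridge",
}


def to_controlled_terms(descriptors):
    """Map cataloguer-supplied descriptors to preferred terms; unknown ones are dropped."""
    terms = set()
    for descriptor in descriptors:
        key = descriptor.strip().lower()
        if key in CONTROLLED_VOCABULARY:
            terms.add(CONTROLLED_VOCABULARY[key])
    return terms


def build_index(catalog):
    """Build an inverted index: preferred term -> set of image ids."""
    index = defaultdict(set)
    for image_id, descriptors in catalog.items():
        for term in to_controlled_terms(descriptors):
            index[term].add(image_id)
    return index


def search(index, query):
    """Normalize the query through the same vocabulary, then look it up."""
    term = CONTROLLED_VOCABULARY.get(query.strip().lower())
    return sorted(index.get(term, set())) if term else []


# Hypothetical catalog: image id -> descriptors typed in by a person
catalog = {
    "img001": ["Giraffe", "savanna"],
    "img002": ["Giraffa camelopardalis"],
    "img003": ["Golden Gate"],
}
index = build_index(catalog)
print(search(index, "giraffes"))  # ['img001', 'img002']
```

Two different descriptions of a giraffe land on the same term, which is the consistency a controlled vocabulary buys. The drawbacks show up too: an image with no descriptor never enters the index, and a descriptor missing from the vocabulary (here "savanna") is silently lost.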
Text-based vs. content-based • Content-based indexing and searching • goal: provide algorithms that automatically recognize the important features of an image without human intervention • search by color, shape, and spatial relationships
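One common way to search by color is to compare color histograms. The Python sketch below quantizes each RGB pixel into coarse bins and scores images by histogram intersection. It is an illustrative assumption, not a description of how WebSEEK or any other engine actually works. Images are represented as plain lists of (r, g, b) tuples, and the file names and pixel data are made up.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Quantize (r, g, b) pixels with 0-255 channels into coarse bins; return a normalized histogram."""
    counts = {}
    for r, g, b in pixels:
        # Map each channel value to a bin index in [0, bins_per_channel)
        key = (r * bins_per_channel // 256,
               g * bins_per_channel // 256,
               b * bins_per_channel // 256)
        counts[key] = counts.get(key, 0) + 1
    total = len(pixels)
    return {key: n / total for key, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the sum over bins of the smaller of the two proportions."""
    return sum(min(p, h2.get(key, 0.0)) for key, p in h1.items())


def rank_by_color(query_pixels, collection):
    """Return image names ordered from most to least similar in color to the query."""
    query_hist = color_histogram(query_pixels)
    scores = {name: histogram_intersection(query_hist, color_histogram(pixels))
              for name, pixels in collection.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy "images": tawny and dark-brown pixels vs. a blue sky (hypothetical data)
query = [(210, 160, 70)] * 70 + [(80, 50, 20)] * 30
collection = {
    "giraffe.jpg": [(200, 150, 80)] * 80 + [(90, 60, 30)] * 20,
    "sky.jpg": [(100, 150, 230)] * 100,
}
print(rank_by_color(query, collection))  # ['giraffe.jpg', 'sky.jpg']
```

Because a histogram ignores shape and spatial layout, any picture with similar browns (a lion, a wooden fence) would score just as high as the giraffe. This is one source of the unexpected results that content-based search can return.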
Where to start • Search engines: • general search engines: AltaVista, Lycos • image search engines: WebSEEK • Specialized image databases: • museum, archive, and library digital image databases
How search engines work • How does a Web search engine identify images and match them to your criteria? • Look for graphic files: the HTML tags <img src> and <a href>. Example: AltaVista • Look for captions: the ALT attribute of the <img> tag • Look for the title of the Web page • Employ human intervention to catalog images
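The HTML cues listed above can be sketched with Python's standard html.parser module. This is a hypothetical, simplified extractor: the class name, the record fields, and the list of image extensions are assumptions. Real crawlers also resolve relative URLs, cope with malformed markup, and weigh many more signals.

```python
from html.parser import HTMLParser

# File extensions treated as images when they appear in a link (an assumption)
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp")


class ImageExtractor(HTMLParser):
    """Collect images referenced by a page, plus text a search engine could index."""

    def __init__(self):
        super().__init__()
        self.images = []        # one dict per image found
        self.page_title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)     # attribute values may be None, hence the "or" guards
        if tag == "img" and attrs.get("src"):
            # Inline image: <img src="..." alt="...">
            self.images.append({"url": attrs["src"], "alt": attrs.get("alt") or ""})
        elif tag == "a" and (attrs.get("href") or "").lower().endswith(IMAGE_EXTENSIONS):
            # Link pointing straight at an image file: <a href="zebra.gif">
            self.images.append({"url": attrs["href"], "alt": ""})
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.page_title += data

    def records(self):
        """Return one record per image, each tagged with the page title."""
        title = self.page_title.strip()
        return [dict(image, title=title) for image in self.images]


page = """<html><head><title>African Wildlife</title></head>
<body>
<img src="/animals/giraffe.jpg" alt="Giraffe at the water hole">
<a href="photos/zebra.GIF">Zebra photo</a>
<a href="about.html">About this site</a>
</body></html>"""

parser = ImageExtractor()
parser.feed(page)
for record in parser.records():
    print(record)
```

The link to about.html is skipped because it does not end in an image extension. The linked zebra image gets no ALT text, so only its URL and the page title are available for indexing.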
Search example: Giraffe • AltaVista • Lycos • WebSEEK
AltaVista • Http://image.altavista.com/cgi-bin/avncgi • Text-based indexing: file name and path name • (live search demo)
Lycos • Http://lycospro.lycos.com • Text-based indexing: file name, path name, and caption (ALT attribute) • (live search demo)
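The difference between the two engines' indexing, as these slides describe it, can be sketched as follows: file name and path name only, versus file name, path name, and ALT text. The tokenizing rules, the example.com URLs, and the function names are assumptions for illustration, not the engines' actual algorithms.

```python
import re
from urllib.parse import urlparse

# File extensions stripped before tokenizing (an assumption for this sketch)
EXTENSION = re.compile(r"\.(gif|jpe?g|png|tiff?|bmp)$", re.IGNORECASE)


def keywords_from_image(url, alt=""):
    """Turn an image URL's path and file name, plus optional ALT text, into index terms."""
    path = EXTENSION.sub("", urlparse(url).path)   # drop host, query string, extension
    text = (path + " " + alt).lower()
    # Split on anything that is not a letter or digit: '/', '_', '-', '.', spaces
    return {token for token in re.split(r"[^a-z0-9]+", text) if token}


def search(index, query):
    """Return, in sorted order, the URLs whose terms include every query word."""
    words = set(query.lower().split())
    return sorted(url for url, terms in index.items() if words <= terms)


# Hypothetical images: URL -> ALT text
images = {
    "http://www.example.com/animals/africa/giraffe_01.jpg": "",
    "http://www.example.com/img/DSC0042.JPG": "Giraffe at the water hole",
    "http://www.example.com/pics/img0007.gif": "",  # also a giraffe, but nothing says so
}

# File name and path name only
path_index = {url: keywords_from_image(url) for url in images}
# File name, path name, and ALT text
alt_index = {url: keywords_from_image(url, alt) for url, alt in images.items()}

print(search(path_index, "giraffe"))  # only the giraffe_01.jpg URL
print(search(alt_index, "giraffe"))   # giraffe_01.jpg and DSC0042.JPG
```

With path-only indexing, the photo saved as DSC0042.JPG cannot be found with "giraffe" even though its ALT text describes a giraffe. img0007.gif cannot be found by either index. There is also no stemming, so a query for "giraffes" matches nothing.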
WebSEEK • Http://www.disney.ctr.columbia.edu/webseek • Content-based image search engine • (live search demo)
Specialized image databases • Museum, archive, and library digital image databases • Fine Arts Museums of San Francisco http://www.thinker.org/imagebase/index-2.html • California Heritage Collection http://sunsite.berkeley.edu/CalHeritage/collection.html • National Museum of American Art
Problems in using search engines • Search results rely heavily on how the webmaster names the image file and directory. • Even with content-based search, there are many unexpected results.
Specialized image databases • More likely to use a controlled vocabulary to describe images • Better organized than other images on the Web • Have their own search tools or finding aids
Before using an image • Available for viewing doesn’t mean available for reuse • Make sure you have the right to use it