Audio Information Retrieval and Audio Search By Chris Mc Coy
Brief Overview • The basics of audio information retrieval and audio search • What are some audio IR mechanisms? • Briefly: Compression • Searching by audio query • How it works • The future of audio IR and audio search
The Basics of Audio Search and Audio Information Retrieval • What is it? • Audio information retrieval is the process of retrieving audio information from the available resources • Text-based searching for audio information is most common • Audio search (content-based retrieval) is a method of retrieving information by using a piece of audio as the query (ex: the melody of a song)
Why is Audio IR important? • Audio information in the form of music is one of the dominant forms of entertainment • News reporting • Comedy audio segments • Online radio and sports broadcasts • Audio in presentations leads to a more interesting and interactive product • Research and homework
What are some Audio IR Mechanisms? • Text based • Peer-to-peer file sharing software • FTP • Streaming audio • Websites • Online network drives • Clip Art • Audio search devices and software
Peer-to-Peer File Sharing Software • Peer-to-peer software connects one computer to another directly without a central point of management • Information can be queried and results transferred from one computer to another computer
Peer-to-Peer File Sharing Software (cont.) • Some common software examples • Bearshare • KaZaA • WinMX • Others: eDonkey2000, Furi, Blubster, Grokster, Madster, etc.
Free download version and $19.95 pay version (6 months) with extra features • Retrieves the following audio information • Movie and television sound clips • Music songs (.mp3 most common form) • Historic news reports • Famous speeches • Other types of media: • Videos, pictures, text documents, etc.
Peer-to-Peer File Sharing Software (cont.) • Problems • Spyware • Programs that collect information about the user and usage of the computer • Virus transmission • Trojan horses • Ex: “Rolling Stones – Ruby Tuesday.exe” • Too many unrelated results returned • Query for the band “Love” • Thousands of songs with “love” in the title are returned • Mismatched names and identification info
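The "too many unrelated results" problem can be illustrated with a small sketch. The four-entry catalogue and both function names below are made up for illustration and do not describe how any particular file sharing client searches; the point is only that matching a keyword anywhere in the file name returns far more than matching it against an artist field.

```python
# A tiny, hypothetical catalogue standing in for files shared on a network.
library = [
    {"artist": "Love", "title": "Alone Again Or"},
    {"artist": "The Beatles", "title": "All You Need Is Love"},
    {"artist": "The Beatles", "title": "Love Me Do"},
    {"artist": "Love", "title": "7 and 7 Is"},
]

def keyword_search(query):
    """Match the query anywhere in 'artist title', as a plain text search would."""
    q = query.lower()
    return [t for t in library if q in f"{t['artist']} {t['title']}".lower()]

def artist_search(query):
    """Match the query against the artist field only."""
    q = query.lower()
    return [t for t in library if t["artist"].lower() == q]

print(len(keyword_search("love")))  # 4: every entry mentions "love" somewhere
print(len(artist_search("love")))   # 2: only the songs by the band Love
```

A fielded query only helps when the artist and title labels are correct, which is exactly what the "mismatched names" problem undermines.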
Compression • Audio compression is a method used to decrease the size of an audio file to conserve disk space • MP3 is one of the most common forms of compressed audio • MP3 is a lossy format • Depending on quality: 1/5 or 1/10 the size of a .wav file • Lossy format – tradeoff between sound quality and file size • Other common types of compressed audio: • Real Audio (lossy format) • .SHN (Shorten) (lossless format) • Controversy exists over the sharing of audio information in compressed format
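The size ratios above can be checked with simple arithmetic. The sketch below assumes CD-quality uncompressed audio (44.1 kHz, 16-bit, stereo) and constant-bitrate MP3 at 128 and 256 kbps; these parameters are illustrative assumptions, not figures from the slides, and file headers are ignored.

```python
def pcm_bytes(seconds, sample_rate=44_100, bits_per_sample=16, channels=2):
    """Size in bytes of uncompressed PCM audio (the payload of a typical .wav)."""
    return seconds * sample_rate * bits_per_sample * channels // 8

def cbr_bytes(seconds, kbps):
    """Approximate size in bytes of a constant-bitrate stream such as MP3."""
    return seconds * kbps * 1000 // 8

song = 240  # a four-minute song, in seconds
wav = pcm_bytes(song)
for kbps in (128, 256):
    mp3 = cbr_bytes(song, kbps)
    print(f"{kbps} kbps: {mp3 / 1e6:.1f} MB, about 1/{wav / mp3:.1f} "
          f"of the {wav / 1e6:.1f} MB WAV")
```

Under these assumptions the higher bitrate lands near the 1/5 figure and the lower one near 1/10, which is the quality versus size tradeoff of a lossy format.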
FTP • Common storage area for audio information • Many FTP sites cater to particular areas of audio • FTP sites for trading rare music from a particular band • Old archived radio or historic audio files • To find FTP sites, run a query on a search engine (ex: “Beatles FTP”)
FTP (cont.) • Information needed • IP (ex: 184.108.40.206) • Port (ex: 21) • Login (ex: music) • Password (ex: mp3)
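The four pieces of information above map directly onto an FTP client session. Here is a minimal sketch using Python's standard `ftplib`; the helper names `is_audio` and `list_audio_files` and the extension list are assumptions made for illustration.

```python
from ftplib import FTP

# File extensions treated as audio in this sketch (an assumed, partial list).
AUDIO_EXTENSIONS = (".mp3", ".wav", ".shn", ".ra")

def is_audio(filename):
    """True if the file name ends with one of the audio extensions."""
    return filename.lower().endswith(AUDIO_EXTENSIONS)

def list_audio_files(host, port=21, user="anonymous", password=""):
    """Log in to an FTP server and return the audio files in its start directory."""
    ftp = FTP()
    ftp.connect(host, port, timeout=10)   # IP or host name, and port
    try:
        ftp.login(user, password)         # login and password
        return [name for name in ftp.nlst() if is_audio(name)]
    finally:
        ftp.close()
```

A call such as `list_audio_files("ftp.example.org", 21, "music", "mp3")` would use the example login and password from the slide against a placeholder host name.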
Streaming Audio • Used on many commercial websites to provide sound samples of music • Ex: www.cdnow.com • Allows for quick audio information retrieval • No permanent download needed • Used for many news and radio broadcasts • Ex: www.nfl.com (live radio broadcasts) • Real Audio and Windows Media Player are the most common players
Websites • Some websites have audio information available for download free of cost • www.mp3.com • Usually stored on their own personal storage space
Online network drives • Good for sharing audio information between people on the same network • Students on a residence hall network • Employees at work on the same network • Positives • Fast transfer between computers • Negatives • No transfers if network is down • Virus transmission
Clip Art • Good for finding audio information for presentations or laughs • PowerPoint has a built-in sound clip organizer which you can query by text
Audio Search: Content-Based Retrieval • New developments and technologies allow querying IR mechanisms by audio • New audio mining (aka audio indexing) tools combine speech processing and search technology in one package • Data can be time stamped and queried later by speech or by text • Good for referencing logged business calls
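The idea of time-stamped data that can be queried later by text can be sketched as a small inverted index. The sketch assumes a speech recognizer has already turned a recorded call into (start time, text) segments; the segments and the `build_index` helper are hypothetical and do not describe any specific audio mining product.

```python
from collections import defaultdict

def build_index(segments):
    """Map each spoken word to the start times (seconds) of segments containing it."""
    index = defaultdict(list)
    for start_seconds, text in segments:
        for word in set(text.lower().split()):
            index[word].append(start_seconds)
    return index

# Hypothetical recognizer output for one logged business call.
call_log = [
    (0, "thank you for calling support"),
    (12, "my invoice shows the wrong amount"),
    (47, "we will send a corrected invoice today"),
]

index = build_index(call_log)
print(index["invoice"])  # [12, 47]
```

Each hit is a playback position, so a listener can jump straight to the relevant part of the recording.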
Audio Search: Content-Based Retrieval (cont.) • New device by Philips Electronics in the Netherlands (hope to hit consumer market 2004) • Microphone device captures your voice • “Audio fingerprints” are determined • Melody query then is sent to a database • Results are returned • Good for finding a song you don’t know by name but know by tune
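The capture, fingerprint, query, and results steps can be sketched with a deliberately simple stand-in for a fingerprint. This is not the Philips method, which the slides do not describe. The sketch assumes the hummed query has already been converted to a list of MIDI note numbers, and it reduces each melody to its up/down/repeat contour so that a tune hummed in a different key still matches. The two-song database is illustrative.

```python
def contour(pitches):
    """Reduce a note sequence to U (up), D (down), R (repeat) steps."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        steps.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(steps)

# Illustrative database: title -> contour of the opening melody (MIDI note numbers).
database = {
    "Twinkle, Twinkle, Little Star": contour(
        [60, 60, 67, 67, 69, 69, 67, 65, 65, 64, 64, 62, 62, 60]),
    "Ode to Joy": contour(
        [64, 64, 65, 67, 67, 65, 64, 62, 60, 60, 62, 64, 64, 62, 62]),
}

def search(hummed_pitches):
    """Return the titles whose stored contour contains the query contour."""
    query = contour(hummed_pitches)
    return [title for title, stored in database.items() if query in stored]

# The first seven notes of "Twinkle, Twinkle", hummed a whole step higher.
print(search([62, 62, 69, 69, 71, 71, 69]))  # ['Twinkle, Twinkle, Little Star']
```

A contour ignores rhythm and interval size, so short queries match many songs; real systems need much richer fingerprints.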
Audio Search: Content-Based Retrieval (cont.) • Attributes of an audio signal used to index • Amplitude - the maximum amount of displacement of a particle on the medium from its rest position • Frequency - how often the particles of the medium vibrate when a wave passes through the medium
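Both attributes can be measured from a sampled signal. The sketch below generates a test tone and estimates its amplitude as the largest absolute sample value and its frequency by counting upward zero crossings. The 8000 Hz sample rate and the helper names are assumptions, and zero-crossing counting is only a rough estimate that suits clean, single-tone signals.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)

def sine_wave(freq_hz, amplitude, seconds=1.0):
    """Generate a pure tone as a list of samples."""
    n = int(SAMPLE_RATE * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]

def peak_amplitude(samples):
    """Largest displacement from the rest position (zero)."""
    return max(abs(s) for s in samples)

def zero_crossing_frequency(samples):
    """Estimate cycles per second from the number of upward zero crossings."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * SAMPLE_RATE / len(samples)

tone = sine_wave(440, 0.5)
print(peak_amplitude(tone))           # close to the 0.5 used to build the tone
print(zero_crossing_frequency(tone))  # close to 440 Hz
```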
Audio Search: Content-Based Retrieval (cont.) • Other attributes used to index audio information • Average energy: loudness of audio signals • Bandwidth: frequency range of a sound • Brightness: midpoint of the energy distribution of a sound • Harmony: in a harmonic sound, the spectral components are mostly whole-number multiples of the lowest (and most often the loudest) frequency, which is called the fundamental frequency • Pitch: how high a sound is; the fundamental frequency is used as an approximation
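These attributes can be computed from a signal's frequency spectrum. The sketch below is one simple reading of the definitions on the slide, not a standard implementation: brightness is taken as the power-weighted mean frequency, bandwidth as the span between the lowest and highest strong components, and pitch as the lowest strong component. The 1% power threshold, the test tone, and the function names are assumptions. A plain discrete Fourier transform is used so that only the standard library is needed.

```python
import cmath
import math

def magnitude_spectrum(samples, sample_rate):
    """Naive DFT: return (frequencies in Hz, magnitudes) up to half the sample rate."""
    n = len(samples)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        freqs.append(k * sample_rate / n)
        mags.append(abs(coeff))
    return freqs, mags

def describe(samples, sample_rate, floor=0.01):
    """Compute the indexing attributes; 'strong' bins hold >= floor * peak power."""
    freqs, mags = magnitude_spectrum(samples, sample_rate)
    power = [m * m for m in mags]
    strong = [f for f, p in zip(freqs, power) if p >= floor * max(power)]
    return {
        "average_energy": sum(s * s for s in samples) / len(samples),
        "bandwidth_hz": max(strong) - min(strong),
        "brightness_hz": sum(f * p for f, p in zip(freqs, power)) / sum(power),
        "pitch_hz": min(strong),
    }

# A harmonic test tone: 200 Hz fundamental plus weaker 400 Hz and 600 Hz harmonics.
rate, n = 3200, 256
tone = [sum(a * math.sin(2 * math.pi * f * t / rate)
            for f, a in ((200, 1.0), (400, 0.5), (600, 0.25)))
        for t in range(n)]
print(describe(tone, rate))
```

For this tone the pitch estimate is the 200 Hz fundamental and the bandwidth is the 400 Hz span between the fundamental and the highest harmonic.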
Audio Search: Content-Based Retrieval (cont.) • Positives • Quick and easy searching of databases • Fewer problems with text labeling • Negatives • Speech and sound recognition is sometimes unreliable
The Future of Audio IR • Unanswered Questions • Will faster computers = faster audio IR and search mechanisms? • What direction will the new audio IR systems head towards? Content-based retrieval or text based retrieval? • How will new file storage mediums and new compression methods affect audio IR? • What will the impact of querying by audio be once the software hits the commercial market?