
Distributed Content Based Visual Information Retrieval System On Peer To Peer Network

By Sameer Abrol. Source: ACM Transactions on Information Systems (TOIS), Volume 22, Issue 3 (July 2004), pages 477-501. Year of publication: 2004. ISSN: 1046-8188.


Presentation Transcript


  1. Distributed Content Based Visual Information Retrieval System On Peer To Peer Network
     By Sameer Abrol
     Source: ACM Transactions on Information Systems (TOIS), Volume 22, Issue 3 (July 2004), pages 477-501. Year of publication: 2004. ISSN: 1046-8188
     Authors:
     • Irwin King, The Chinese University of Hong Kong, Shatin, Hong Kong
     • Cheuk Hang Ng, The Chinese University of Hong Kong, Shatin, Hong Kong
     • Ka Cheung Sia, The Chinese University of Hong Kong, Shatin, Hong Kong
     Publisher: ACM Press, New York, NY, USA

  2. Discussion Outline
     • Introduction
     • Background
     • Contribution
     • Clustering Of P2P Network
     • Content Based Query Routing
     • Architecture And Design Of DISCOVIR
     • Experimental Analysis
     • Conclusion
     • Questions
     Content Based Information Retrieval

  3. Introduction
     • Peer-to-peer (P2P) applications such as Gnutella have demonstrated the significance of distributed information sharing systems.
     • A P2P network offers a completely decentralized and distributed paradigm.
     • Currently, most content-based image retrieval (CBIR) systems are based on the centralized computing model.
     • P2P offers the advantages of decentralization by distributing:
        • Storage
        • Information
        • Computation cost
     • Because of these desirable qualities, many research projects have focused on:
        • Designing different P2P systems
        • Improving their performance


  5. Peer-To-Peer Networks
     • Unlike the client-server architecture, individual computers connect directly with each other without using dedicated servers
     • Each computer acts as a server and a client simultaneously
     • Computers leave and join the network frequently
     • Emerging P2P networks offer the following advantages:
        • Distributed resources – storage, information and computational cost can be distributed among the peers
        • Increased reliability – no reliance on centralized coordinators
        • Comprehensiveness of information – the P2P network has the potential of reaching every computer on the Internet

  6. Peer-To-Peer Networks…Flooding Broadcast of Queries
     • Different files are shared by different peers
     • A peer broadcasts a query request to its connected peers
     • Peers share information directly with each other (unlike the client-server architecture)
     • Messages are sent over multiple hops
     • Each peer looks up its own local shared collection and responds to queries
     Plainly, this model is wasteful because peers are forced to handle irrelevant queries.
     Fig. 1: Illustration of information retrieval in P2P
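The flooding behaviour described on this slide can be sketched as a breadth-first broadcast limited by a hop budget. This is a minimal illustration, not the Gnutella implementation: the overlay `graph`, the peer names and the `ttl` parameter are hypothetical.

```python
from collections import deque


def flood_query(graph, start, ttl):
    """Return every peer reached by a query flooded from `start`.

    `graph` maps a peer name to the list of peers it is connected to.
    `ttl` is the number of hops the query may still travel (assumed budget).
    The originator is included in the result.
    """
    reached = {start}
    frontier = deque([(start, ttl)])
    while frontier:
        peer, remaining = frontier.popleft()
        if remaining == 0:
            continue  # hop budget exhausted, do not forward further
        for neighbour in graph[peer]:
            if neighbour not in reached:  # each peer handles a query once
                reached.add(neighbour)
                frontier.append((neighbour, remaining - 1))
    return reached


# Hypothetical five-peer overlay used only for illustration.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
```

Every peer within the hop budget handles the query whether or not it holds relevant images, which is the waste the slide points out.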

  7. Peer-To-Peer Networks…Other Discovery Methods
     • Distributed hash tables (DHTs):
        • A technique that maps a filename to a key
        • Each peer stores a certain range of (key, value) pairs
        • Examples are Chord (key as an m-bit integer) and CAN (key as a point in a d-dimensional Cartesian coordinate space)
        • DHTs mandate a specific network structure and incur a certain penalty on joining and leaving the network
        • Their performance under the dynamic conditions of prevalent P2P systems is unknown
     • Routing indices approach:
        • Each peer maintains a routing index
        • This method requires all peers to agree upon a set of document categories
     These methods are still under research for content-based information retrieval.
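As a rough illustration of the filename-to-key mapping, the sketch below hashes a name to an m-bit integer and assigns it to the first peer identifier at or after the key on a ring, in the style of Chord. The key size `M_BITS`, the peer identifiers and the function names are assumptions made for the example; real DHTs add routing tables and handling for peers joining and leaving.

```python
import hashlib
from bisect import bisect_left

M_BITS = 16  # assumed key size for the example


def key_for(name, m_bits=M_BITS):
    """Map a filename to an m-bit integer key using SHA-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m_bits)


def responsible_peer(key, peer_ids):
    """Return the first peer id at or after `key`, wrapping around the ring."""
    ring = sorted(peer_ids)
    idx = bisect_left(ring, key)
    return ring[idx % len(ring)]
```

An exact-match lookup like this works for filenames, but it gives no notion of "similar" images, which is why the slide treats these methods as open research for content-based retrieval.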

  8. Content Based Information Retrieval
     The goal of CBIR systems is to operate on a collection of images and, in response to visual queries, extract relevant images.
     • Features are extracted from the images in the database, then stored and indexed (done off-line)
     • The user supplies a query example image, from which image features are extracted
     • These image features are used to find the images in the database that are most similar
     • A candidate list of similar images is shown to the user
     • From the user feedback, the query is optimized and used as a new query in an iterative manner
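The query-by-example step can be sketched as a nearest-neighbour search over pre-extracted feature vectors. This is a minimal sketch under stated assumptions: the three-dimensional vectors, the file names and the use of Euclidean distance are illustrative choices, and the feedback loop is left out.

```python
import math


def query_by_example(index, query_vector, k=3):
    """Return the names of the k indexed images closest to the query.

    `index` maps an image name to its feature vector (extracted off-line).
    Euclidean distance is an assumed similarity measure.
    """
    ranked = sorted(index, key=lambda name: math.dist(index[name], query_vector))
    return ranked[:k]


# Hypothetical pre-computed index of feature vectors.
index = {
    "tree1.jpg": (0.1, 0.8, 0.2),
    "tree2.jpg": (0.2, 0.7, 0.1),
    "sunset1.jpg": (0.9, 0.4, 0.1),
    "sea1.jpg": (0.1, 0.3, 0.9),
}
```

With these values, a query vector near the two tree images ranks them ahead of the sunset and sea images.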

  9. Content-based information retrieval…Basic Concept
     Fig. 2


  11. Contributions
      • Efficient data lookup
         • Organizes information in the P2P network
         • Routes each query intelligently according to the content of the query
         • Unlike CAN or Chord, allows each peer to index its own collection, with no fixed topology and no fixed data placement
      • Rich queries
         • Users can perform more complex queries based on the content of information rather than simple text
      • The algorithm is implemented in the DISCOVIR system (covered in the last section)


  13. Peer Clustering
      The network is organized in a systematic way, like the Yellow Pages, in order to improve query efficiency.
      • Clusters peers with similar image features together
      • Makes use of an extra layer of connections, called attractive links, on top of the original P2P network
      • Each peer shares a set of images and is responsible for extracting the content-based features of its shared images
      • This collection of feature vectors is used to describe the characteristics of the shared images
      • Similarity between two peers is determined using the set of feature vectors as the signature value of a peer

  14. Peer Clustering…Summary Of Key Terms
      Table I

  15. Peer Clustering…Important Definitions
      • Collection(p): the set of n images that a peer p shares
      • Low-level feature extraction is performed on each image to map it to a multi-dimensional vector by a function f, where f is a specific feature extraction function that returns a real-valued d-dimensional vector
      • After extraction, each peer holds a set of feature vectors, one per shared image

  16. Peer Clustering…Important Definitions
      • Cat(p): the signature value of peer p, formed from the mean and variance of the feature vectors of the images that the peer shares
      • Sim(p, q): a distance measure between two peers' signature values, Cat(p) and Cat(q)
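The two definitions can be sketched as follows. The signature is computed as a per-dimension mean and variance, which matches the description on the slide. The transcript does not preserve the formula for Sim(p, q), so the `sim` function below is an assumed stand-in, chosen only so that it shrinks when the means are close and the variances are small; it should not be read as the authors' measure.

```python
import math


def signature(vectors):
    """Return (mean, variance) per dimension for a peer's feature vectors."""
    n = len(vectors)
    d = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(d)]
    var = [sum((v[i] - mean[i]) ** 2 for v in vectors) / n for i in range(d)]
    return mean, var


def sim(sig_p, sig_q):
    """Illustrative distance between two signatures (an assumption).

    Smaller values mean more similar peers: the result grows with the gap
    between the means and with the total variance of both collections.
    """
    (mean_p, var_p), (mean_q, var_q) = sig_p, sig_q
    mean_gap = math.dist(mean_p, mean_q)
    spread = sum(var_p) + sum(var_q)
    return mean_gap * (1.0 + spread)


# Hypothetical two-dimensional feature vectors for three peers.
tree_a = signature([(0.1, 0.8), (0.2, 0.9)])
tree_b = signature([(0.2, 0.8), (0.1, 0.7)])
sunset = signature([(0.9, 0.2), (0.8, 0.1)])
```

Under these sample values the two tree peers are closer to each other than either is to the sunset peer.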

  17. Peer Clustering…Explanation
      We assume each peer often shares images related to a certain topic.
      • The feature vectors of a peer's shared images form a sub-cluster in the high-dimensional space
      • Thus, the mean and variance can describe the characteristics of this collection
      • The target is to group these sub-clusters to form a cluster of peers that share similar images
      • The more similar two peers p and q are, the smaller the value of Sim(p, q)
      • Sim(p, q) is small when the two means are close and both variances are small
      • If both variances are small, the shared images of each peer are highly related to a common topic
      • When the means are close, the two sub-clusters are close in the high-dimensional space

  18. Peer Clustering…Clustering Algorithm
      Peer clustering is done by assigning attractive links. There are three main steps in the peer clustering strategy (Algorithm 1):
      • Signature value calculation
         • Every peer calculates its signature value, Cat(p), based on the characteristics of the images shared by that peer p
      • Neighborhood discovery
         • A new peer broadcasts a signature query message to ask for the signature values of peers, Peer(p, t)
      • Similarity calculation and attractive link establishment
         • The new peer p can now find other peers with signature values closest to its own
         • It makes an attractive connection to link them up
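The third step, choosing which peers to link to after the signature replies arrive, might look like the sketch below. It is not Algorithm 1 from the paper: the signature layout (a mean vector plus a single variance number), the `sim` stand-in, the `max_links` parameter and the sample replies are all assumptions made for illustration.

```python
import math


def sim(sig_p, sig_q):
    """Illustrative distance between two (mean, variance) signatures."""
    (mean_p, var_p), (mean_q, var_q) = sig_p, sig_q
    return math.dist(mean_p, mean_q) * (1.0 + var_p + var_q)


def choose_attractive_links(own_sig, replies, max_links=1):
    """Pick the peers whose signatures are closest to our own.

    `replies` maps a peer name to the signature it returned in answer
    to the signature query. The closest `max_links` peers are returned.
    """
    ranked = sorted(replies, key=lambda peer: sim(own_sig, replies[peer]))
    return ranked[:max_links]


# Hypothetical replies collected by a joining peer called Tree4.
replies = {
    "Tree3": ((0.2, 0.8), 0.01),
    "Sunset4": ((0.9, 0.3), 0.02),
    "Sea2": ((0.1, 0.2), 0.03),
}
tree4 = ((0.25, 0.75), 0.01)
links = choose_attractive_links(tree4, replies)
```

With these sample signatures Tree4 links to Tree3, mirroring the illustration on the following slides.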

  19. Peer Clustering…Illustration Of Peer Clustering
      • A peer named Tree1 is one whose shared images are mostly related to trees
      • On joining the network, Tree4 connects to a randomly selected peer, Sunset4
      • It sends out a signature query to learn the location and signature value of other peers
      • After collecting replies from other peers, Tree4 makes an attractive link to Tree3 to perform peer clustering, because Sim(Tree4, Tree3) is the smallest
      • Peers with similar characteristics will gradually be connected by attractive links to form a cluster

  20. Peer Clustering…Illustration Of Peer Clustering
      Figure 3

  21. Peer Clustering…Illustration Of Peer Clustering
      Table II


  23. Firework Query Model…Illustration Of Firework Query Model
      Figure 4

  24. Firework Query Model
      To make use of the clustered P2P network, the Firework Query Model is proposed (Algorithm 2).
      • It is a content-based routing strategy
      • A query message is routed selectively according to the content of the query
      • In this model, a query message first walks around the network from peer to peer over random links
      • Once it reaches the target cluster, the query message is broadcast by peers through the attractive connections inside the cluster

  25. Firework Query Model…Illustration Of Firework Query Model
      Assume peer Tree4 initiates a search to find images similar to its query image, which is an image of the sea.
      • First, the features of the query image are extracted and used to calculate the similarity between the query and Tree4's own signature value, Cat(p)
      • Since the similarity measure between the query and its signature value is smaller than a preset threshold, Tree4 sends the query to Sunset4
      • On receiving this query, Sunset4 carries out two steps:
         • Shared file lookup – the peer looks up its shared image collection for images matching the query
         • Route selection – the peer calculates the similarity between the query and its signature value
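The route selection step can be sketched as a single decision per peer. This is a hedged illustration rather than Algorithm 2 itself. It uses a Euclidean distance to the peer's mean feature vector as a stand-in for the measure, so a small value means the query matches the peer's cluster; that orientation, the choice of a single random link for forwarding, and all names and numbers are assumptions.

```python
import math
import random


def select_route(query_vec, peer_mean, random_links, attractive_links,
                 threshold, rng=random):
    """Return the neighbours a peer should forward the query to.

    Inside the target cluster (distance below `threshold`) the query is
    broadcast over every attractive link. Otherwise it keeps walking
    over one randomly chosen random link.
    """
    distance = math.dist(query_vec, peer_mean)
    if distance < threshold and attractive_links:
        return list(attractive_links)
    if random_links:
        return [rng.choice(random_links)]
    return []


# Hypothetical feature vectors: a sea query arriving at a tree peer
# and at a sea peer.
sea_query = (0.1, 0.3, 0.9)
at_tree4 = select_route(sea_query, (0.1, 0.8, 0.2), ["Sunset4"], ["Tree3"], 0.3)
at_sea2 = select_route(sea_query, (0.1, 0.3, 0.8), ["Tree4"], ["Sea1", "Sea3"], 0.3)
```

At the tree peer the query moves on over the random link, while at the sea peer it fans out over the attractive links, which is the "firework" burst inside the target cluster.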

  26. Firework Query Model…Illustration Of Firework Query Model (Cont…)
      Mechanisms used to prevent query messages from looping:
      • Replicated message checking rule
         • Each new query message is checked against the local cache for duplication
         • If the message has already passed through, it is not propagated
      • Time-To-Live (TTL)
         • For random connections, the probability of decreasing the TTL value is 1
         • For attractive connections, the probability is an arbitrary value in [0, 1] called Chance-To-Survive
         • This strategy reduces the number of messages passing outside the target cluster
         • More information can be retrieved inside the cluster
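Both loop-prevention mechanisms can be sketched in a few lines. The code follows the wording of the slide literally: the TTL always drops on a random connection and drops with probability Chance-To-Survive on an attractive connection. The function names, the `link_type` strings and the use of a set as the local cache are assumptions.

```python
import random


def should_forward(message_id, seen_ids):
    """Replicated message check: forward a message only the first time."""
    if message_id in seen_ids:
        return False
    seen_ids.add(message_id)  # remember it in the local cache
    return True


def next_ttl(ttl, link_type, chance_to_survive, rng=random):
    """Return the TTL a message carries after crossing one link.

    link_type is "random" or "attractive" (hypothetical labels).
    """
    if link_type == "random":
        p_decrease = 1.0
    else:
        p_decrease = chance_to_survive
    # rng.random() lies in [0, 1), so p_decrease == 1.0 always decrements
    # and p_decrease == 0.0 never does.
    if rng.random() < p_decrease:
        return ttl - 1
    return ttl
```

A low probability of decrementing on attractive links lets a query travel further inside the cluster than it could outside it on the same TTL budget.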


  28. DISCOVIR
      The DISCOVIR system is compatible with the Gnutella (v0.4) protocol.
      • Each peer is responsible for performing feature extraction on its shared images using the DISCOVIR client program
      • With this program, each peer maintains its local index of the feature vectors of its image collection
      • When a peer initiates a query by giving an example image and a particular feature extraction method, it sends the feature vector, contained in a query message, to all its connected peers
      • Other peers compare this query to their feature vector index
      • Based on a distance measure, they find a set of similar images and return the results to the requestor
      • Two types of messages are added:
         • ImageQuery – carries the name of the feature extraction method and the feature vector of the query image
         • ImageQueryHit – contains the location, filename and size of the similar images retrieved, and their similarity measure to the query
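The two message types and the local comparison can be modelled roughly as below. Only the information each message carries comes from the slide. The class and field names, the in-memory index layout, the host string and the `max_distance` cut-off are hypothetical, and the sketch says nothing about the actual byte layout shown in Figures 6 and 7.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ImageQuery:
    feature_method: str                 # name of the feature extraction method
    feature_vector: Tuple[float, ...]   # feature vector of the query image


@dataclass
class ImageHit:
    location: str      # where the image can be downloaded from
    filename: str
    size_bytes: int
    similarity: float  # here a Euclidean distance, smaller is closer (assumption)


@dataclass
class ImageQueryHit:
    hits: List[ImageHit]


def answer(query, local_index, host, max_distance):
    """Compare a query against the local index and build the reply.

    `local_index` maps filename -> (method, feature vector, size in bytes).
    Only entries indexed with the same extraction method are comparable.
    """
    hits = []
    for filename, (method, vector, size) in local_index.items():
        if method != query.feature_method:
            continue
        distance = math.dist(vector, query.feature_vector)
        if distance <= max_distance:
            hits.append(ImageHit(host, filename, size, distance))
    hits.sort(key=lambda hit: hit.similarity)
    return ImageQueryHit(hits)


# Hypothetical local index and query.
local_index = {
    "tree1.jpg": ("color_histogram", (0.1, 0.8), 1000),
    "sea1.jpg": ("color_histogram", (0.9, 0.1), 2000),
    "tree2.jpg": ("edge", (0.1, 0.8), 1500),
}
reply = answer(ImageQuery("color_histogram", (0.1, 0.7)),
               local_index, "10.0.0.5:6346", 0.2)
```

Carrying the method name in the query matters because vectors produced by different extraction methods are not comparable.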

  29. DISCOVIR…Screen Shot
      Figure 5: Screen-shot of DISCOVIR

  30. DISCOVIR…Type of Messages
      Figure 6: ImageQuery message format
      Figure 7: ImageQueryHit message format

  31. DISCOVIR…Architecture
      • Connection Manager – responsible for setting up and managing TCP connections between DISCOVIR clients
      • Packet Router – controls the routing, assembly and disassembly of messages between the DISCOVIR network and the different components of the DISCOVIR client program
      • Plug-in Manager – coordinates the download and storage of different feature extraction plug-ins and their interaction with the Feature Extractor and Image Indexer
      • HTTP Agent – a tiny web server that handles file download requests from other DISCOVIR peers
      • Feature Extractor – collaborates with the Plug-in Manager to perform feature extraction and thumbnail generation from the shared image collection
         • Preprocessing – extracts the feature vectors of shared images in order to make the collection searchable in the network
         • Real-time extraction – extracts the feature vector of the query image on the fly and passes the query to the Packet Router
      • Image Indexer – indexes the image collection by content feature and carries out clustering to speed up retrieval of images

  32. DISCOVIR…Architecture
      Figure 8

  33. DISCOVIR…Flow Of Operations
      • Preprocessing
         • The Plug-in Manager module is responsible for querying the list of available feature extraction modules on the DISCOVIR control website
         • Selected feature extraction modules are downloaded and installed upon the user's request
         • The Feature Extractor module extracts features and generates thumbnails for all shared images using the particular feature extraction method needed by the user
         • The Image Indexer module then indexes the image collection using the extracted multi-dimensional feature vectors
      • Connection establishment
         • The Connection Manager module asks the bootstrap server for peers available to accept incoming connections
      • Query message routing
         • The Feature Extractor module processes the query image and assembles an ImageQuery message to be sent out through the Packet Router module
         • When other peers receive the ImageQuery message, they perform two operations:
            • Query message propagation – the Packet Router module employs checking rules
            • Local index look-up – the peer uses the Image Indexer module and the information in the ImageQuery message to search its local index of shared files for similar images. An ImageQueryHit message is delivered back to the requestor through the Packet Router module once similar images are retrieved
      • Query result display
         • When an ImageQueryHit message returns to the requestor, the requestor obtains a list detailing the location and size of the matched images. The HTTP Agent module then downloads thumbnails or full-size images from the peer using HTTP


  35. DISCOVIR…Performance Metrics
      • Recall – the success rate of retrieving the desired results: Recall = Ra / R, where Ra is the number of retrieved relevant documents and R is the total number of relevant documents in the P2P network
      • Query scope – the fraction of peers visited by each query: Query scope = Vpeer / Tpeer, where Vpeer is the number of peers that received and handled the query and Tpeer is the total number of peers in the P2P network
      • Query efficiency – the ratio between recall and query scope: Query efficiency = Recall / Query scope
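The three metrics are simple ratios, as the short sketch below shows. The function and parameter names are chosen for the example, and the numbers in the usage note are made up rather than taken from the experiments.

```python
def recall(retrieved_relevant, total_relevant):
    """Ra / R: share of all relevant documents that were retrieved."""
    return retrieved_relevant / total_relevant


def query_scope(visited_peers, total_peers):
    """Vpeer / Tpeer: share of peers that received and handled the query."""
    return visited_peers / total_peers


def query_efficiency(recall_value, scope_value):
    """Recall per unit of query scope; higher means less wasted traffic."""
    return recall_value / scope_value
```

For instance, a query that retrieves 30 of 60 relevant images while visiting 250 of 1000 peers has a recall of 0.5, a query scope of 0.25 and a query efficiency of 2.0.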

  36. DISCOVIR…Experiment
      Figure 9: Recall versus number of peers

  37. DISCOVIR…Experiment
      Figure 10: Query scope versus number of peers

  38. DISCOVIR…Experiment
      Figure 11: Query efficiency versus number of peers


  40. Conclusion
      • We saw an implementation of a CBIR system over the current Gnutella network
      • Such an architecture fully utilizes the storage and computation capacity of computers on the Internet
      • To solve the query broadcasting problem, the authors proposed a peer clustering and intelligent query routing strategy to search images efficiently over a P2P network
      • The Firework Query Model outperforms the BFS method in both network traffic cost and query efficiency

