Innovative Indexing in Peer-to-Peer DHT Networks for Efficient Data Retrieval

Data Indexing in Peer-to-Peer DHT Networks • Garces-Erice, P. A. Felber, E. W. Biersack, G. Urvoy-Keller, K. W. Ross • ICDCS 2004

DHT • Structured P2P • Distributed Hash Table • Mapping between the file identifier and its location • Ex: Search for file "Starwars.divx" • Convert "Starwars.divx" to a key, say "123456789" • Look up "123456789" in the DHT to find the file location • Download the file
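The lookup steps on this slide can be sketched as a single-process toy. This is only an illustration, not the system from the paper: the `ToyDHT` class, the node names, the SHA-1 based `key_for` helper, and the "first node ID at or after the key" placement rule are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_left


def key_for(name: str, bits: int = 32) -> int:
    """Hash a name to an integer key in [0, 2**bits). SHA-1 is an assumption."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class ToyDHT:
    """Hypothetical in-memory stand-in for a DHT.

    A key is owned by the first node whose ID is >= the key,
    wrapping around to the smallest ID at the end of the ring.
    """

    def __init__(self, node_names):
        self.ring = sorted((key_for(n), n) for n in node_names)
        self.ids = [node_id for node_id, _ in self.ring]
        self.store = {n: {} for n in node_names}  # node name -> {key: location}

    def responsible_node(self, key: int) -> str:
        i = bisect_left(self.ids, key) % len(self.ring)
        return self.ring[i][1]

    def put(self, name: str, location: str) -> None:
        key = key_for(name)
        self.store[self.responsible_node(key)][key] = location

    def get(self, name: str):
        key = key_for(name)
        return self.store[self.responsible_node(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c", "node-d"])
dht.put("Starwars.divx", "peer-42:/shared/Starwars.divx")  # made-up location
location = dht.get("Starwars.divx")  # convert name to key, then look up the key
```

A real DHT would route the lookup across the network in several hops; here the routing is collapsed into `responsible_node`.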

Indexing • Indexes don't contain key-to-data mappings • Indexes provide a key-to-key service, or more precisely a query-to-query service • Ex: Query q → a list of more specific queries covered by q → select one of them as the new query q → if q is the most specific query of a file, the lookup returns the file
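The refinement loop in the example can be sketched as follows. The `INDEX` and `FILES` tables, the query strings, and the `choose` callback (standing in for the user picking one of the returned queries) are hypothetical, and everything is held in local dictionaries rather than spread over DHT nodes.

```python
# Hypothetical query-to-query index: each query maps to more specific
# queries that it covers.
INDEX = {
    "author=Smith": ["author=Smith/title=TCP", "author=Smith/title=IPv6"],
    "author=Smith/title=TCP": ["author=Smith/title=TCP/year=1989"],
    "author=Smith/title=IPv6": ["author=Smith/title=IPv6/year=1998"],
}

# Hypothetical mapping from each most specific query to its file.
FILES = {
    "author=Smith/title=TCP/year=1989": "tcp.pdf",
    "author=Smith/title=IPv6/year=1998": "ipv6.pdf",
}


def search(query, choose, max_steps=10):
    """Follow query-to-query mappings until a most specific query is reached.

    Returns (file or None, list of queries visited).
    """
    path = [query]
    for _ in range(max_steps):
        if query in FILES:              # q is the most specific query of a file
            return FILES[query], path
        candidates = INDEX.get(query, [])
        if not candidates:              # nothing indexed under this query
            return None, path
        query = choose(candidates)      # the user selects a more specific query
        path.append(query)
    return None, path


# Always pick the first suggestion.
result, path = search("author=Smith", lambda candidates: candidates[0])
```

The `max_steps` bound is only a safety net for the toy; in the scheme described on the slide each step moves to a strictly more specific query.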

Maintain • To maintain indexes that consist of query-to-query mappings, each node provides: • Insert(q, qi) function, with q covering qi: adds a mapping (q; qi) to the index of the node responsible for key q • Lookup(q) function, with q not being the most specific query of a file: returns a list of all the queries qi such that there is a mapping (q; qi) in the index of the node responsible for key q
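A minimal sketch of the two functions, under several assumptions that are not from the slides: a query is modeled as a frozenset of "field=value" strings, "q covers qi" is taken to mean that q is a strict subset of qi, and the responsible node is picked by hashing the query modulo the number of nodes instead of by real DHT routing.

```python
import hashlib


def covers(q: frozenset, qi: frozenset) -> bool:
    """Assumed covering rule: every constraint of q also appears in qi."""
    return q < qi  # strict subset, so a query never maps to itself


class QueryIndex:
    """Hypothetical distributed index, simulated with one dict per node."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        self.indexes = {n: {} for n in self.node_names}  # node -> {q: [qi, ...]}

    def node_for(self, q: frozenset) -> str:
        canonical = "&".join(sorted(q))  # stable string form of the query
        key = int.from_bytes(hashlib.sha1(canonical.encode("utf-8")).digest(), "big")
        return self.node_names[key % len(self.node_names)]  # simplified placement

    def insert(self, q: frozenset, qi: frozenset) -> None:
        """Add the mapping (q; qi) on the node responsible for q."""
        if not covers(q, qi):
            raise ValueError("q must cover qi")
        entries = self.indexes[self.node_for(q)].setdefault(q, [])
        if qi not in entries:
            entries.append(qi)

    def lookup(self, q: frozenset) -> list:
        """Return all qi with a mapping (q; qi) on the node responsible for q."""
        return list(self.indexes[self.node_for(q)].get(q, []))


index = QueryIndex(["n1", "n2", "n3"])
broad = frozenset({"author=Smith"})
specific = frozenset({"author=Smith", "title=TCP"})
index.insert(broad, specific)
matches = index.lookup(broad)
```

Only the node responsible for q is touched by either call, which is what keeps the index itself distributed.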

Example: bibliographic database • Figure: query-to-key vs. query-to-query mappings

Discussion • Some interesting properties of this indexing technique: • Space efficient • Scalability • Loose coupling between data and indexes • Versatility • Adaptability • Decentralized architecture • Resilient to arbitrary linking

System point of view • Search process should be simple • Amount of network traffic should be minimized • Storage space dedicated to the indexing metadata should remain within reasonable limits.

Evaluation • Distributed bibliographic database • Bibliographic database sites: BibFinder (http://kilimanjaro.eas.asu.edu), NetBib (http://edas.info/S.cgi?search=1)

Indexing scheme • Figures: simple indexing scheme, flat indexing scheme

Indexing scheme • Figure: complex indexing scheme

Indexing scheme • Simple: a query for an author or a title returns a set of author and title pairs. The most space-efficient of the three, requiring 152 MB of extra storage in the system. • Flat: the index query length is always 2. Requires 37% more space. • Complex: some queries in the simple scheme are split into more specific queries. Requires 25% more space.

Probability vs. Ranking

Caching • Multi-cache: shortcuts are created on each node along the lookup path. Cache size is unbounded. • Single-cache: shortcuts are created only on the first node that was contacted. Cache size is unbounded. • LRU (least recently used): only a limited number of shortcuts can be stored on each node.
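The LRU policy can be sketched as a bounded per-node cache. This assumes a shortcut is simply a mapping from a query to the target that an earlier lookup ended at; the `ShortcutCache` class and the example queries are hypothetical, and the slides do not say how the evaluated system implements its cache.

```python
from collections import OrderedDict


class ShortcutCache:
    """Bounded shortcut cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries = OrderedDict()  # query -> target, oldest first

    def get(self, query):
        """Return the cached target, or None on a miss."""
        if query not in self._entries:
            return None
        self._entries.move_to_end(query)  # mark as most recently used
        return self._entries[query]

    def put(self, query, target) -> None:
        """Store a shortcut, evicting the least recently used one if full."""
        self._entries[query] = target
        self._entries.move_to_end(query)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the oldest entry


cache = ShortcutCache(capacity=2)
cache.put("author=Smith", "tcp.pdf")
cache.put("author=Jones", "routing.pdf")
cache.get("author=Smith")             # refreshes the Smith shortcut
cache.put("author=Lee", "dht.pdf")    # evicts the Jones shortcut
```

The multi-cache and single-cache policies differ only in which nodes call `put`; with unbounded size they would never reach the eviction branch.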

Average number of interactions required to find data.

Average network traffic (bytes) generated per query.

Cache efficiency: distributed hit ratio.

Conclusion • Indexing the data stored in the peer-to-peer network. Indexes are distributed across the nodes of the network and contain key-to-key (or query-to-query) mappings. • Given a broad query, a user can look up the more specific queries that match its original query.
