This paper explores data indexing in peer-to-peer Distributed Hash Table (DHT) networks, which map file identifiers to file locations. It starts from the example of searching for a file such as "Starwars.divx": the name is converted to a key, and the DHT lookup function resolves that key to the file's location. It then shows how indexes that map broad queries to more specific queries can be built on top of the DHT. The proposed indexing techniques are space-efficient, scalable, and adaptable, and aim to keep network traffic low. The schemes are evaluated on a distributed bibliographic database, together with several caching strategies for speeding up lookups in decentralized systems.
Data Indexing in Peer-to-Peer DHT Networks
Garces-Erice, P. A. Felber, E. W. Biersack, G. Urvoy-Keller, K. W. Ross
ICDCS 2004
DHT
• Structured P2P
• Distributed Hash Table
• Mapping between the file identifier and the file location
Example:
• Search for the file "Starwars.divx"
• Convert "Starwars.divx" to a key, say "123456789"
• Look up "123456789" in the DHT to find the file's location
• Download the file
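The key-to-location lookup can be sketched in a few lines of Python. This is a single-process toy, not a real DHT: the use of SHA-1, the 32-bit key space, the hypothetical names `to_key` and `ToyDHT`, and the Chord-style rule that a key is stored on its successor node are all assumptions for illustration. The slide's key "123456789" is likewise only an example value.

```python
import bisect
import hashlib


def to_key(name: str, bits: int = 32) -> int:
    """Hash a file name to an integer key in [0, 2**bits) (SHA-1 is an assumption)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class ToyDHT:
    """Single-process stand-in for a DHT ring.

    Assumption (Chord-style): a key is stored on its successor, i.e. the first
    node whose ID is >= the key, wrapping around to the smallest ID.
    """

    def __init__(self, node_ids):
        self.node_ids = sorted(node_ids)
        self.store = {node: {} for node in self.node_ids}  # per-node key/value storage

    def responsible_node(self, key: int) -> int:
        i = bisect.bisect_left(self.node_ids, key)
        return self.node_ids[i % len(self.node_ids)]  # wrap past the largest ID

    def put(self, key: int, location: str) -> None:
        self.store[self.responsible_node(key)][key] = location

    def get(self, key: int):
        return self.store[self.responsible_node(key)].get(key)


# Usage: publish and then look up the location of "Starwars.divx".
dht = ToyDHT([2 ** 30, 2 ** 31, 3 * 2 ** 30])
key = to_key("Starwars.divx")
dht.put(key, "peer-17.example.org")  # hypothetical peer address
print(dht.get(key))  # prints peer-17.example.org
```

In a real deployment each node only knows part of the ring, so `responsible_node` would be resolved by routing messages between peers rather than by a local binary search.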
Indexing
• Indexes do not contain key-to-data mappings
• Indexes provide a key-to-key service, or more precisely a query-to-query service
Example:
• Looking up a query q returns a list of more specific queries covered by q
• The user selects one of these queries and looks it up in turn
• When the selected query is the most specific query of a file, the lookup returns the file
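The refinement loop can be illustrated with a local dictionary standing in for the distributed index. The query strings, the `INDEX` and `FILES` tables, and the `refine` helper below are hypothetical; `choose` plays the role of the user picking one of the more specific queries.

```python
from typing import Callable, Dict, List

# Hypothetical index: each broad query maps to the more specific queries it covers.
INDEX: Dict[str, List[str]] = {
    "author=Smith": ["author=Smith/title=TCP", "author=Smith/title=DHT"],
    "author=Smith/title=DHT": ["author=Smith/title=DHT/year=2004"],
}
# Hypothetical most specific queries, each identifying exactly one file.
FILES: Dict[str, str] = {"author=Smith/title=DHT/year=2004": "smith-dht-2004.pdf"}


def refine(query: str,
           index: Dict[str, List[str]],
           files: Dict[str, str],
           choose: Callable[[List[str]], str]) -> str:
    """Follow query-to-query mappings until a most specific query is reached.

    Terminates as long as every mapping leads to a strictly more specific query.
    """
    while query not in files:
        candidates = index.get(query, [])
        if not candidates:
            raise KeyError(f"no index entry for {query!r}")
        query = choose(candidates)  # the user selects one of the returned queries
    return files[query]


# Usage: always pick the last candidate offered.
print(refine("author=Smith", INDEX, FILES, lambda c: c[-1]))  # prints smith-dht-2004.pdf
```

Choosing `c[0]` instead leads to "author=Smith/title=TCP", which has no further mappings in this toy index, so `refine` raises `KeyError`.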
Maintenance
• Each node's index consists of query-to-query mappings, maintained with two functions:
• Insert(q, qi), with q covering every qi: adds a mapping (q; qi) to the index of the node responsible for key q
• Lookup(q), with q not being the most specific query of a file: returns a list of all queries qi such that there is a mapping (q; qi) in the index of the node responsible for key q
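A minimal sketch of Insert and Lookup, assuming a Chord-style ring where the node responsible for query q is the successor of a SHA-1 hash of q. The query format (slash-separated field=value terms), the class names, and the `covers` test, which treats q as covering qi when every term of q appears in qi and qi has at least one extra term, are assumptions made only for this example.

```python
import bisect
import hashlib
from collections import defaultdict


def h(query: str, bits: int = 16) -> int:
    """Assumed key function: SHA-1 of the query string, reduced to `bits` bits."""
    return int.from_bytes(hashlib.sha1(query.encode("utf-8")).digest(), "big") % (1 << bits)


def covers(q: str, qi: str) -> bool:
    """Assumed covering test: qi contains every term of q plus at least one more."""
    return set(q.split("/")) < set(qi.split("/"))


class IndexNode:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.index = defaultdict(set)  # local index: q -> {qi, ...}


class IndexedDHT:
    def __init__(self, node_ids):
        self.nodes = sorted((IndexNode(n) for n in node_ids), key=lambda n: n.node_id)
        self.ids = [n.node_id for n in self.nodes]

    def _responsible(self, q: str) -> IndexNode:
        """Successor of h(q) on the ring, wrapping around."""
        i = bisect.bisect_left(self.ids, h(q))
        return self.nodes[i % len(self.nodes)]

    def insert(self, q: str, qi: str) -> None:
        """Add the mapping (q; qi) to the index of the node responsible for key q."""
        if not covers(q, qi):
            raise ValueError(f"{q!r} does not cover {qi!r}")
        self._responsible(q).index[q].add(qi)

    def lookup(self, q: str) -> list:
        """Return all qi such that (q; qi) is stored at the node responsible for key q."""
        return sorted(self._responsible(q).index.get(q, ()))


# Usage
dht = IndexedDHT([1000, 20000, 40000, 60000])
dht.insert("author=Smith", "author=Smith/title=DHT")
dht.insert("author=Smith", "author=Smith/title=TCP")
print(dht.lookup("author=Smith"))  # prints ['author=Smith/title=DHT', 'author=Smith/title=TCP']
```

Because the mapping is stored under the key of the broader query q, a lookup touches a single node, and the returned qi can themselves be looked up on whichever nodes their keys map to.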
Example: bibliographic database (figures: query-to-key mapping; query-to-query mapping)
Discussion
Some interesting properties of this indexing technique:
• Space efficiency
• Scalability
• Loose coupling between data and indexes
• Versatility
• Adaptability
• Decentralized architecture
• Resilience to arbitrary linking
System point of view
• The search process should be simple
• The amount of network traffic should be minimized
• The storage space dedicated to indexing metadata should remain within reasonable limits
Evaluation
• Distributed bibliographic database
• Bibliographic database sites: BibFinder (http://kilimanjaro.eas.asu.edu), NetBib (http://edas.info/S.cgi?search=1)
Indexing scheme (figures: simple indexing scheme; flat indexing scheme)
Indexing scheme (figure: complex indexing scheme)
Indexing scheme
• Simple: a query for an author or a title returns a set of (author, title) pairs. It is the most space-efficient of the three schemes, requiring 152 MB of extra storage in the system.
• Flat: the index query length is always 2. It requires 37% more space.
• Complex: some queries of the simple scheme are split into more specific queries. It requires 25% more space.
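The simple scheme, as described above, can be sketched as follows: every author query and every title query maps to the (author, title) pairs it covers. The records and the `build_simple_index` helper are hypothetical, and the flat and complex schemes are not modeled here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (author, title)


def build_simple_index(records: Iterable[Pair]) -> Dict[str, Set[Pair]]:
    """Map each author query and each title query to the (author, title) pairs it covers."""
    index: Dict[str, Set[Pair]] = defaultdict(set)
    for author, title in records:
        index[f"author={author}"].add((author, title))
        index[f"title={title}"].add((author, title))
    return dict(index)


# Usage with hypothetical records.
records = [("Smith", "DHT indexing"), ("Smith", "TCP tuning"), ("Jones", "DHT indexing")]
idx = build_simple_index(records)
print(len(idx), sum(len(pairs) for pairs in idx.values()))  # prints 4 6
```

Counting index keys and mapping entries, as in the last line, is one simple way to compare the storage cost of alternative schemes on the same records; it does not reproduce the 152 MB measurement.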
Caching
• Multi-cache: shortcuts are created on each node along the lookup path. Cache size is unbounded.
• Single-cache: shortcuts are created only on the first node contacted. Cache size is unbounded.
• LRU (least recently used): only a limited number of shortcuts can be stored on each node; when the limit is reached, the least recently used shortcut is evicted.
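The three caching policies can be sketched with a per-node shortcut cache. The `ShortcutCache` class and `place_shortcuts` helper are hypothetical; an unbounded cache corresponds to the multi-cache and single-cache policies, while a fixed capacity gives LRU eviction via `collections.OrderedDict`. Which nodes receive shortcuts under LRU is not specified above, so placement and capacity are kept independent.

```python
from collections import OrderedDict
from typing import Dict, List, Optional


class ShortcutCache:
    """Per-node shortcut cache: query -> query it was resolved to.

    capacity=None means unbounded (multi-cache and single-cache);
    an integer capacity gives least-recently-used (LRU) eviction.
    """

    def __init__(self, capacity: Optional[int] = None):
        self.capacity = capacity
        self.entries = OrderedDict()  # ordered from least to most recently used

    def get(self, query: str) -> Optional[str]:
        if query not in self.entries:
            return None
        self.entries.move_to_end(query)  # mark as most recently used
        return self.entries[query]

    def put(self, query: str, target: str) -> None:
        self.entries[query] = target
        self.entries.move_to_end(query)
        if self.capacity is not None and len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used shortcut


def place_shortcuts(path: List[str], caches: Dict[str, ShortcutCache],
                    query: str, target: str, policy: str) -> None:
    """Create shortcuts after a lookup that visited `path`, first contacted node first."""
    if policy == "multi":
        nodes = path       # every node along the lookup path
    elif policy == "single":
        nodes = path[:1]   # only the first node contacted
    else:
        raise ValueError(f"unknown policy {policy!r}")
    for node in nodes:
        caches[node].put(query, target)


# Usage: an LRU cache holding at most two shortcuts.
lru = ShortcutCache(capacity=2)
lru.put("a", "1")
lru.put("b", "2")
lru.get("a")         # "a" becomes the most recently used entry
lru.put("c", "3")    # evicts "b"
print(lru.get("b"))  # prints None
```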
Conclusion
• The approach indexes the data stored in the peer-to-peer network. The indexes are distributed across the nodes of the network and contain key-to-key (or query-to-query) mappings.
• Given a broad query, a user can look up the more specific queries that match the original query.