
Routing Indices For P-to-P Systems

Routing Indices For P-to-P Systems. ICDCS 2002. Introduction. Search in a P2P system: mechanisms without an index, mechanisms with specialized index nodes (centralized search), and mechanisms with indices at each node. Structured P2P networks. Unstructured P2P networks.


Presentation Transcript


  1. Routing Indices For P-to-P Systems ICDCS 2002

  2. Introduction • Search in a P2P system • Mechanisms without an index • Mechanisms with specialized index nodes (centralized search) • Mechanisms with indices at each node • Structured P2P network • Unstructured P2P network • Parallel vs. sequential search • Response time • Network traffic

  3. Routing indices (RI) • Query • Documents are on zero or more “topics”, and queries request documents on particular topics. • Document topics are independent • Local index • RI • Each node has a local routing index which contains the following information • The number of documents along each path • The number of documents on each topic of interest • Allows a node to select the “best” neighbors to send a query to

  4. The RI may be “coarser” than the local indices • Overcounts • Undercounts

  5. Goodness measure • Number of results in a path • Using Routing indices
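
The slides say that topics are independent and that goodness is the number of results expected along a path, but they do not spell out a formula. The sketch below shows one estimator that is consistent with the independence assumption: scale the total document count along a path by the fraction of documents on each queried topic. The index contents, topic names, and function names are hypothetical.

```python
from math import prod

# Hypothetical routing index at one node: for each neighbor, the total number
# of documents reachable through it and a count per topic (illustrative data).
routing_index = {
    "B": {"total": 100, "topics": {"databases": 20, "networks": 0, "theory": 10}},
    "C": {"total": 1000, "topics": {"databases": 0, "networks": 300, "theory": 0}},
    "D": {"total": 200, "topics": {"databases": 100, "networks": 0, "theory": 150}},
}


def goodness(entry, query_topics):
    """Estimate how many documents along a path match all query topics.

    Assumes topics are independent, so the per-topic fractions multiply.
    """
    total = entry["total"]
    if total == 0:
        return 0.0
    return total * prod(entry["topics"].get(t, 0) / total for t in query_topics)


def rank_neighbors(index, query_topics):
    """Order neighbors from most to least promising for the query."""
    return sorted(index, key=lambda n: goodness(index[n], query_topics), reverse=True)


if __name__ == "__main__":
    print(rank_neighbors(routing_index, ["databases", "theory"]))
```

Because the counts are aggregated, the estimate can overcount or undercount the true number of matches, which is the coarseness noted on slide 4.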

  6. Storage space • N: number of nodes in the P2P network • b: branching factor • c: number of categories • s: counter size in bytes • Centralized index: s * (c + 1) * N • Distributed system: s * (c + 1) * b (at each node)
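
To make the two storage formulas concrete, the following sketch evaluates them for one set of parameter values. The values (N = 10000, b = 4, c = 50, s = 4) are assumptions chosen only for illustration and do not come from the slides.

```python
def centralized_index_bytes(s, c, n):
    """Storage for a centralized index: s * (c + 1) * N."""
    return s * (c + 1) * n


def distributed_index_bytes_per_node(s, c, b):
    """Storage for the routing index at each node: s * (c + 1) * b."""
    return s * (c + 1) * b


if __name__ == "__main__":
    # Assumed example values, not taken from the presentation.
    N, b, c, s = 10_000, 4, 50, 4
    print(centralized_index_bytes(s, c, N))           # 2040000
    print(distributed_index_bytes_per_node(s, c, b))  # 816
```

The per-node cost depends on the branching factor b rather than on the network size N, so it stays constant as the network grows.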

  7. Creating routing indices

  8. Maintaining Routing Indices • Trade-off between RI freshness and update cost • Does not require the participation of a disconnecting node • Discussion • What if the search topics are dependent? • Can the number of “hops” necessary to reach a document be estimated?

  9. Alternative Routing Indices • Hop-count RI • Aggregated RIs for each “hop” up to a maximum number of hops are stored

  10. Search cost • Number of messages • The goodness of a neighbor • The ratio between the number of documents available through that neighbor and the number of messages required to get those documents • Regular tree with fanout F • It takes F^h messages to find all documents at hop h • Storage cost?

  11. Exponentially aggregated RI • Store the result of applying the regular-tree cost formula to a hop-count RI • How to compute the goodness of a path for a query containing several topics?
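
The relationship between the hop-count RI and the exponentially aggregated RI can be sketched as follows. A hop-count entry keeps one document count per hop; the regular-tree cost model divides the count at hop h by F^h, and the aggregated RI stores only the resulting sum. The indexing convention (h = 0 for documents at the neighbor itself) and the sample counts are assumptions for illustration.

```python
def discounted_documents(docs_per_hop, fanout):
    """Collapse a hop-count entry into one value under the regular-tree model.

    docs_per_hop[h] is the number of matching documents h hops beyond the
    neighbor (h = 0 is the neighbor itself, an assumed convention). Reaching
    hop h is charged fanout ** h messages, so its documents are divided by it.
    """
    return sum(docs / fanout ** h for h, docs in enumerate(docs_per_hop))


if __name__ == "__main__":
    fanout = 3
    # Hypothetical hop-count entries for two neighbors, horizon of 2 hops.
    neighbor_x = [13, 10]  # many documents close by
    neighbor_y = [0, 31]   # more documents in total, but farther away
    print(discounted_documents(neighbor_x, fanout))
    print(discounted_documents(neighbor_y, fanout))
```

Under this model neighbor X scores higher than neighbor Y even though Y leads to more documents overall, because Y's documents cost more messages to reach. On the storage question raised on slide 10, a hop-count RI keeps one counter per hop for every entry, while the aggregated form keeps a single value.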

  12. Cycles in the P2P network (HW)

  13. Improving Search in Peer-to-Peer Networks ICDCS 2002 Beverly Yang Hector Garcia-Molina

  14. Outline • Introduction • Techniques • Experiment

  15. Introduction • We present three techniques for efficient search in P2P systems. • The basic idea is to reduce the number of nodes that process a query

  16. Current Techniques • Gnutella • BFS with depth limit D. • Wastes bandwidth and processing resources • Freenet • DFS with depth limit D. • Poor response time.

  17. Iterative Deepening • Under policy P = {a, b, c}; waiting time W • See example.
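
Since the example referred to on this slide is not in the transcript, here is a minimal simulation of the idea: run a depth-limited BFS at each depth in the policy, in order, and stop as soon as enough results are found. This is a simplified sketch, not the protocol itself. It recomputes each BFS from the source instead of resuming from the previous frontier, and it omits the waiting time W between iterations. The graph and function names are hypothetical.

```python
from collections import deque


def bfs_within_depth(graph, source, depth):
    """Return the nodes reachable from source in at most `depth` hops."""
    seen = {source}
    frontier = deque([(source, 0)])
    reached = []
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue  # do not expand beyond the depth limit
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                reached.append(neighbor)
                frontier.append((neighbor, d + 1))
    return reached


def iterative_deepening(graph, source, has_result, policy, wanted):
    """Try each depth in `policy` until at least `wanted` results are found."""
    hits = []
    depth_used = None
    for depth in policy:
        depth_used = depth
        hits = [n for n in bfs_within_depth(graph, source, depth) if has_result(n)]
        if len(hits) >= wanted:
            break
    return depth_used, hits


if __name__ == "__main__":
    # Hypothetical chain a - b - c - d; only d holds a result.
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(iterative_deepening(graph, "a", lambda n: n == "d", [1, 3], wanted=1))
```

When the query is satisfied at a shallow depth, the deeper and more expensive iterations are never issued, which is where the savings over a single BFS to the maximum depth come from.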

  18. Directed BFS • A source sends query messages to just a subset of its neighbors • A node maintains simple statistics on its neighbors • Number of results received from each neighbor • Latency of connection

  19. Candidate nodes • Returned the highest number of results • Low hop-count • High message count
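
A sketch of how a source might pick the subset of neighbors for Directed BFS from the per-neighbor statistics listed above. The record fields, heuristic names, and sample numbers are hypothetical; the slides name the criteria but not how they are stored or combined.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    """Statistics a node keeps about one neighbor (assumed field names)."""
    results_returned: int    # results received for past queries
    avg_hops: float          # average hop count of those results
    messages_forwarded: int  # messages this neighbor has forwarded
    latency_ms: float        # latency of the connection


# Each heuristic maps stats to a sort key; smaller keys are preferred.
HEURISTICS = {
    "most_results": lambda s: -s.results_returned,
    "low_hop_count": lambda s: s.avg_hops,
    "most_messages": lambda s: -s.messages_forwarded,
    "low_latency": lambda s: s.latency_ms,
}


def pick_neighbors(stats, heuristic, k):
    """Return the k neighbors that rank best under the chosen heuristic."""
    key = HEURISTICS[heuristic]
    return sorted(stats, key=lambda name: key(stats[name]))[:k]


if __name__ == "__main__":
    stats = {
        "n1": NeighborStats(5, 3.0, 100, 80.0),
        "n2": NeighborStats(9, 4.0, 50, 20.0),
        "n3": NeighborStats(1, 1.5, 400, 200.0),
    }
    print(pick_neighbors(stats, "most_results", k=2))
```

Only the source restricts its fan-out in this sketch; nodes that receive the query would continue with an ordinary BFS.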

  20. Local Indices • Each node n maintains an index over the data of all nodes within a radius of r hops. • All nodes at depths not listed in the policy simply forward the query. • Example: policy P = {1, 5}
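
The interaction between the policy and the index radius can be illustrated with a small calculation. A node at depth d that indexes everything within r hops can answer for nodes at depths d - r through d + r, so the policy only needs to list enough depths to cover the search horizon. The slide does not state r for its example; the values r = 2 and r = 1 below are assumptions.

```python
def covered_depths(policy, radius):
    """Depths (from the source) whose data is searched under the policy.

    Nodes at each depth in `policy` process the query against their local
    index, which spans `radius` hops in every direction.
    """
    covered = set()
    for depth in policy:
        covered.update(range(max(0, depth - radius), depth + radius + 1))
    return sorted(covered)


if __name__ == "__main__":
    print(covered_depths([1, 5], radius=2))  # [0, 1, 2, 3, 4, 5, 6, 7]
    print(covered_depths([1, 5], radius=1))  # [0, 1, 2, 4, 5, 6]
```

With the assumed radius of 2 the policy {1, 5} covers every depth up to 7, while a radius of 1 would leave depth 3 unsearched.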

  21. Experimental Setup • For each response, we log: • Number of hops taken • IP address from which the Response message came • Response time • Individual results

  22. Experimental result

  23. Efficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems Kunwadee Sripanidkulchai Bruce Maggs Hui Zhang IEEE INFOCOM 2003

  24. Motivation • Although flooding is simple and robust, it is not scalable. • A content location solution in which peers are organized into an interest-based structure on top of Gnutella. • The algorithm is called interest-based shortcuts

  25. Interest-based locality

  26. Shortcuts Architecture and Design Goals • To create additional links on top of a peer-to-peer system’s overlay • As a separate performance enhancement layer on top of existing content location mechanisms

  27. Content location paths

  28. Shortcut Discovery • The first lookup returns a set of peers that store the content • These are potential candidates. • One peer is selected at random from the set and added • For scalability, each peer allocates a fixed-size amount of storage to implement shortcuts.

  29. Shortcut selection • We rank shortcuts based on their perceived utility • A peer sequentially asks all of the shortcuts on its list.

  30. Ranking metrics • Probability of providing content • Latency of the path to the shortcut • Load at the shortcut • A combination of metrics can be used based on each peer’s preference
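
Slides 28 to 30 describe a small data structure: a bounded list of shortcuts that is filled from successful lookups, ranked by utility, and queried in order. The sketch below is one possible shape for it. Ranking by past success count stands in for "probability of providing content", and evicting the lowest-ranked shortcut when the list is full is an assumption; the class and method names are hypothetical.

```python
import random


class ShortcutList:
    """Fixed-size list of interest-based shortcuts (illustrative sketch)."""

    def __init__(self, capacity=10, rng=None):
        self.capacity = capacity
        self.successes = {}  # peer -> number of lookups it has answered
        self.rng = rng or random.Random()

    def discover(self, peers_with_content):
        """After a lookup through the underlying overlay, add one random provider."""
        if not peers_with_content:
            return None
        peer = self.rng.choice(list(peers_with_content))
        if peer not in self.successes:
            if len(self.successes) >= self.capacity:
                # Assumed eviction rule: drop the shortcut with the lowest utility.
                worst = min(self.successes, key=self.successes.get)
                del self.successes[worst]
            self.successes[peer] = 0
        return peer

    def ranked(self):
        """Shortcuts ordered from highest to lowest perceived utility."""
        return sorted(self.successes, key=self.successes.get, reverse=True)

    def lookup(self, has_content):
        """Ask shortcuts one at a time; return the first that has the content."""
        for peer in self.ranked():
            if has_content(peer):
                self.successes[peer] += 1
                return peer
        return None  # caller falls back to the underlying overlay search


if __name__ == "__main__":
    shortcuts = ShortcutList(capacity=2)
    shortcuts.discover(["p1"])
    shortcuts.discover(["p2"])
    print(shortcuts.lookup(lambda peer: peer == "p2"))  # p2
```

When `lookup` returns `None`, the peer would fall back to the existing content location mechanism, which matches the layering described on slide 26.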

  31. Performance indices • Success rate • Load characteristics • Query scope • Minimum reply path lengths • Additional state

  32. Potential and Limitations • Adding 5 shortcuts at a time produces success rates that are close to the best possible. • Slightly increasing the shortest path length from 1 to 2 hops yields a better success rate.

  33. Conclusion • A simple and practical mechanism was proposed.

  34. Similarity Discovery in structured P2P Overlays ICPP

  35. Introduction • Structured P2P network • Only supports search with a single keyword • Similarity between two documents • Keyword sets • Vector space • Measure • Problems • Search problem • New keyword?

  36. Meteorograph • Absolute angle

  37. Publishing and Searching • Publish • Hash • Publish the item to a node np with the hash key closest to the hash value

  38. Search problem • Nearest answers • k-nearest answers • e • Partial • Comprehensive • Search strategy • Discussions • What happens when the keyword vector is represented by q?

  39. Other issues • Load balance • Changes of vector space • Republished? • Comprehensive set of keywords • Other methods?

  40. SWAM: A Family of Access Methods for Similarity-Search in Peer-to-Peer Data Networks Farnoush Banaei-Kashani Cyrus Shahabi (CIKM04)

  41. PDN access method • Defines • How to organize the PDN topology into an index-like structure • How to use the index structure

  42. Hilbert space • Hilbert space (V, Lp) • Key k = (a1, a2, …, ad) • d: the dimension of the vector space • The domain is a contiguous and finite interval of R • The Lp norm with p in Z+ • The distance function to measure the dissimilarity
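
The distance function referred to above is the standard Lp norm of the difference between two keys. A minimal sketch, with a hypothetical function name:

```python
def lp_distance(a, b, p):
    """Lp distance between two d-dimensional keys, for a positive integer p."""
    if len(a) != len(b):
        raise ValueError("keys must have the same dimension")
    if p < 1:
        raise ValueError("p must be a positive integer")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


if __name__ == "__main__":
    print(lp_distance((0, 0), (3, 4), p=1))  # 7.0 (Manhattan distance)
    print(lp_distance((0, 0), (3, 4), p=2))  # 5.0 (Euclidean distance)
```

A smaller distance means the two keys are more similar, which is what the similarity queries on the later slides rely on.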

  43. Topology • Topology of a PDN can be modelled as a directed graph G(N, E) • A(n) is the set of neighbors for node n • A node maintains • A limited amount of information about its neighbors, which includes • The keys of the tuples maintained at neighbors • The physical addresses of neighbors

  44. The processing of the query is completed when all expected tuples in the relevant result set are visited • Access methods • Join and leave, for virtual nodes • Forward, which uses local information to process queries and make forwarding decisions

  45. The small world example • Grid component • Random graph component • Processing of queries (exact, range, kNN) in a high-locality topology

  46. Flat partitioning • SWAM also employs the space partitioning idea: flat partitioning

  47. Query Processing • Exact-Match query processing • Range query processing • kNN Query processing
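
To pin down what each of the three query types is expected to return, the sketch below evaluates them by brute force over a single in-memory list of keys. It illustrates the result sets only and is not SWAM's distributed forwarding procedure; the function names and sample keys are hypothetical, and the Euclidean default (p = 2) is an assumption.

```python
import heapq


def distance(a, b, p=2):
    """Lp distance between two keys of equal dimension."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def exact_match(keys, query):
    """Keys identical to the query key."""
    return [k for k in keys if k == query]


def range_query(keys, query, radius, p=2):
    """Keys whose distance to the query is at most `radius`."""
    return [k for k in keys if distance(k, query, p) <= radius]


def knn_query(keys, query, k, p=2):
    """The k keys closest to the query, nearest first."""
    return heapq.nsmallest(k, keys, key=lambda key: distance(key, query, p))


if __name__ == "__main__":
    keys = [(0, 0), (1, 1), (3, 4), (10, 10)]
    print(exact_match(keys, (0, 0)))
    print(range_query(keys, (0, 0), radius=5))
    print(knn_query(keys, (0, 0), k=2))
```

In a PDN the same result sets have to be assembled from tuples spread over many nodes, which is why slide 44 defines completion as having visited every tuple in the relevant result set.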

  48. Data Indexing in Peer-to-Peer DHT Networks ICDCS 2004
