Scalable Distributed Information Management System (SDIMS) for Networked Applications
Introducing SDIMS for system management, service placement, data sharing, and more. Focus on scalability, flexibility, robustness. Explore implementation and evaluation.
Presentation Transcript
A Scalable Distributed Information Management System (SDIMS) P. Yalagandula, M. Dahlin cs.utexas.edu SIGCOMM 2004
Outline • Introduction • Goal : Aggregation • Innovation • Flexibility • Scalability • Robustness • Implementation • Evaluation • Conclusions
Introduction • Why SDIMS? • Monitoring, querying, and reacting to changes are core components of applications such as system management, service placement, and data sharing and caching. • An SDIMS in a networked system would provide a distributed operating system backbone and facilitate the development and deployment of new distributed services.
Introduction (cont.) • Fundamental • Hierarchical aggregation • A node accesses detailed views of nearby information and summary views of global information. • A hierarchical system aggregates information through reduction trees.
Introduction (cont.) • An SDIMS should have four properties. • Scalability • Flexibility • Administrative isolation • Robustness
Scalability • SDIMS should accommodate large numbers of nodes. • SDIMS should allow applications to install and monitor large numbers of data attributes.
Flexibility • SDIMS should accommodate a range of applications and attributes. • Read-dominated attributes (rarely change) • Number of CPUs • Write-dominated attributes (change often) • Number of processes • SDIMS should leave the policy decision of tuning replication to applications.
Administrative isolation • Nodes can be arranged in an organizational or administrative hierarchy. • Domain-based control. • Monitor • Query
Robustness • SDIMS should adapt to reconfigurations in a timely fashion when nodes fail or disconnect. • SDIMS should provide mechanisms so that applications can trade off the cost of adaptation against the consistency level of aggregated results when reconfigurations occur.
Related Works • Astrolabe • A single logical aggregation tree that mirrors a system administrative hierarchy. • A general interface for installing new aggregation functions. • An unstructured gossip protocol for disseminating information and replicating all aggregated attribute values for a sub-tree to all nodes in the sub-tree.
Related Works (cont.) • Any node can answer queries by using local information. • Not scalable (replication). • Not flexible (type of attribute). • Solution: P2P (DHT)
Tree • For each level in the hierarchy, the agent maintains a record with the list of child zones (and their attributes), and which child zone represents its own zone (self).
Gossip protocol • Periodically, each agent selects some other agent at random and exchanges state information with it. • If the two agents are in the same zone, the state exchanged relates to MIBs in that zone. • If the two agents are in different zones, they exchange state associated with the MIBs of their least common ancestor zone.
Related Works (cont.) • DHT • SkipNet, CAN, Pastry, Chord, Tapestry
Problem • How to scalably map different attributes to different aggregation trees in a DHT mesh? {physical network vs. overlay network} • How to provide flexibility in the aggregation to accommodate different application requirements? {flexible API for installing and controlling the system}
Problem (cont.) • How to adapt a DHT mesh to attain the administrative isolation property? {virtual organization} • How to provide robustness without unstructured gossip and total replication? {cache; pre-computing or on-demand re-aggregation}
Aggregation Abstraction • Each physical node in the system is a leaf in the tree. • An internal (non-leaf) node, which we call a virtual node, is simulated by one or more physical nodes at the leaves of the sub-tree for which the virtual node is the root.
Aggregation Abstraction (cont.) • Each physical node has local data stored as a set of (attributeType, attributeName, value) tuples. • The system associates an aggregation function $f_{\mathrm{type}}$ with each attribute type.
Aggregation Abstraction (cont.) • Each level-i sub-tree $T_i$ in the system has an aggregate value $V_{i,\mathrm{type},\mathrm{name}}$ for each (attributeType, attributeName) pair. • The aggregate value for a level-i sub-tree $T_i$ is the aggregation function for the type, $f_{\mathrm{type}}$, computed across the aggregate values of each of $T_i$'s k children: $V_{i,\mathrm{type},\mathrm{name}} = f_{\mathrm{type}}\left(V^{0}_{i-1,\mathrm{type},\mathrm{name}}, V^{1}_{i-1,\mathrm{type},\mathrm{name}}, \ldots, V^{k-1}_{i-1,\mathrm{type},\mathrm{name}}\right)$
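The recursive definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TreeNode` class, the `aggregate` helper, and the sample attribute are hypothetical names, not part of SDIMS, and every leaf is assumed to hold a value for the requested attribute.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

AttrKey = Tuple[str, str]  # (attributeType, attributeName)


@dataclass
class TreeNode:
    # Leaves (physical nodes) hold local data; virtual nodes only hold children.
    local: Dict[AttrKey, float] = field(default_factory=dict)
    children: List["TreeNode"] = field(default_factory=list)


def aggregate(node: TreeNode, key: AttrKey,
              f_type: Callable[[List[float]], float]) -> float:
    """Return the aggregate value of the sub-tree rooted at `node`."""
    if not node.children:
        # Level 0: the aggregate is the leaf's own local value.
        return node.local[key]
    # Level i: apply f_type to the level-(i-1) aggregates of the children.
    return f_type([aggregate(child, key, f_type) for child in node.children])


key = ("sum", "numProcesses")  # hypothetical attribute
leaves = [TreeNode(local={key: v}) for v in (1.0, 2.0, 3.0, 4.0)]
root = TreeNode(children=[TreeNode(children=leaves[:2]),
                          TreeNode(children=leaves[2:])])
total = aggregate(root, key, sum)
```

Here the two level-1 sub-trees aggregate to 3.0 and 7.0, and the root combines those two values rather than re-reading the leaves.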
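Aggregation Abstraction (cont.) • Examples of $f_{\mathrm{type}}$ • $\mathrm{Avg}(V_1, \ldots, V_n) = \frac{1}{n}\sum_{i=1}^{n} V_i$ (incorrect) • $\mathrm{SUM}(V_1, \ldots, V_n) = \sum_{i=1}^{n} V_i$ (correct) • Aggregation functions must satisfy the hierarchical computation property.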
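A small numeric check shows why a sum can be computed hierarchically while a plain average cannot. The grouping below is an arbitrary example. The (sum, count) pair at the end is a common workaround that is not taken from the slides.

```python
from typing import List, Tuple


def avg(values: List[float]) -> float:
    return sum(values) / len(values)


# Two sub-trees of unequal size holding the leaf values.
groups = [[1.0, 2.0, 3.0], [10.0]]
flat = [v for group in groups for v in group]

# SUM: aggregating the per-sub-tree sums equals the sum over all leaves.
sum_flat = sum(flat)
sum_hier = sum(sum(group) for group in groups)

# AVG: averaging the per-sub-tree averages gives a different answer.
avg_flat = avg(flat)
avg_hier = avg([avg(group) for group in groups])


def merge(pairs: List[Tuple[float, int]]) -> Tuple[float, int]:
    """Combine (sum, count) pairs; this operation is hierarchical."""
    return (sum(s for s, _ in pairs), sum(c for _, c in pairs))


# Workaround (an assumption, not from the slides): carry (sum, count) up the
# tree and divide only at the point where the average is read.
total, count = merge([(sum(group), len(group)) for group in groups])
avg_from_pairs = total / count
```

With these values the flat average is 4.0, while the average of sub-tree averages is 6.0, because the single-leaf sub-tree is weighted as heavily as the three-leaf one.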
Aggregation Abstraction (cont.) [Figure: aggregation tree; labels: node, virtual node]
Innovation • Flexibility • Scalability • Administrative isolation • Robustness
Flexibility • Operation API • Install • Update • Probe
Install Operation • The Install operation installs an aggregation function in the system.
Probe Operation • Used to force a reconfiguration and refresh all caches.
Probe Operation (cont.) • When node A issues a continuous probe at level l for an attribute, updates for the attribute at any node in A’s level-l ancestor’s subtree are aggregated up to level l and propagated down along the path from the ancestor to A.
Update Operation API • Update-UpK-downj: aggregates an update up to the k-th level and propagates the aggregate value of a node at level l (l ≤ k) downward for j levels.
Operation API [Figure: Update-UpK-downj on a tree with Level-0 through Level-4; labels K, L, and J mark the up, current, and down levels]
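One way to read the Update-UpK-downj rule is as a plan that lists, for each level an update reaches on its way up, the lower levels that receive the new aggregate. The sketch below follows that reading. The function name `propagation_plan` and the parameter `tree_height` are hypothetical, and the sketch ignores which concrete nodes sit at each level.

```python
from typing import Dict, List


def propagation_plan(k: int, j: int, tree_height: int) -> Dict[int, List[int]]:
    """Map each level l in 1..k to the levels its aggregate is pushed down to.

    An update from a leaf (level 0) is re-aggregated at levels 1..k. The
    aggregate computed at level l is then sent down at most j levels.
    """
    k = min(k, tree_height)  # cannot go above the root
    plan: Dict[int, List[int]] = {}
    for level in range(1, k + 1):
        lowest = max(0, level - j)  # cannot go below the leaves
        plan[level] = list(range(level - 1, lowest - 1, -1))
    return plan


up_only = propagation_plan(k=4, j=0, tree_height=4)     # aggregate up, never down
local_only = propagation_plan(k=0, j=0, tree_height=4)  # no propagation at all
one_down = propagation_plan(k=3, j=1, tree_height=4)
```

Under this reading, k = 0 leaves the value at the leaf, a large k with j = 0 aggregates to the root without pushing anything down, and large k and j replicate aggregates widely.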
Dynamic Adaptation • An SDIMS implementation can dynamically adjust its up/down strategies for an attribute based on its measured read/write frequency.
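The slide does not say how the adjustment is made. A minimal sketch, assuming a simple threshold on the measured read/write counts, could look like the following. The function `choose_strategy`, the `ratio` threshold, and the three outcomes are illustrative assumptions, not the SDIMS policy.

```python
from typing import Tuple


def choose_strategy(reads: int, writes: int, tree_height: int,
                    ratio: float = 4.0) -> Tuple[int, int]:
    """Return hypothetical (up, down) levels for one attribute.

    Write-dominated attributes keep updates local, read-dominated attributes
    are aggregated and pushed down everywhere, and mixed workloads aggregate
    up to the root without pushing down.
    """
    if writes >= ratio * reads:
        return (0, 0)                      # cheap writes, expensive reads
    if reads >= ratio * writes:
        return (tree_height, tree_height)  # expensive writes, cheap reads
    return (tree_height, 0)                # middle ground


cpu_count = choose_strategy(reads=100, writes=1, tree_height=4)
process_count = choose_strategy(reads=1, writes=100, tree_height=4)
mixed = choose_strategy(reads=10, writes=10, tree_height=4)
```

The design choice here is only that write cost grows with the up and down levels while read cost shrinks, so the ratio of reads to writes decides which side to favor.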
Scalability • SDIMS defines the aggregation abstraction to mesh with its underlying scalable DHT system. • SDIMS refines the basic DHT abstraction to form an Autonomous DHT (ADHT) to achieve the administrative isolation property.
Mapping to DHT • An attribute is aggregated along the aggregation tree corresponding to DHTtree_k, where k = hash(attribute type, attribute name). • Different attributes will be aggregated along different trees.
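The key computation can be sketched as follows. The slide only says that k is a hash of the attribute type and name; the choice of SHA-1, the separator byte, and the `bits` parameter are assumptions made for this example.

```python
import hashlib


def attribute_key(attr_type: str, attr_name: str, bits: int = 32) -> int:
    """Hash (attribute type, attribute name) to a `bits`-bit DHT key."""
    if not 1 <= bits <= 160:
        raise ValueError("bits must be between 1 and 160 for SHA-1")
    # A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
    data = f"{attr_type}\x00{attr_name}".encode("utf-8")
    digest = hashlib.sha1(data).digest()
    # Keep the top `bits` bits of the 160-bit digest.
    return int.from_bytes(digest, "big") >> (160 - bits)


k_cpu = attribute_key("sum", "numCPUs")
k_proc = attribute_key("sum", "numProcesses")
```

Because the keys are spread across the identifier space, different attributes tend to have different root nodes, which is what spreads the aggregation load.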
Administrative isolation • For security • Updates and Probes are not accessible outside the domain • For availability • Queries for values in a domain are not affected by failures of nodes in other domains • For efficiency • Domain-scoped queries can be simple and efficient.
Administrative isolation • Autonomous DHT • Path Locality: Search paths should always be contained in the smallest possible domain. • Path Convergence: Search paths for a key from different nodes in a domain should converge at a node in that domain.
Administrative isolation (note: should be merged) [Figure: Domain univ. and Domain dept.; L0: host, L2: univ.; the isolation property is violated]
Administrative isolation [Figure: Domain dept. and Domain univ.; L0: host, L2: dept.; Autonomous DHT]
Robustness • ADHT • Distributed Computing (?) • Aggregation Management Layer (AML) • Lazy re-aggregation • On-demand Re-aggregation • Replication in Space
Two-layer architecture: ADHT and AML • The ADHT layer informs the AML layer about reconfigurations in the network. • NewParent • FailedChild • NewChild
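The interaction between the two layers can be pictured as three callbacks. The class below is a hypothetical sketch: the method names mirror the slide's NewParent, FailedChild, and NewChild events, but the signatures, the `pending_resend` flag, and the bookkeeping are assumptions.

```python
from typing import Callable, Dict, List, Optional


class AggregationManagementLayer:
    """Toy AML that reacts to reconfiguration events from the ADHT layer."""

    def __init__(self, f_type: Callable[[List[float]], float]) -> None:
        self.f_type = f_type
        self.children: Dict[str, Optional[float]] = {}
        self.parent: Optional[str] = None
        self.pending_resend = False  # aggregate must be sent to a new parent

    def new_parent(self, parent_id: str) -> None:
        self.parent = parent_id
        self.pending_resend = True

    def new_child(self, child_id: str, value: Optional[float] = None) -> None:
        self.children[child_id] = value

    def failed_child(self, child_id: str) -> None:
        # Drop the failed child's contribution from future aggregates.
        self.children.pop(child_id, None)

    def aggregate(self) -> Optional[float]:
        known = [v for v in self.children.values() if v is not None]
        return self.f_type(known) if known else None


aml = AggregationManagementLayer(sum)
aml.new_child("c1", 2.0)
aml.new_child("c2", 3.0)
before_failure = aml.aggregate()
aml.failed_child("c1")
after_failure = aml.aggregate()
aml.new_parent("p9")
```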
Implementation DifferentOverlay(?)
MIB • Child MIBs containing raw aggregate values gathered from children. • Reduction MIB containing locally aggregated values across this raw information • Ancestor MIB containing aggregate values scattered down from ancestors.
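The three MIBs can be modeled as one small record per attribute key. This is a sketch under assumptions: the class name `AttributeMIBs`, its methods, and the choice to index the ancestor MIB by level are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AttributeMIBs:
    f_type: Callable[[List[float]], float]
    # Child MIB: raw aggregate values gathered from each child.
    child_mib: Dict[str, float] = field(default_factory=dict)
    # Reduction MIB: f_type applied locally across the child MIB.
    reduction_mib: Optional[float] = None
    # Ancestor MIB: aggregates scattered down from ancestors, keyed by level.
    ancestor_mib: Dict[int, float] = field(default_factory=dict)

    def on_child_update(self, child_id: str, value: float) -> float:
        """Store a child's value and recompute the local reduction."""
        self.child_mib[child_id] = value
        self.reduction_mib = self.f_type(list(self.child_mib.values()))
        return self.reduction_mib

    def on_ancestor_update(self, level: int, value: float) -> None:
        """Cache an aggregate pushed down from the ancestor at `level`."""
        self.ancestor_mib[level] = value


mibs = AttributeMIBs(f_type=max)
first = mibs.on_child_update("a", 3.0)
second = mibs.on_child_update("b", 5.0)
mibs.on_ancestor_update(2, 9.0)
```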
Implementation [Figure: labels: parent, child]
Implementation (cont.) • Attribute key: used by the aggregation function for retrieving data. • (attribute type, attribute name)
Implementation (cont.) • A node acts • as a leaf for all attribute keys • as a level-1 subtree root for keys whose hash matches the node’s ID in b prefix bits • as a level-i subtree root for keys whose hash matches the node’s ID in the initial i * b bits • as the system’s global root for attribute keys whose hash matches the node’s ID in more prefix bits than any other node
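The prefix-matching rule can be checked with integer arithmetic. In the sketch below, node IDs and key hashes are assumed to be `id_bits`-bit integers; the function names and the 8-bit example values are hypothetical.

```python
from typing import List


def common_prefix_bits(a: int, b: int, id_bits: int) -> int:
    """Number of leading bits shared by two id_bits-wide identifiers."""
    return id_bits - (a ^ b).bit_length()


def subtree_root_level(node_id: int, key_hash: int, b: int, id_bits: int) -> int:
    """Highest level i such that the first i * b bits match (0 means leaf only)."""
    return common_prefix_bits(node_id, key_hash, id_bits) // b


def global_root(node_ids: List[int], key_hash: int, id_bits: int) -> int:
    """Node whose ID shares the longest prefix with the key hash.

    Ties are broken by list order here, which is an arbitrary choice.
    """
    return max(node_ids, key=lambda n: common_prefix_bits(n, key_hash, id_bits))


ID_BITS, B = 8, 2
node = 0b10110000
level_partial = subtree_root_level(node, 0b10111111, B, ID_BITS)  # 4 shared bits
level_none = subtree_root_level(node, 0b00000000, B, ID_BITS)     # 0 shared bits
root = global_root([0b00001111, 0b10110000, 0b10000000], 0b10111111, ID_BITS)
```

With b = 2, four shared prefix bits make the node a level-2 subtree root for that key, and the node sharing the most prefix bits among all nodes is the key's global root.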
Evaluation [Figure annotations: updates only its own MIB; updates the MIBs of all nodes; Up-All, Down 0; monitored attribute changes rarely; monitored attribute changes often]
Evaluation (cont.) • The session size is set to 8 (domain size) and the branching factor is set to 16. [Figure: axis labels: message size, nodes]
Evaluation (cont.) [Figure: average path length to root; Bf: branching factor]
Evaluation (cont.) [Figure; Bf: branching factor]