Autonomic Computing: Model, Architecture, Infrastructure

Manish Parashar
The Applied Software Systems Laboratory
Rutgers, The State University of New Jersey
http://automate.rutgers.edu
Ack: NSF (CAREER, KDI, ITR, NGS), DoE (ASCI)
UPP – Autonomic Computing, Mt. St. Michel, France, September 15 – 17, 2004

Unprecedented Complexity, Uncertainty …
• Very large scales
  • millions of entities
• Ad hoc (amorphous) structures/behaviors
  • p2p/hierarchical architecture
• Dynamic
  • entities join, leave, move, change behavior
• Heterogeneous
  • capability, connectivity, reliability, guarantees, QoS
• Unreliable
  • components, communication
• Lack of common/complete knowledge
  • number, type, location, availability, connectivity, protocols, semantics, etc.

Autonomic Computing
• Our system programming paradigms, methods and management tools seem to be inadequate for handling the scale, complexity, dynamism and heterogeneity of emerging systems
  • requirements and objectives are dynamic and not known a priori
  • requirements, objectives and solutions (algorithms, behaviors, interactions, etc.) depend on state, context, and content
• Nature has evolved to cope with scale, complexity, heterogeneity, dynamism, unpredictability, and lack of guarantees
  • self-configuring, self-adapting, self-optimizing, self-healing, self-protecting, highly decentralized, heterogeneous architectures that work!
• The goal of autonomic computing is to build self-managing systems that address these challenges using high-level policies

Ashby’s Ultrastable System Model of the Human Autonomic Nervous System

Programming Distributed Systems
• A distributed system is a collection of logically or physically disjoint entities which have established a process for making collective decisions.
  if (Decision(CurrentState, Request)) then TransitionState(CurrentState, Request)
• Central/Distributed Decision & Transition
• Programming System
  • programming model, languages/abstraction – syntax + semantics
    • entities, operations, rules of composition, models of coordination/communication
  • abstract machine, execution context and assumptions
  • infrastructure, middleware and runtime
• Conceptual and Implementation Models
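The decision/transition form on this slide can be illustrated with a minimal Python sketch. The membership example, the `Request` type and the function names `decision`, `transition_state` and `step` are assumptions made for illustration; the slide only gives the abstract `if Decision(...) then TransitionState(...)` form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical request: an entity asks to join or leave the system."""
    entity: str
    action: str  # "join" or "leave"


def decision(current_state: frozenset, request: Request) -> bool:
    """Accept a join only from a non-member and a leave only from a member."""
    if request.action == "join":
        return request.entity not in current_state
    if request.action == "leave":
        return request.entity in current_state
    return False


def transition_state(current_state: frozenset, request: Request) -> frozenset:
    """Return the new membership set after an accepted request."""
    if request.action == "join":
        return current_state | {request.entity}
    return current_state - {request.entity}


def step(current_state: frozenset, request: Request) -> frozenset:
    """if Decision(CurrentState, Request) then TransitionState(CurrentState, Request)."""
    if decision(current_state, request):
        return transition_state(current_state, request)
    return current_state
```

Here both the decision and the transition run in one place (the central case). In the distributed case each entity would evaluate `decision` on its own view of the state.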

UPP 2004 – Autonomic Computing
• Objective: Investigate conceptual and implementation models for Autonomic Computing
• Models, Architectures and Infrastructures for Autonomic Computing
  • Manish Parashar et al.
• Grassroots Approach to Self-Management in Large-Scale Distributed Systems
  • Ozalp Babaoglu et al.
• Autonomic Runtime System for Large Scale Applications
  • Salim Hariri et al.

Outline
• Programming emerging distributed systems
• Project AutoMate and the Accord programming system
• Sample applications in science and engineering
• Conclusion

Autonomic Computing Architecture
• Autonomic elements (components/services)
  • Responsible for policy-driven self-management of individual components
• Relationships among autonomic elements
  • Based on agreements established/maintained by autonomic elements
  • Governed by policies
  • Give rise to resiliency, robustness, self-management of system

Project AutoMate: Enabling Autonomic Applications (http://automate.rutgers.edu)
• Conceptual models and implementation architectures for autonomic computing
  • programming models, frameworks and middleware services
  • autonomic elements
  • dynamic and opportunistic composition
  • policy, content and context driven execution and management

Accord: A Programming System for Autonomic Applications
• Specification of applications that can detect and dynamically respond during execution to changes in both the execution environment and application states
  • applications composed from discrete, self-managing components which incorporate separate specifications for each of the functional, non-functional and interaction-coordination behaviors
  • separation of the specifications of computational (functional) behaviors, interaction and coordination behaviors, and non-functional behaviors (e.g. performance, fault detection and recovery) so that their combinations are composable
  • separation of policy and mechanism: policies in the form of rules are used to orchestrate a repertoire of mechanisms to achieve context-aware adaptive runtime computational behaviors and coordination and interaction relationships based on functional, performance, and QoS requirements
  • extends existing distributed programming systems

Autonomic Elements in Accord
• The functional port defines the set of functional behaviors provided and used
• The control port defines sensors/actuators for externally monitoring and controlling the autonomic element, and a set of guards to control access to the sensors and actuators
• The operational port defines interfaces to formulate, inject and manage the rules used to manage the runtime behaviors and interactions of the element
• An autonomic element embeds an element manager that is delegated to evaluate and execute rules in order to manage the execution of the element, and that cooperates with other element managers to fulfill application objectives
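The three ports and the embedded element manager can be pictured with a small Python sketch. It is illustrative only: the class `AutonomicElement`, its method names, and the `load`/`mode` state variables are assumptions, not the Accord API (the slides state that the actual implementations use C++/MPI, CCA and XML-described ports). Cooperation between element managers is not modeled.

```python
class AutonomicElement:
    """Toy element with functional, control and operational ports (hypothetical names)."""

    def __init__(self, name, compute):
        self.name = name
        self._compute = compute                        # functional behavior
        self._state = {"load": 0, "mode": "normal"}    # values exposed by the control port
        # Guards: which control-port operations are allowed on each variable.
        self._guards = {"load": {"read"}, "mode": {"read", "write"}}
        self._rules = []                               # rules injected via the operational port

    # Functional port
    def invoke(self, x):
        self._state["load"] += 1
        result = self._compute(x, self._state["mode"])
        self._manage()                                 # element manager runs after each call
        return result

    # Control port: sensors and actuators, protected by guards
    def sense(self, variable):
        if "read" not in self._guards.get(variable, set()):
            raise PermissionError(variable)
        return self._state[variable]

    def actuate(self, variable, value):
        if "write" not in self._guards.get(variable, set()):
            raise PermissionError(variable)
        self._state[variable] = value

    # Operational port: inject a rule as a (condition, action) pair
    def inject_rule(self, condition, action):
        self._rules.append((condition, action))

    # Embedded element manager: evaluate and execute the injected rules
    def _manage(self):
        for condition, action in self._rules:
            if condition(self):
                action(self)
```

For example, a rule such as `inject_rule(lambda e: e.sense("load") >= 2, lambda e: e.actuate("mode", "degraded"))` switches the functional behavior after two invocations without touching the functional code.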

Rules In Accord
• Behavior rules: manage the runtime behaviors of a component
• Interaction rules: manage the interactions between components, between components and environments, and the coordination within an application
  • control structure, interaction pattern, communication mechanism
• Security rules: control access to the functional interfaces, sensors/actuators and rule interfaces
• Conflicts are resolved using a simple priority mechanism
• Rule form: IF condition THEN then_actions ELSE else_actions
  • condition: a logical combination of sensors, events, and functional interfaces
  • then_actions, else_actions: a sequence of sensors, actuators and functional interfaces
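A minimal Python sketch of IF/THEN/ELSE rule evaluation with priority-based conflict resolution follows. The slide says only that conflicts are resolved by "a simple priority mechanism"; the specific policy used here (when two rules set the same actuator, the higher-priority rule wins) and all names (`Rule`, `evaluate`, the sensor and actuator names in the usage note) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    priority: int                                    # larger number wins (assumed convention)
    condition: Callable[[Dict[str, float]], bool]    # logical combination of sensor values
    then_actions: Dict[str, str] = field(default_factory=dict)  # actuator -> value
    else_actions: Dict[str, str] = field(default_factory=dict)


def evaluate(rules: List[Rule], sensors: Dict[str, float]) -> Dict[str, str]:
    """Evaluate every rule and return one value per actuator.

    If several rules set the same actuator, the rule with the highest
    priority decides the value.
    """
    chosen = {}  # actuator -> (priority, value)
    for rule in rules:
        actions = rule.then_actions if rule.condition(sensors) else rule.else_actions
        for actuator, value in actions.items():
            if actuator not in chosen or rule.priority > chosen[actuator][0]:
                chosen[actuator] = (rule.priority, value)
    return {actuator: value for actuator, (_, value) in chosen.items()}
```

With a performance rule at priority 1 and a memory-protection rule at priority 2 that both set an `algorithm` actuator, the memory rule overrides the performance rule whenever both fire.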

Dynamic Composition/Coordination In Accord
• Relationship is defined by control structure (e.g., loop, branch) and/or communication mechanism (e.g., RPC, shared-space)
  • composition manager translates workflow into a suite of interaction rules injected into element managers
  • element managers execute rules to establish control and communication relationships among elements in a decentralized manner
    • rules can be used to add or delete elements
  • a library of rule-sets defined for common control and communication relationships between elements
  • interaction rules must be based on the core primitives provided by the system
[Figure labels: Workflow Manager(s); Interaction rules]
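The translation of a workflow into interaction rules can be sketched as follows. This is a toy model under stated assumptions: the workflow is a list of producer/consumer pairs, the rule format (`on`, `send_to`, `accept_from`) is invented for illustration, and the `managers` dictionary stands in for whatever discovery and messaging services the execution environment provides. None of these names come from Accord.

```python
from collections import defaultdict


def compile_workflow(edges):
    """Composition manager: turn producer -> consumer edges into per-element rules."""
    rules = defaultdict(list)
    for producer, consumer in edges:
        rules[producer].append({"on": "output_ready", "send_to": consumer})
        rules[consumer].append({"on": "message", "accept_from": producer})
    return dict(rules)


class ElementManager:
    """Executes the interaction rules injected for one element."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.rules = []

    def inject(self, rules):
        self.rules.extend(rules)

    def run(self, value, managers, trace):
        output = self.func(value)
        trace.append((self.name, output))
        # Forward the output to every consumer named in an injected rule.
        for rule in self.rules:
            if rule["on"] == "output_ready":
                managers[rule["send_to"]].receive(self.name, output, managers, trace)

    def receive(self, sender, value, managers, trace):
        # Only react to senders that an injected rule names.
        accepted = any(
            rule["on"] == "message" and rule["accept_from"] == sender
            for rule in self.rules
        )
        if accepted:
            self.run(value, managers, trace)


def build(edges, funcs):
    """Create one manager per element and inject the compiled rules."""
    managers = {name: ElementManager(name, func) for name, func in funcs.items()}
    for name, rules in compile_workflow(edges).items():
        managers[name].inject(rules)
    return managers
```

After `build`, no central component drives the computation: calling `run` on the first element is enough, and each manager forwards results according to its own rules. Injecting a different rule set changes the composition without changing the element code.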

Accord Implementation Issues
• Current implementations
  • C++ + MPI, DoE CCA, XCAT/OGSA
  • XML used for control/operational ports and rules
  • common ontology for specifying interfaces, sensors/actuators, rules, content, context, …
  • timed behavior, fail-stop semantics
  • of course, there is a performance impact, but in our experience this has not been a show stopper
• Accord assumes an execution environment that provides
  • agent-based control network
  • support for associative coordination
  • services for content-based discovery and messaging
  • support for context-based access control
  • execution environment of the underlying programming system

Accord Neo-CCA
[Figure: an original Neo-CCA application (Driver component, Component A and Component B connected through GoPort, usePort and providePort) and the Neo-CCA based Accord application, which has the same components plus an Element Manager and a Composition Agent]

Accord Neo-CCA
[Figure: nodes x, y and z, each running a Neo-CCA framework that hosts components A, B and Driver together with an EM and a CA]

Accord Application Infrastructure
• Rudder Decentralized Coordination Framework
  • supports autonomic compositions, adaptations, optimizations, and fault-tolerance
  • context-aware software agents
  • decentralized tuple space coordination model
• Meteor Content-based Middleware
  • services for content routing, content discovery and associative interactions
  • a self-organizing content overlay
  • content-based routing engine and decentralized information discovery service
    • flexible routing and querying with guarantees and bounded costs
  • Associative Rendezvous messaging
    • content-based decoupled interactions with programmable reactive behaviors
• Details in IEEE IC 05/04, ICAC 04, SMC 05

Data-Driven Optimization of Oil Production

Autonomic Oil Well Placement (VFSA)
[Figure panels: contours of NEval(y,z,500)(10); pressure contours, 3 wells, 2D profile; permeability]
• Requires NYxNZ (450) evaluations; the minimum is marked on the plot
• VFSA solution “walk”: found after 20 (81) evaluations

Autonomic Oil Well Placement (VFSA)

Autonomic Oil Well Placement (SPSA)
[Figure] Permeability field showing the positioning of current wells. The symbols “*” and “+” indicate injection and producer wells, respectively.
[Figure] Search space response surface: expected revenue, f(p), for all possible well locations p. White marks indicate optimal well locations found by SPSA for 7 different starting points of the algorithm.
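For readers unfamiliar with SPSA (simultaneous perturbation stochastic approximation), the following Python sketch shows the basic iteration on a toy problem. It is not the optimizer used in the oil well study: `toy_cost` is a made-up stand-in for the negated expected revenue f(p), and the gain values `a` and `c` are arbitrary choices. The decay exponents 0.602 and 0.101 are commonly cited defaults for SPSA.

```python
import random


def spsa_minimize(f, start, iterations=200, a=0.1, c=0.1, seed=0):
    """Minimize f with SPSA: two function evaluations per step, whatever the dimension."""
    rng = random.Random(seed)
    x = list(start)
    for k in range(iterations):
        a_k = a / (k + 1) ** 0.602      # step-size gain, decaying with k
        c_k = c / (k + 1) ** 0.101      # perturbation size, decaying with k
        # Perturb all coordinates at once with random +/-1 signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in x]
        x_plus = [xi + c_k * di for xi, di in zip(x, delta)]
        x_minus = [xi - c_k * di for xi, di in zip(x, delta)]
        diff = f(x_plus) - f(x_minus)
        # Gradient estimate for coordinate i is diff / (2 * c_k * delta_i).
        x = [xi - a_k * diff / (2.0 * c_k * di) for xi, di in zip(x, delta)]
    return x


def toy_cost(p):
    """Hypothetical smooth cost with its minimum at p = (3, -1)."""
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2
```

Running `spsa_minimize(toy_cost, start)` from several different `start` points mirrors the experiment described in the caption, where the algorithm is launched from 7 starting points. To maximize revenue instead, one would minimize its negation.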

Autonomic Oil Well Placement (SPSA)

CH4Air/H2Air Simulations
• Simulate the chemical reaction with the elements O, H, C, N, and AR under dynamic conditions
  • CRL/SNL, Livermore, CA
• Objective is to use current sensor data and simulation state to choose the “best” algorithm that accelerates convergence
  • i.e., decreases nfe

Rule Generation for CH4Air Problem

Rules for CH4Air Problem
• IF 1000 <= temperature < 2000 THEN BDF 3
• IF 2000 <= temperature < 2200 THEN BDF 4
• IF 2200 <= temperature < 3000 THEN BDF 3
• IF 3000 <= temperature THEN BDF 3

Experiment Results of CH4Air Problem

Rule Generation for H2Air Problem

Rules for H2Air Problem
• IF 1000 <= temperature < 1200 THEN BDF 2
• IF 1200 <= temperature < 1800 THEN BDF 4
• IF 1800 <= temperature < 2400 THEN BDF 3
• IF 2400 <= temperature THEN BDF 4
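The CH4Air and H2Air rule sets are simple range lookups, so they can be encoded as data rather than as chained conditionals. The Python sketch below does this. The function name `bdf_order` and the table layout are assumptions, as is reading "BDF n" as a backward differentiation formula of order n. The slides give no rule below 1000, so the sketch returns `None` there.

```python
import bisect

# Lower bounds of each temperature range and the BDF value selected in that range,
# copied from the CH4Air and H2Air rule slides.
RULES = {
    "CH4Air": ([1000, 2000, 2200, 3000], [3, 4, 3, 3]),
    "H2Air": ([1000, 1200, 1800, 2400], [2, 4, 3, 4]),
}


def bdf_order(problem, temperature):
    """Return the BDF value selected by the rules, or None if no rule applies."""
    lower_bounds, orders = RULES[problem]
    # Index of the last range whose lower bound is <= temperature.
    index = bisect.bisect_right(lower_bounds, temperature) - 1
    if index < 0:
        return None  # below the first lower bound: the slides define no rule
    return orders[index]
```

Keeping the rules in a table makes it straightforward to regenerate them from new experiment data without changing the lookup code.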

Experiment Results of H2Air Problem

Computational Modeling of Physical Phenomena
• Realistic, physically accurate computational modeling
  • Large computation requirements
    • e.g. simulation of the core-collapse of supernovae in 3D with reasonable resolution (500^3) would require ~10-20 teraflops for 1.5 months (i.e. ~100 Million CPUs!) and about 200 terabytes of storage
    • e.g. turbulent flow simulations using active flow control in aerospace and biomedical engineering require 5000x1000x500 = 2.5·10^9 points and approximately 10^7 time steps, i.e. with 1 GFlop processors they require a runtime of ~7·10^6 CPU hours, or about one month on 10,000 CPUs (with perfect speedup). Also, with 700 B/pt the memory requirement is ~1.75 TB of run-time memory and ~800 TB of storage.
  • Dynamically adaptive behaviors
  • Complex couplings
    • multi-physics, multi-model, multi-resolution, …
  • Complex interactions
    • application – application, application – resource, application – data, application – user, …
  • Software/systems engineering/programmability
    • volume and complexity of code, community of developers, …
    • scores of models, hundreds of components, millions of lines of code, …
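The turbulent-flow figures on this slide can be checked with a few lines of Python arithmetic. All inputs are taken from the slide. The per-point work printed at the end is not stated in the source; it is only what the quoted CPU-hour figure implies, and the variable names are chosen for illustration.

```python
# Inputs quoted on the slide
points = 5000 * 1000 * 500          # grid points
time_steps = 10**7
bytes_per_point = 700
cpu_hours = 7e6                     # quoted runtime on 1 GFlop processors
flops_per_second = 1e9              # 1 GFlop processor
cpus = 10_000

# Derived quantities
memory_tb = points * bytes_per_point / 1e12
wall_clock_days = cpu_hours / cpus / 24          # assumes perfect speedup
implied_flops_per_point_step = (
    cpu_hours * 3600 * flops_per_second / (points * time_steps)
)

print(f"points: {points:.2e}")
print(f"run-time memory: {memory_tb:.2f} TB")
print(f"wall clock on {cpus} CPUs: {wall_clock_days:.1f} days")
print(f"implied work: {implied_flops_per_point_step:.0f} flops per point per step")
```

The memory figure reproduces the quoted ~1.75 TB, and 7·10^6 CPU hours spread over 10,000 CPUs is about 29 days, consistent with "about one month". The quoted runtime corresponds to roughly 1,000 floating-point operations per grid point per time step.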

A Selection of SAMR Applications
• Multi-block grid structure and oil concentration contours (IPARS, M. Peszynska, UT Austin)
• Blast wave in the presence of a uniform magnetic field, 3 levels of refinement (Zeus + GrACE + Cactus, P. Li, NCSA, UCSD)
• Richtmyer-Meshkov detonation in a deforming tube, 3 levels; Z=0 plane visualized on the right (VTF + GrACE, R. Samtaney, CIT)
• Mixture of H2 and Air in stoichiometric proportions with a non-uniform temperature field (GrACE + CCA, Jaideep Ray, SNL, Livermore)

Autonomic Runtime Management
[Figure: Autonomic Runtime Manager. Block labels include Self-Observation & Analysis; Application State Characterization; Natural Region Characterization; Monitoring & Context-Aware Services; Application Monitoring Service; Resource Monitoring Service; System State Synthesizer; System Capability Module; Resource History Module; Performance Prediction Module; Self-Optimization & Execution; Deduction Engine; Objective Function Synthesizer; Prescriptions; Normalized Work Metric (NWM); Normalized Resource Metric (NRM); Autonomic Partitioning (Partition/Compose, Repartition/Recompose); Autonomic Scheduling (Mapping, Distribution, Redistribution; Global Grid Scheduling, Local Grid Scheduling); Virtual Grid; Virtual Computation Unit (VCU); Virtual Resource Unit; Heterogeneous, Dynamic Computational Environment]
VGTS: Virtual Grid Time Scheduling
VGSS: Virtual Grid Space Scheduling

Autonomic Forest Fire Simulation
Predicts fire spread (the speed, direction and intensity of the forest fire front) as the fire propagates, based on both dynamic and static environmental and vegetation conditions.
[Figure label: High computation zone]

Conclusion
• Autonomic applications are necessary to address scale/complexity/heterogeneity/dynamism/reliability challenges
• Project AutoMate and the Accord programming system address key issues to enable the development of autonomic applications
  • conceptual and implementation
• More information, publications, software, conference
  • http://automate.rutgers.edu
  • automate@caip.rutgers.edu / parashar@caip.rutgers.edu
  • http://www.autonomic-conference.org

The Team
TASSL, Rutgers University
• Autonomic Computing Research Group: Viraj Bhat, Nanyan Jiang, Hua Liu (Maria), Zhen Li (Jenny), Vincent Matossian, Cristina Schmidt, Guangsen Zhang
• Autonomic Applications Research Group: Sumir Chandra, Xiaolin Li, Li Zhang
CS Collaborators
• HPDC, University of Arizona: Salim Hariri
• Biomedical Informatics, The Ohio State University: Tahsin Kurc, Joel Saltz
• CS, University of Maryland: Alan Sussman, Christian Hansen
Applications Collaborators
• CSM, University of Texas at Austin: Malgorzata Peszynska, Mary Wheeler
• IG, University of Texas at Austin: Mrinal Sen, Paul Stoffa
• ASCI/CACR, Caltech: Michael Aivazis, Julian Cummings, Dan Meiron
• CRL, Sandia National Laboratory, Livermore: Jaideep Ray, Johan Steensland
