Fault-tolerance for Component-based Systems – An Automated Middleware Specialization Approach

Sumant Tambe*, Akshay Dabholkar, Aniruddha Gokhale, Abhishek Dubey (Presenter)
Institute for Software Integrated Systems (ISIS), Vanderbilt University, Nashville, TN, USA
*Contact: sutambe@dre.vanderbilt.edu
12th IEEE International Symposium on Object/component/service-oriented Real-time distributed Computing (ISORC 2009)

Motivation
• Contemporary general-purpose middleware (CORBA, J2EE)
  • Generic: well designed for broad applicability
  • Feature-rich: supports non-functional properties such as security, real-time, and fault tolerance (FT)
• However,
  • Does not support domain-specific variations of non-functional semantics out of the box
    • E.g., domain-specific fault-tolerance (coming next)
  • Cost of developing proprietary middleware is high

Motivational Case Study (Material Handling System)
• Representative examples
  • FedEx, UPS, DHL
  • Airport baggage handling
  • Food processing/bottling
• High availability and safety are critical
• Communicating software components
  • Material Flow Control (MFC)
  • Hardware Interface Layer (HIL)
    • E.g., Flipper and Motor Controllers

Material Handling System (Fault model and domain-specific recovery)
• Hardware and software faults
  • E.g., jamming of the flipper
  • Crash faults of software components
• Detected by software components
  • Communicated using software exceptions
  • E.g., FlipperJamException
  • Communication failure: CORBA::COMM_FAILURE
• Domain-specific group recovery semantics
  • Shut down the entire primary assembly (F, MC1, MC2)
  • Start the replica assembly (F’, MC1’, MC2’)

System Design and Implementation Challenges
[Figure: a primary Distributed Processing Unit (DPU) and a backup DPU; components A, B, C with replicas A’, B’, C’]
• Lack of middleware abstractions
  • Failure of one component means failure of all in a DPU
  • Recover a collection of components simultaneously (even those that have no direct exposure to the fault)
• Application-level solution is undesirable
  • Technical concerns are (ideally) no business of applications
  • Failure handling behavior crosscuts every component in a DPU
  • A lot of manual programming
Problem Statements
• How to add new semantics to COTS middleware retroactively?
• How to automate it to improve productivity and reduce cost?

Solution Approach
• Promising approach for retroactive behavior augmentation
  • Aspect-oriented Programming (AOP)
  • Modularizes domain-specific functionality that is potentially cross-cutting
  • An aspect compiler can weave the domain-specific behavior into the original fabric of the COTS middleware
• Promising approach for automation
  • Domain-specific modeling (DSM)
  • Inherently supports tool-driven generation of programming artifacts
  • Simplifies specification of domain-specific requirements
Our Solution: GRAFT (GeneRative Aspects for Fault-Tolerance)

GRAFT - Overview
• GeneRative Aspects for Fault-Tolerance
• A two-step process for specializing middleware
  • Design-time support to specify domain-specific FT requirements
    • Component Availability Modeling Language (CAML)
    • Automate FT modeling using a model transformation
  • Run-time support for middleware specialization
    • Generate application-specific aspect code for group failover
    • Weave the generated code into application-specific stubs

Component Availability Modeling Language (CAML)
• Annotate component structural models with fault-tolerance attributes
• FT requirements captured using FailOverUnit (FOU)
  • FOU abstracts away the details of granularity of protection
    • E.g., Component, Assembly
  • Treats a group of components as a single unit of failover
    • Semantics: if one fails, all fail; clients fail over to the replica
  • Configurable degree of replication (e.g., Replica = 2)
  • Captures application-specific failure exceptions
    • E.g., FlipperJamException
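To make the FailOverUnit idea concrete, the following is a minimal C++ sketch of the information an FOU annotation carries: the participants, the degree of replication, and the exceptions treated as failures. It is not CAML itself (CAML is a modeling language) and not code from GRAFT; the struct, its field names, the label "FlipperAssembly", and the helper functions are all hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory view of one FailOverUnit (FOU) annotation.
// Field names are illustrative assumptions, not CAML syntax.
struct FailOverUnit {
    std::string name;                             // illustrative label for the unit
    std::vector<std::string> participants;        // components that fail over together
    std::size_t replicas;                         // configured degree of replication
    std::vector<std::string> failure_exceptions;  // exceptions treated as FOU failures
};

// Returns true if the named exception is one this FOU treats as a failure.
bool triggers_failover(const FailOverUnit& fou, const std::string& exception_name) {
    for (const std::string& e : fou.failure_exceptions) {
        if (e == exception_name) return true;
    }
    return false;
}

// Example values taken from the material handling case study in these slides.
FailOverUnit make_flipper_fou() {
    return FailOverUnit{
        "FlipperAssembly",  // hypothetical name
        {"FlipperController", "MotorController1", "MotorController2"},
        2,
        {"HIL::FlipperJamException", "CORBA::COMM_FAILURE", "CORBA::TRANSIENT"}};
}
```

Keeping the failure exceptions as data next to the participant list mirrors the slide's point that the FOU, not each component, owns the failover policy.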

Step 1: Automated Structural FT Modeling

Step 2: Automated Aspect Code Generation
• Two behaviors based on component role
  • FOU participant’s behavior
    • Detects the failure, if any
    • Shuts down all other participants, including itself
  • FOU client’s behavior
    • Detects the failure, if any
    • Shuts down the FOU
    • Does an automatic failover to a replica FOU
• Generated code: AspectC++
  • AspectC++ compiler weaves the generated code into the respective component stubs
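The participant-side behavior (detect a failure, then shut down every participant including itself) can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not the generated AspectC++: `FlipperJam`, `ParticipantStates`, `shut_down_fou`, and `participant_invoke` are hypothetical names, the map of booleans stands in for real component lifecycle control, and rethrowing the exception so the client can observe it is an assumption rather than something the slides state.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Stand-in for an application-specific failure such as FlipperJamException.
struct FlipperJam : std::runtime_error {
    FlipperJam() : std::runtime_error("flipper jammed") {}
};

// Hypothetical registry: participant name -> whether it is still active.
using ParticipantStates = std::map<std::string, bool>;

// Deactivate every participant of the FOU, including the caller.
void shut_down_fou(ParticipantStates& fou) {
    for (auto& entry : fou) {
        entry.second = false;
    }
}

// Participant-side wrapper: run the operation; if it reports a failure,
// shut down the whole FOU and rethrow (assumption: the client should
// also see the exception so it can fail over).
template <typename Operation>
void participant_invoke(ParticipantStates& fou, Operation op) {
    try {
        op();
    } catch (const FlipperJam&) {
        shut_down_fou(fou);
        throw;
    }
}
```

A successful call leaves all participants untouched; only a recognized failure triggers the group shutdown.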

Sample Generated Aspect Code (MFC component)

    aspect FailOverUnit_Client {
      // Auto-generated array of names of FailOverUnit participants.
      char * FOU_Participants[] = { "FlipperController", "MotorController1", "MotorController2", 0 };
      size_t failure_count_; // Initialized to zero.
      // Contains remote object reference of the replica.
      HIL::IFlipperController_var replica_ref_;

      // Weave advice around local stub of the flip() method of MFC.
      advice execution ("void HIL::IFlipperController::flip()") : around ()
      // The advice is applied around the flip method.
      {
        do {
          // Use the remote reference of the backup FlipperController
          // component only if the primary component has failed.
          if (failure_count_ > 0)
            // "_that" is used to change "this" pointer before proceeding.
            // Use live object reference of the replica.
            tjp->action()._that = replica_ref_.in();
          try {
            // Continue the flip() function call as usual.
            tjp->proceed ();
            break;
          }
          catch(HIL::FlipperJamException & e) {
            handle_exception(e); // deactivates FailOverUnit participants
          }
          catch(CORBA::COMM_FAILURE & e) {
            handle_exception(e); // deactivates FailOverUnit participants
          }
          catch(CORBA::TRANSIENT & e) {
            handle_exception(e); // deactivates FailOverUnit participants
          }
          // Application-specific non-catastrophic exceptions are passed.
        } while (replica_ref_.in() != NULL_POINTER);
      }
    };
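For readers without an AspectC++ toolchain, the client-side retry logic of the advice can be approximated in plain C++. The sketch below is not the generated code: it drops the weaving and CORBA types, uses a hypothetical `FlipperController` struct holding a callable and a hypothetical `CommFailure` exception, and, unlike the slide's loop, gives up and rethrows once the replica has also failed.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Stand-in for a catastrophic failure such as CORBA::COMM_FAILURE.
struct CommFailure : std::runtime_error {
    CommFailure() : std::runtime_error("communication failure") {}
};

// Hypothetical stand-in for a remote component reference.
struct FlipperController {
    std::function<void()> flip;
};

// Call flip() on the primary; on a failure, redirect the call to the
// replica if one exists. Returns the number of failures observed before
// a call succeeded. Rethrows when no (further) replica is available.
int flip_with_failover(FlipperController& primary, FlipperController* replica) {
    int failure_count = 0;
    FlipperController* target = &primary;
    while (true) {
        try {
            target->flip();
            return failure_count;
        } catch (const CommFailure&) {
            ++failure_count;
            if (replica == nullptr || target == replica) {
                throw;  // nothing left to fail over to
            }
            target = replica;  // analogous to redirecting "_that" in the advice
        }
    }
}
```

Exceptions other than `CommFailure` propagate unchanged, which corresponds to the slide's note that non-catastrophic application exceptions are passed through.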

Run-time Coordination
• FOU participant detects the failure of another participant
• Shuts down the primary FOU
• Deployment infrastructure (DAM) removes the components
• Clients detect the failure of the FOU
• Clients obtain the replica references from the naming service
• Successful failover of all the clients
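The step in which clients obtain replica references after the primary components are removed can be illustrated with a toy lookup table. This is only a sketch: the `NamingService` class below is a hypothetical in-memory stand-in, not the CORBA Naming Service API, and the reference strings are placeholders.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a naming service. Each name maps
// to an ordered list of bound references, primary first.
class NamingService {
public:
    void bind(const std::string& name, const std::string& ref) {
        bindings_[name].push_back(ref);
    }

    // Remove one reference, as the deployment infrastructure would
    // after shutting a component down.
    void unbind(const std::string& name, const std::string& ref) {
        auto it = bindings_.find(name);
        if (it == bindings_.end()) return;
        auto& refs = it->second;
        for (auto r = refs.begin(); r != refs.end(); ++r) {
            if (*r == ref) {
                refs.erase(r);
                break;
            }
        }
    }

    // Return the first remaining reference, or an empty string if none.
    std::string resolve(const std::string& name) const {
        auto it = bindings_.find(name);
        if (it == bindings_.end() || it->second.empty()) return "";
        return it->second.front();
    }

private:
    std::map<std::string, std::vector<std::string>> bindings_;
};
```

Once the primary binding is removed, the same `resolve` call made by a client yields the replica, so clients need no knowledge of which assembly is currently live.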

Evaluation of Efforts Reduction (Replica = 2)
[Figure: Fault-tolerance Modeling Efforts Without/With GRAFT]
[Figure: Fault-tolerance Programming Efforts Without/With GRAFT]

Concluding Remarks
• Using aspects to specialize middleware for non-functional properties is desirable
• Aspect-oriented code can be auto-generated from higher-level domain/application-specific models
• Higher-to-lower-level model transformations and code generation improve productivity
• GRAFT realizes this approach
  • for fault-tolerant component-based systems built using Component Integrated ACE ORB (CIAO)
  • using Component Availability Modeling Language (CAML)
  • using C-SAW and ECL for model transformation
  • using AspectC++ for aspect-oriented programming
www.dre.vanderbilt.edu/cosmic
www.cis.uab.edu/gray/Research/C-SAW

Thank you!
