Distributed Systems Major Design Issues

Meng Han • Presentation 09/11/2013 • CS8320 – Advanced Operating Systems, Fall 2013 • Section 2.6 Presentation

Outline • Introduction • Distributed System Design Issues • Object Models and Naming Schemes • Distributed Coordination • Interprocess Communication • Distributed Resources • Fault Tolerance and Security • Design for big-data • Summary • References

Introduction • A distributed system is mainly concerned with [1]: • Coordination of concurrent distributed processes • Management of distributed resources • Functioning of distributed algorithms • However… • The network may be UNRELIABLE • Components may be UNTRUSTED • These raise design and implementation issues, in particular how to support transparency.

Introduction • Design and implementation issues: • How to model and identify objects in the system • How to coordinate the interaction among objects • How objects communicate with each other • How shared/replicated objects can be managed in a controlled fashion • How to protect objects and secure the system

Object Models and Naming Schemes • Objects in a computer system: • processes, data files, memory, devices, processors, and networks. • Objects are encapsulated in servers: • process servers, file servers, memory servers, etc. • A client is a null server that accesses object servers.

Object Models and Naming Schemes • Identifying a server [2]: • by name (name server) • by either physical or logical address (network server) • by the service that the server provides • The following all depend on the naming scheme for system objects: • Structure of the system, management of the name space, name resolution, access methods
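As a rough illustration of identifying a server by name or by service, the sketch below keeps a small in-memory table. The `NameServer` class, its method names, and the example addresses are all hypothetical; a real name service would also deal with hierarchy, caching, and replication.

```python
class NameServer:
    """Toy name server (hypothetical): maps logical names to addresses."""

    def __init__(self):
        self._by_name = {}      # name -> (host, port)
        self._by_service = {}   # service -> list of names offering it

    def register(self, name, address, service):
        # Record both the name-to-address binding and the service offered.
        self._by_name[name] = address
        self._by_service.setdefault(service, []).append(name)

    def resolve(self, name):
        # Name resolution: raises KeyError for an unknown name.
        return self._by_name[name]

    def find_by_service(self, service):
        # Identify servers by the service they provide.
        return [self._by_name[n] for n in self._by_service.get(service, [])]


ns = NameServer()
ns.register("fs1", ("10.0.0.5", 2049), service="file")
ns.register("fs2", ("10.0.0.6", 2049), service="file")
ns.register("ps1", ("10.0.0.7", 7000), service="process")

fs1_address = ns.resolve("fs1")
file_servers = ns.find_by_service("file")
```

A client that only knows the service it needs would call `find_by_service`; a client that already knows a server name would call `resolve`.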

Distributed Coordination • Coordination to achieve synchronization • Different types of synchronization: • Barrier synchronization • Processes must reach a common synchronization point before they can continue • Condition coordination • A process must wait for a condition that will be set asynchronously by other interacting processes, to maintain some ordering of execution • Mutual exclusion • Concurrent processes must have mutual exclusion when accessing a critical shared resource
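The three synchronization types can be sketched on a single machine, with threads standing in for distributed processes. This is only an analogy: in a distributed system the same guarantees have to be built from messages. The variable names are illustrative.

```python
import threading

N = 3
barrier = threading.Barrier(N)      # barrier synchronization
ready = threading.Condition()       # condition coordination
lock = threading.Lock()             # mutual exclusion
state = {"go": False, "counter": 0}
arrived = []
seen_after_barrier = []


def worker(name):
    # Condition coordination: wait until another party sets the condition.
    with ready:
        ready.wait_for(lambda: state["go"])
    # Mutual exclusion: only one worker updates the shared counter at a time.
    for _ in range(1000):
        with lock:
            state["counter"] += 1
    with lock:
        arrived.append(name)
    # Barrier: nobody continues until all N workers have reached this point.
    barrier.wait()
    with lock:
        seen_after_barrier.append(len(arrived))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
with ready:
    state["go"] = True          # set the condition asynchronously
    ready.notify_all()
for t in threads:
    t.join()
```

After the barrier, every worker observes that all `N` workers have arrived, and the counter reflects every increment because updates were serialized by the lock.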

Synchronization Issues • State information sent by messages: • Typically only partial state information is known about other processes, making synchronization difficult. • Information is not current because of transfer delays. • The decision whether a process may continue must rely on a message-passing resolution protocol. • Centralized coordinator: central point of failure • Deadlocks [3] • Circular waiting for other processes • Deadlock detection and recovery strategies
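One common way to detect circular waiting is to look for a cycle in a wait-for graph. The sketch below assumes a single coordinator already holds the whole graph, which is a simplification; it is not the distributed detection algorithm of [3]. The function name and the graph encoding are assumptions made for illustration.

```python
def find_cycle(wait_for):
    """Return a list of processes forming a cycle, or None.

    wait_for maps each process to the processes it is waiting on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2    # unvisited, on current path, finished
    color = {}
    path = []

    def visit(p):
        color[p] = GRAY
        path.append(p)
        for q in wait_for.get(p, ()):
            c = color.get(q, WHITE)
            if c == GRAY:
                # q is already on the current path: circular wait found.
                return path[path.index(q):]
            if c == WHITE:
                cycle = visit(q)
                if cycle:
                    return cycle
        path.pop()
        color[p] = BLACK
        return None

    for p in list(wait_for):
        if color.get(p, WHITE) == WHITE:
            cycle = visit(p)
            if cycle:
                return cycle
    return None


deadlocked = find_cycle({"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]})
healthy = find_cycle({"P1": ["P2"], "P2": ["P3"], "P3": []})
```

A recovery strategy would then pick one process on the returned cycle to abort or roll back; that choice is outside the scope of this sketch.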

Synchronization Issues • Deadlocks • Four conditions must hold for deadlock to occur: • Exclusive use (mutual exclusion) • Hold and wait • No preemption • Cyclical wait • The problem of deadlocks can be handled in the following ways: • Prevention, avoidance, and detection

Deadlock Prevention • Schemes that guarantee that deadlocks can never happen because of the way the system is structured. • One of the four conditions is prevented, thus preventing deadlocks. • For example, impose an order on the resources and require processes to request resources in increasing order. This prevents cyclical wait and thus makes deadlocks impossible.
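The resource-ordering example can be sketched with local locks standing in for distributed resources. The `OrderedResource` class, the rank values, and the helper names are hypothetical; the point is only that every job acquires resources in increasing rank, whatever order it asked for them in.

```python
import threading


class OrderedResource:
    """A resource with a fixed global rank (hypothetical helper)."""

    def __init__(self, rank, name):
        self.rank = rank
        self.name = name
        self.lock = threading.Lock()


def acquire_in_order(resources):
    # Always lock in increasing rank, so a cyclical wait cannot form.
    ordered = sorted(resources, key=lambda r: r.rank)
    for r in ordered:
        r.lock.acquire()
    return ordered


def release_all(ordered):
    for r in reversed(ordered):
        r.lock.release()


printer = OrderedResource(1, "printer")
disk = OrderedResource(2, "disk")
completed = []


def job(name, wanted):
    held = acquire_in_order(wanted)
    try:
        completed.append(name)   # critical section using both resources
    finally:
        release_all(held)


# The two jobs name the resources in opposite orders, the classic
# deadlock setup, but both end up locking printer before disk.
t1 = threading.Thread(target=job, args=("A", [printer, disk]))
t2 = threading.Thread(target=job, args=("B", [disk, printer]))
t1.start()
t2.start()
t1.join()
t2.join()
```

Without the sort in `acquire_in_order`, job A could hold the printer while job B holds the disk, and each would wait forever for the other.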

Interprocess Communication • Lower level: • Interprocess communication can be accomplished by using simple message-passing primitives. • Higher level: • Logical communication methods provide transparency: • They hide the physical details of message passing • Two important concepts: • The client/server model • Remote Procedure Call (RPC)

The Client/Server Model • The client/server model is a programming paradigm for structuring processes in distributed systems [4]. [Figure: the client and server exchange request and reply messages as logical communication; the actual communication passes through each machine's kernel and the network.]

The RPC Model • The remote procedure call model is similar to the local procedure call model: • The caller places arguments to a procedure in a specific location (such as a register). • The caller temporarily transfers control to the procedure. • When the caller gains control again, it obtains the results of the procedure from the specified location. • The caller then continues program execution.

The RPC Model • On the server side, a process is dormant (inactive, sleeping): • It awaits the arrival of a call message. • When one arrives, the server process computes a reply that it then sends back to the requesting client. • After this, the server process becomes dormant again.
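The call/reply cycle described above can be sketched in one process, with two queues standing in for the network and JSON standing in for argument marshalling. The names `remote_call`, `server_loop`, and `PROCEDURES` are illustrative assumptions, not part of any real RPC library.

```python
import json
import queue
import threading

requests = queue.Queue()   # stand-in for the network, client -> server
replies = queue.Queue()    # stand-in for the network, server -> client

# Procedures the server exports (hypothetical example).
PROCEDURES = {"add": lambda a, b: a + b}


def server_loop():
    while True:
        message = requests.get()      # dormant until a call message arrives
        if message is None:           # shutdown signal for this sketch
            break
        call = json.loads(message)
        result = PROCEDURES[call["proc"]](*call["args"])
        replies.put(json.dumps({"result": result}))
        # The loop then blocks again: the server becomes dormant.


def remote_call(proc, *args):
    """Client stub: marshal arguments, send the call, wait for the reply."""
    requests.put(json.dumps({"proc": proc, "args": args}))
    return json.loads(replies.get())["result"]


server = threading.Thread(target=server_loop)
server.start()
total = remote_call("add", 2, 3)   # looks like a local call to the caller
requests.put(None)
server.join()
```

From the caller's side, `remote_call("add", 2, 3)` behaves like a local procedure call: control is given up until the result is available. A real RPC system would additionally handle lost messages, timeouts, and server failures.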


Distributed Resources • Load Distribution • multiprocessor scheduling (Static) • load sharing (Dynamic) • Distributed shared memory • Distributed file systems

Load Distribution • Multiprocessor scheduling[5] • Minimize communication overhead with efficient scheduling. • Load sharing • Process migration strategy & mechanism
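As a minimal sketch of dynamic load sharing, the function below repeatedly migrates one process from the busiest node to the idlest node while the busiest node is above a threshold. The policy, the threshold, and the node names are assumptions chosen for illustration; real migration strategies also weigh transfer cost and stale load information.

```python
def rebalance(loads, threshold):
    """Return (new_loads, migrations) after threshold-based load sharing.

    loads maps node name -> number of runnable processes.
    """
    loads = dict(loads)          # work on a copy
    migrations = []
    while True:
        busiest = max(loads, key=loads.get)
        idlest = min(loads, key=loads.get)
        # Stop when no node is overloaded or a move would not help.
        if loads[busiest] <= threshold or loads[busiest] - loads[idlest] < 2:
            break
        loads[busiest] -= 1
        loads[idlest] += 1
        migrations.append((busiest, idlest))
    return loads, migrations


final_loads, moves = rebalance({"n1": 6, "n2": 1, "n3": 2}, threshold=3)
```

Each entry in `moves` is a (source, destination) pair, which corresponds to one process migration that the mechanism layer would have to carry out.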

Distributed File Systems and Distributed Shared Memory • Distributed file systems • Issues are based on a file point of view • Distributed shared memory • Issues are based on a process perception of the system. • The common issues central to them: • Sharing and replication of data

Fault Tolerance and Security • Security threats and failures are both system faults. • The problem of failures can be alleviated if there is redundancy in the system. • The system should transparently handle failures or removal of machines, network links, and other resources without loss of data or functionality. • This should hold true for both the system itself and its applications.
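A small sketch of how redundancy can mask a failure: the same data is held by several replicas, and a read falls through to the next replica when one is down. The `Replica` class and `read_with_failover` function are hypothetical, and keeping replicas consistent on writes is deliberately left out.

```python
class Replica:
    """One copy of the data (hypothetical); may be marked as failed."""

    def __init__(self, name, data, alive=True):
        self.name = name
        self.data = dict(data)
        self.alive = alive

    def read(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.data[key]


def read_with_failover(replicas, key):
    # Try each replica in turn; the caller never sees individual failures.
    for replica in replicas:
        try:
            return replica.read(key)
        except ConnectionError:
            continue
    raise RuntimeError("all replicas failed")


data = {"report.txt": "contents"}
replicas = [
    Replica("r1", data, alive=False),   # simulated machine failure
    Replica("r2", data),
    Replica("r3", data),
]
value = read_with_failover(replicas, "report.txt")
```

The read succeeds even though `r1` is down, which is the transparency property the slide asks for; only when every replica has failed does the caller see an error.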

Fault Tolerance and Security • Security [6] • Authentication: clients, servers, and messages must all be authenticated. • Authorization: access control has to be performed across a physical network with heterogeneous components under different administrative units using different security models.
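The sketch below separates the two concerns: message authentication with an HMAC tag from Python's standard `hmac` module, and authorization with a simple access control list. The shared key, the principals, and the `ACL` table are made-up examples; key distribution and cross-domain policy are assumed to be solved elsewhere.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    # Message authentication code computed over the message with a shared key.
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(sign(key, message), tag)


# Hypothetical access control list: principal -> permitted operations.
ACL = {"alice": {"read", "write"}, "bob": {"read"}}


def authorized(principal: str, operation: str) -> bool:
    return operation in ACL.get(principal, set())


shared_key = b"example-shared-key"          # illustrative only
request = b"bob:write:/data/report.txt"
tag = sign(shared_key, request)

authentic = verify(shared_key, request, tag)
tampered = verify(shared_key, b"bob:read:/data/report.txt", tag)
allowed = authorized("bob", "write")
```

Here the request is authentic (the tag matches) but not authorized (bob may only read), which shows why a server has to perform both checks.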

Design for BIG-DATA • Emergence of Big Data • Big data is a foundational element of social networking and Web 2.0-based information companies. An enormous amount of data is generated as a result of democratization and ecosystem factors such as the following: • Mobility trends • Data access and consumption • Ecosystem capabilities

Design for BIG-DATA • Mobility trends: • Mobile devices, mobile events and sharing, and sensory integration • Data access and consumption: • Internet, interconnected systems, social networking, and convergent interfaces and access models • Ecosystem capabilities: • Major changes in the information processing model and the availability of an open source framework; general-purpose computing and unified network integration


Summary • Given the system architectures, we summarized the important design and implementation issues. • These issues include object models and naming schemes, interprocess communication and synchronization, data sharing and replication, and failure and recovery. • These problems are unique to distributed systems.

References [1] Randy Chow & Theodore Johnson, 1997, “Distributed Operating Systems & Algorithms”, (Addison-Wesley), pp. 45–50, 61–63. [2] Suresh Sridharan, 2006, “Distributed Operating Systems”, (University of Wisconsin, Madison). http://pages.cs.wisc.edu/~dusseau/Classes/CS739/Writeups/Survey.pdf [3] Chandy, K. Mani, Jayadev Misra, and Laura M. Haas. “Distributed deadlock detection.” ACM Transactions on Computer Systems (TOCS) 1.2 (1983): 144-156.

References [4] Holliday, J., and Amr El Abbadi. “Distributed deadlock detection.” Encyclopedia of Distributed Computing. Kluwer Academic Publishers, Dordrecht (accepted for publication) (2005). [5] Babaoglu, Ozalp, and Keith Marzullo. “Consistent global states of distributed systems: Fundamental concepts and mechanisms.” Distributed Systems 2 (1993): 12. [6] Krishna Sankar, Andrew Balinsky, Darrin Miller, Sri Sundaralingam. (Feb 18, 2005) “EAP Authentication Protocols for WLANs”.

References [7] Bohlouli, Mahdi, et al. “Towards an Integrated Platform for Big Data Analysis.” Integration of Practice-Oriented Knowledge Technology: Trends and Prospectives. Springer Berlin Heidelberg, 2013. 47-56. [8] Wolf, Marilyn. “Computers as components: principles of embedded computing system design.” Access Online via Elsevier, 2012. [9] Provost, Foster, and Tom Fawcett. “Data Science and its Relationship to Big Data and Data-Driven Decision Making.” Big Data 1.1 (2013): 51-59.

