
COMP28112 Lecture 2






  1. COMP28112 Lecture 2: A few words about parallel computing; challenges (and goals) when developing distributed systems.

  2. Why would you need 100 (or 1000 for that matter) networked computers? What would you do?

  3. Increased Performance • There are many applications (especially in science) where good response time is needed. • How can we reduce execution time? • By using more computers to work in parallel. • This is not as trivial as it sounds and there are limits (we’ll come back to this). • Parallel computing. • Examples: weather prediction; long-running simulations; …

  4. How do we speed up things? Many scientific operations are inherently parallel: for(i=1;i<100;i++) A[i]=B[i]+C[i] or, in two dimensions, for(i=1;i<100;i++) for(j=1;j<100;j++) A[i,j]=B[i,j]+C[i,j]. Different machines can operate on different data. Ideally, this may speed up execution by a factor equal to the number of processors (speedup = ts / tp = sequential_time / parallel_time).
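The loop above can be split across workers because each element of A depends only on the matching elements of B and C. A minimal Python sketch of this data-parallel structure follows; it uses a thread pool for simplicity (note that in CPython, threads illustrate the structure but the GIL limits true CPU parallelism, so real scientific codes use processes or compiled kernels):

```python
from concurrent.futures import ThreadPoolExecutor

def add_chunk(b_chunk, c_chunk):
    # Each worker adds its own slice of B and C; no worker
    # depends on another's results, so the chunks are independent.
    return [b + c for b, c in zip(b_chunk, c_chunk)]

def parallel_add(B, C, workers=4):
    # Split the index range into roughly equal, contiguous chunks.
    step = (len(B) + workers - 1) // workers
    slices = [(B[i:i + step], C[i:i + step]) for i in range(0, len(B), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda s: add_chunk(*s), slices)
    # Reassemble A from the partial results, preserving order.
    return [x for part in parts for x in part]
```

The function names and chunking strategy are illustrative choices, not part of any standard API.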

  5. Parallel Computing vs Distributed Computing • Parallel computing: multiple CPUs in the same computer. • Distributed computing: computers connected by a network. • But don’t use the distinction above as a universal rule: in practice, it is not clear where exactly we draw the line between the two. • Some of the problems are common, but with a different flavour…

  6. An example: IBM Deep Blue • A 30-node parallel computer. • In May 1997 the machine won a 6-game match of chess against the then world champion, Garry Kasparov. • The machine could evaluate about 200 million positions per second. This allowed it to search up to 40 moves ahead. • In parallel computing, we need to deal with: • Communication (processors may need to send data to each other) • Synchronisation (think about the problems with process synchronisation from COMP25111: Operating Systems).

  7. Not all applications can be parallelised! Compare the two sequences: (a) x=3; y=4; z=180*224; w=x+y+z and (b) x=3; y=x+4; z=y+x; w=x+y+z. What can you parallelise? In (a) the first three assignments are independent of each other and can run in parallel; in (b) each statement depends on the result of the previous one, so the sequence is inherently sequential. If the proportion of an application that you can parallelise is x (0<x<1), then the maximum speedup you can expect from parallel execution is 1/(1-x). The running time, tp, of a programme running on p CPUs, which runs in time ts on one machine, will be equal to: tp=to+ts*(1-x+x/p), where to is the overhead added due to parallelisation (e.g., communication, synchronisation, etc.). (Read about Amdahl’s law.)
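The two formulas on this slide can be checked directly. A small sketch (function names are my own, not from the lecture):

```python
def max_speedup(x):
    # Amdahl's law: with unlimited processors the serial fraction
    # (1 - x) dominates, capping the speedup at 1 / (1 - x).
    return 1.0 / (1.0 - x)

def parallel_time(ts, x, p, to=0.0):
    # tp = to + ts*(1 - x + x/p): the serial part (1 - x) runs in full,
    # the parallelisable fraction x is divided across p CPUs, and
    # to is the overhead added by parallelisation.
    return to + ts * (1 - x + x / p)
```

For example, if 90% of a 100-second program parallelises (x=0.9), ten CPUs bring it down to about 19 seconds, and no number of CPUs can beat a 10x speedup.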

  8. How to parallelise an application • Try to identify instructions that can be executed in parallel. These instructions should be independent of each other (the result of one should not depend on the other) • Try to identify instructions that are executed on multiple data; different machines operate on different data. These can be schematically represented – see graph!

  9. Challenges / Goals with distributed systems • Heterogeneity • Openness • Security • Scalability • Failure Handling • Concurrency • Transparency

  10. Challenge: Heterogeneity • Heterogeneity (=the property of consisting of different parts) applies to: • Networks, Computers, Operating Systems, Languages, … • Data types, such as integers, may be represented differently… • Application program interfaces may differ… • Middleware: a software layer that provides a programming abstraction to mask the heterogeneity of the underlying platforms (networks, languages, H/W, …) • E.g., CORBA, Java RMI • It may hide the fact that messages travel over a network in order to send a remote method invocation request… Not necessarily good!
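The point about integers being represented differently can be made concrete with byte order: two machines that disagree on endianness will read the same four bytes as different numbers, which is one reason middleware must agree on an external data representation. A short Python sketch using the standard `struct` module:

```python
import struct

value = 0x12345678

big = struct.pack(">I", value)     # big-endian byte order: 12 34 56 78
little = struct.pack("<I", value)  # little-endian byte order: 78 56 34 12

# A receiver that assumes the wrong byte order reconstructs a
# different integer from the very same bytes:
misread = struct.unpack("<I", big)[0]  # 0x78563412, not 0x12345678
```

This is why wire protocols fix a network byte order (conventionally big-endian) rather than letting each host use its native representation.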

  11. Challenge: Openness • Openness is the property of a distributed system such that each subsystem is continually open to interaction with other systems. As a result, the key software interfaces of the components of such a system are made publicly available. • E.g., Web Services (specified by W3C) to support interoperable machine-to-machine interaction over a network. • Open distributed systems meet the following: • Once something is published, it cannot be taken back. • There is no central arbiter of truth – different subsystems have their own. • Unbounded non-determinism: a property of concurrency by which the amount of delay in servicing a request can become unbounded, while still guaranteeing that the request will eventually be serviced (interesting in the development of theories of concurrency).

  12. Challenge: Security • Three aspects: • Confidentiality (protection against disclosure to unauthorised individuals) • Integrity (protection against alteration or corruption) • Availability (protection against interference with the means to access the resources) • Encryption techniques (cryptography) answer some of the challenges. Still: • Denial of Service Attacks: a service is bombarded by a large number of pointless requests. • Security of Mobile Code.

  13. Challenge: Scalability • A system is described as scalable if it will remain effective, without disproportionate performance loss, when there is an increase in the number of resources, the number of users, or the amount of input data. • Scalability in terms of: load, geographical distribution, number of different organisations… • Challenges: • Control the cost of physical resources and performance loss • Prevent software resources running out (e.g., 32-bit IPv4 vs 128-bit IPv6 Internet addresses) • Avoid performance bottlenecks: use caching, replication,…
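Caching, the last bullet above, can be sketched in a few lines. Here `functools.lru_cache` stands in for a distributed cache, and `lookup` is a hypothetical stand-in for an expensive remote fetch; the idea is simply that repeated requests for the same resource are served locally rather than hitting the bottleneck again:

```python
import functools

@functools.lru_cache(maxsize=1024)
def lookup(resource_id):
    # Stand-in for an expensive operation, e.g. a remote database read.
    # Only the first call for a given id does the work; later calls
    # for the same id are answered from the cache.
    return "data-for-" + str(resource_id)
```

Real distributed caches add problems this sketch hides, such as keeping replicas of the cached data consistent.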

  14. Challenge: Failure (Fault) Handling • Some components may fail while others continue executing. • We need to: • Detect failures: e.g., checksums may be used to detect corrupted data. • Mask failures: e.g., retransmit a message when it failed to arrive. • Tolerate failures: do not keep trying to contact a web server if there is no response. • Recover from failures: make sure that the state of permanent data can be recovered (or rolled back) after a server has crashed. • Build redundancy: data may be replicated to ensure continuing access. If the chance of failure for a single database is p, then (assuming failures are independent) the chance of two databases failing at the same time is p², of all three databases p³, and so on… Even if p=0.3, p²=0.09, p³=0.027.
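Two of the points above can be sketched concretely: detecting corruption with a checksum (here CRC-32 from Python's standard `zlib` module; the message framing is my own illustrative choice), and the replica-failure arithmetic from the redundancy bullet:

```python
import zlib

def with_checksum(payload: bytes) -> bytes:
    # Append a CRC-32 checksum so the receiver can detect corruption.
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def verify(message: bytes) -> bytes:
    # Recompute the checksum; a mismatch means the data was corrupted
    # in transit and the message should be retransmitted.
    payload, received = message[:-4], int.from_bytes(message[-4:], "big")
    if zlib.crc32(payload) != received:
        raise ValueError("corrupted message, request retransmission")
    return payload

def all_replicas_fail(p: float, n: int) -> float:
    # With n independently failing replicas, each failing with
    # probability p, all fail simultaneously with probability p**n.
    return p ** n
```

Note that a checksum only detects corruption; masking the failure (retransmitting) and recovering state are separate mechanisms, as the slide lists.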

  15. Challenge: Concurrency • Several clients may attempt to access a shared resource at the same time. • E.g., ebay bids • Requires proper synchronisation to make sure that data remains consistent. • We have seen the principles of this! • (refer to COMP25111 and process/thread synchronisation)
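The bidding example can be sketched with a lock, echoing the process/thread synchronisation from COMP25111. The `Auction` class is a hypothetical illustration, not a real eBay API:

```python
import threading

class Auction:
    def __init__(self):
        self.highest_bid = 0
        self._lock = threading.Lock()

    def bid(self, amount):
        # The lock makes the check-then-update atomic. Without it,
        # two concurrent bidders could both read the same highest_bid
        # and one of the two updates would be silently lost.
        with self._lock:
            if amount > self.highest_bid:
                self.highest_bid = amount
                return True
            return False
```

In a distributed system the same consistency problem appears without shared memory, so the synchronisation must instead be built from messages (e.g., a server serialising the bids).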

  16. Challenge: Transparency • Transparency means that a distributed system should hide its distributed nature from its users, appearing and functioning as a normal centralised system. • There are many types of transparency: • Access transparency: resources are accessed in a single, uniform way (regardless of whether they are local or remote). • Location transparency: users should not be aware of where a resource is physically located. • Concurrency transparency: multiple users may compete for and share a single resource: this should not be apparent to them. • Replication transparency: even if a resource is replicated, it should appear to the user as a single resource (without knowledge of the replicas). • Failure transparency: faults are hidden from users as far as possible. • …and more: mobility (migration/relocation), performance, scaling, persistence, security (see: http://en.wikipedia.org/wiki/Transparency_(computing) )

  17. In Summary... • Several challenges / goals: • Heterogeneity, openness, security, scalability, failure handling, concurrency, transparency. • Parallel computing: not really a topic for this module; some of the problems are common, but the priorities are different. • Reading: • Coulouris, Section 1.4, 1.5 (all of Chapter 1 covered). • Tanenbaum, Section 1.2 (all of Chapter 1 covered) • Next time: Architectures and Models.
