
Parallel Processing


Presentation Transcript


  1. Parallel Processing Parallel processing involves taking a large task, dividing it into several smaller tasks, and then working on each of those smaller tasks simultaneously. The goal of this divide-and-conquer approach is to complete the larger task in less time than it would have taken to do it in one large chunk.
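This divide-and-conquer idea can be sketched in a few lines of Python (a minimal illustration, not from the slides; the task and function names are mine):

```python
from multiprocessing import Pool

def sum_squares(chunk):
    # Smaller task: sum the squares of one slice of the input.
    return sum(n * n for n in chunk)

def parallel_sum_squares(numbers, workers=4):
    # Divide: split the large input into one chunk per worker.
    chunks = [numbers[i::workers] for i in range(workers)]
    # Conquer: run all chunks simultaneously, then combine the results.
    with Pool(workers) as pool:
        return sum(pool.map(sum_squares, chunks))
```

On some platforms (e.g. Windows), `multiprocessing` requires the entry point to be guarded with `if __name__ == "__main__":`.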

  2. Basic requirements of parallel computing: computer hardware designed to work with multiple processors and providing a means of communication between them; an operating system capable of managing multiple processors; and application software capable of breaking large tasks into multiple smaller tasks that can be performed in parallel.

  3. Why Parallel Processing? A computer's job is to solve problems faster than a human can, but people always want them to run faster still. In the past, vendors tried to meet this demand by improving circuit design; however, there are physical limits to that trend of constant improvement. The processing speed of a processor depends on how quickly information can travel between the electronic components within it, and that transmission speed is ultimately limited by the speed of light.

  4. The speed of processors therefore cannot be increased beyond a certain point. Another limiting factor is that the density of transistors within a processor can be pushed only so far; beyond that limit, the transistors create electromagnetic interference for one another. Parallelism sidesteps these limits by letting multiple processors work simultaneously on several parts of a task, completing it faster than a single processor could.

  5. What do we gain? Higher throughput, more fault tolerance, and a better price/performance ratio.

  6. Parallelism in Databases Three issues are driving the increasing use of parallel processing in database environments: the need for increased speed or performance, the need for scalability, and the need for high availability.

  7. Speedup Speedup is defined as the ratio between the runtime with one processor and the runtime using multiple processors. It measures the performance improvement gained by using multiple processors instead of a single processor and is calculated using the following formula: Speedup = Time1 / Timem, where Time1 is the elapsed time on a single processor and Timem is the elapsed time on m processors.
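The slide's formula translates directly into code; the companion `efficiency` metric (speedup per processor) is my addition, not the slide's:

```python
def speedup(time_one, time_m):
    # Speedup = Time1 / Timem
    return time_one / time_m

def efficiency(time_one, time_m, m):
    # Speedup divided by the number of processors; 1.0 is the ideal.
    return speedup(time_one, time_m) / m
```

For example, a job that takes 120 s on one processor and 30 s on four has a speedup of 4 and an efficiency of 1.0 (the ideal case).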

  8. In an ideal world, the parallel processing speedup would track with the number of processors used for any given task. The ideal speedup curve is rarely reached because parallelism entails a certain amount of overhead. The inherent parallelism of the application also plays an important role in the amount of speedup you can achieve. Some tasks are easily divided into parts that can be processed in parallel. The join of two large tables, for example, can be done in parallel. Other tasks, however, cannot be divided. A nonpartitioned index scan is one such example. If an application has little or no inherent parallelism, then little or no speedup will be achieved.
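The gap between the ideal curve and real speedup is commonly modeled by Amdahl's law (not named on the slide, but the standard formalization of this point): if only a fraction p of the work can run in parallel, the serial remainder caps the achievable speedup.

```python
def amdahl_speedup(p, n):
    # p: fraction of the work that can run in parallel (0.0 to 1.0)
    # n: number of processors
    # The serial fraction (1 - p) limits the achievable speedup.
    return 1.0 / ((1.0 - p) + p / n)
```

With p = 0.9 and 8 processors, the speedup is only about 4.7, not 8; a fully serial task (p = 0, like the nonpartitioned index scan mentioned above) gains nothing.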

  9. Scalability Scalability is the ability to maintain performance levels as the workload increases, by incrementally adding more system capacity. In many applications, the number of database users and the transaction volume are both likely to increase over time. The demand for added processing power to handle the increased load, without loss of response time, can be met by using parallel systems.

  10. High Availability Databases are used in mission-critical applications in organizations such as stock exchanges, banks, and airlines, and many database applications are expected to be available 24 hours a day. Running a parallel database on a multinode parallel system is one way to provide high availability: when one node goes down, it affects only the subset of users connected to the failed node, and those users can still access the database after switching to one of the surviving nodes.

  11. Price/Performance Economics is another driver toward parallel computing. It costs money to make processors faster, and beyond a certain point, increasing the processing power of a single-CPU system becomes technically very difficult. It is much cheaper to use multi-node clusters built from ordinary CPUs.

  12. Types of Parallelism in DBs Database applications can exploit two types of parallelism in a parallel computing environment: inter-query parallelism and intra-query parallelism.

  13. Inter-Query Parallelism Inter-query parallelism is the ability to use multiple processors to execute several independent queries simultaneously.
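A toy sketch of inter-query parallelism, using Python threads and SQLite purely for illustration (a real DBMS schedules the queries across processors internally; here each independent query simply gets its own connection):

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

QUERIES = [
    "SELECT 1 + 1",
    "SELECT 2 * 21",
    "SELECT length('parallel')",
]

def run_query(sql):
    # Each independent query runs on its own connection,
    # so the queries do not interact with one another.
    with sqlite3.connect(":memory:") as conn:
        return conn.execute(sql).fetchone()[0]

def inter_query_parallel(queries):
    # Execute all of the independent queries simultaneously.
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        return list(pool.map(run_query, queries))
```

The queries are unrelated, so no coordination between them is needed; this is what makes inter-query parallelism the easier of the two forms to exploit.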

  14. Intra-Query Parallelism Intra-query parallelism is the ability to break a single query into subtasks and to execute those subtasks in parallel using a different processor for each. The result is a decrease in the overall elapsed time needed to execute a single query. Intra-query parallelism is very beneficial in decision support system (DSS) applications, which often have complex, long-running queries.
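Intra-query parallelism can be sketched the same way: one aggregate query is split into range-based subtasks whose partial results are then combined. The `sales` table and the rowid-range partitioning below are illustrative assumptions; a real DBMS performs this decomposition internally.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def scan_partition(path, lo, hi):
    # Subtask: aggregate one rowid range of the same table.
    with sqlite3.connect(path) as conn:
        row = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM sales"
            " WHERE rowid BETWEEN ? AND ?", (lo, hi)).fetchone()
        return row[0]

def parallel_total(path, total_rows, workers=4):
    # Split the single query's scan into one range per worker,
    # run the partial scans in parallel, then combine the results.
    step = -(-total_rows // workers)  # ceiling division
    ranges = [(i, min(i + step - 1, total_rows))
              for i in range(1, total_rows + 1, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda r: scan_partition(path, *r), ranges))
```

Because the partial sums must be merged at the end, intra-query parallelism involves coordination overhead that inter-query parallelism does not.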

  15. Well-Known DBs That Support Parallelism Oracle, MySQL, MSSQL, and many others...

  16. Oracle Approach Oracle's support for parallel processing can be divided into two feature sets: parallel execution, which refers to intra-query parallelism, and Parallel Server, which refers to the use of multiple instances to open a single, shared database.

  17. Oracle – Parallel Execution Oracle's parallel execution features enable Oracle to divide a task among multiple processes in order to complete the task faster. This allows Oracle to take advantage of multiple CPUs on a machine.

  18. Oracle Parallel Server Oracle Parallel Server (OPS) enables one database to be mounted and opened concurrently by multiple instances. Each OPS instance is like any standalone Oracle instance and runs on a separate node having its own CPU and memory. The database resides on a disk subsystem shared by all nodes. OPS takes parallelism to a higher plane by allowing you to spread work not only over multiple CPUs, but also over multiple nodes.

  19. Oracle Parallel Server

  20. MySQL Approach MySQL's approach is MySQL Cluster, a technology that provides shared-nothing clustering capabilities for the MySQL database management system. It was first included in the production release of MySQL 4.1 in November 2004. It is designed to provide high availability and high performance while allowing nearly linear scalability.

  21. Hybrid Storage MySQL Cluster allows datasets larger than the capacity of a single machine to be stored and accessed across multiple machines. MySQL Cluster maintains all indexed columns in distributed memory. Non-indexed columns can also be maintained in distributed memory, or they can be kept on disk with an in-memory page cache. Storing non-indexed columns on disk allows MySQL Cluster to hold datasets larger than the aggregate memory of the clustered machines.

  22. Shared Nothing MySQL Cluster is designed to have no single point of failure: provided the cluster is set up correctly, any single node, system, or piece of hardware can fail without the entire cluster failing. A shared disk (SAN) is not required; the interconnects between nodes can be standard Ethernet, and Gigabit Ethernet and SCI interconnects are also supported.

  23. That's All References: http://www.mysql.com, http://www.wikipedia.org, http://www.jurriaanpersyn.com, http://www.bigresource.com, http://www.oreilly.com, http://www.dba-oracle.com
