Understanding Parallel/Scalable Databases: Architecture and Query Processing

Parallel Databases
Michael French, Spencer Steele, Jill Rochelle
"When Parallel Lines Meet" by Ken Rudin (BYTE, May 98)

What are Parallel/Scalable Databases?
Parallel/scalable databases combine:
• Hardware architecture
  • Multiple processors
  • Multiple disk drives
  • Large memory banks
• Software architecture
  • Capable of processing parallel queries
  • Data shipping capabilities

What makes Parallel Databases different from previous technologies?

Previous Technology
• Hardware
  • Single processor
  • Small disk capacity
  • Less memory
• Software
  • Sequential queries
  • No partitioning of queries

Parallel Query
• A query that partitions its data and work across multiple processors and can also pipeline intermediate results from one operation to the next

Information Partitioning
• Divide the data, and the work done on it, into smaller pieces that can be processed independently
• The term can mean either:
  • Distributing portions of the data to multiple CPUs
  • Dividing disk storage so that each drive holds a specific part of the data
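A minimal sketch of partitioning, assuming hash partitioning on a key column (one common scheme; range and round-robin partitioning are alternatives). The helper names `hash_partition`, `scan` and `parallel_scan` are hypothetical, and Python threads only stand in for the separate CPUs a real parallel database would use.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, key, n_parts):
    """Split rows into n_parts buckets by hashing the chosen key column."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts


def scan(partition, predicate):
    """Filter one partition; each worker runs this on its own slice of the data."""
    return [row for row in partition if predicate(row)]


def parallel_scan(rows, key, predicate, n_workers=4):
    """Partition the table, scan every partition concurrently, then merge results.

    Assumption: threads stand in for processors here; Python's GIL means this
    illustrates the data flow, not a real CPU speedup.
    """
    parts = hash_partition(rows, key, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(scan, parts, [predicate] * n_workers)
    # Merge the per-partition answers into one result set (order is not preserved).
    merged = []
    for part_result in results:
        merged.extend(part_result)
    return merged


rows = [{"id": i, "amount": i * 10} for i in range(100)]
big_orders = parallel_scan(rows, "id", lambda r: r["amount"] > 500)
print(len(big_orders))  # 49 rows: ids 51 through 99
```

Because no row depends on any other row during a scan, each partition can be filtered without coordinating with the others; only the final merge brings the results together.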

[Figure: Information Partitioning (continued)]

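Information Pipelining
• Allows separate processors to work on separate stages of a query:
  • Scan
  • Join
  • Sort
• Works like an assembly line: while one stage processes a row, the stage before it is already producing the next row
• Allows multiple queries (or stages of one query) to be in progress at the same time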
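The assembly-line idea can be sketched with one thread per stage, connected by queues, so the join starts working on the first rows while the scan is still reading later ones. This is a minimal sketch under stated assumptions: the stage functions, the `cust_id`/`amount` columns and the in-memory lookup table are hypothetical, and threads again stand in for separate processors.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stream of rows


def scan_stage(rows, out_q):
    """Stage 1: read rows one at a time and hand each downstream immediately."""
    for row in rows:
        out_q.put(row)
    out_q.put(DONE)


def join_stage(in_q, out_q, lookup):
    """Stage 2: join each incoming row with a lookup table as soon as it arrives."""
    while (row := in_q.get()) is not DONE:
        match = lookup.get(row["cust_id"])
        if match is not None:
            out_q.put({**row, "name": match})
    out_q.put(DONE)


def sort_stage(in_q, result, sort_key):
    """Stage 3: sort is a blocking operator; it must see all input before emitting."""
    buffer = []
    while (row := in_q.get()) is not DONE:
        buffer.append(row)
    result.extend(sorted(buffer, key=lambda r: r[sort_key]))


def run_pipeline(orders, customers, sort_key="amount"):
    """Run scan -> join -> sort with each stage on its own thread (stand-in for a CPU)."""
    q1, q2 = queue.Queue(maxsize=16), queue.Queue(maxsize=16)
    result = []
    threads = [
        threading.Thread(target=scan_stage, args=(orders, q1)),
        threading.Thread(target=join_stage, args=(q1, q2, customers)),
        threading.Thread(target=sort_stage, args=(q2, result, sort_key)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


orders = [{"cust_id": 1, "amount": 30}, {"cust_id": 2, "amount": 10}, {"cust_id": 3, "amount": 20}]
customers = {1: "Ann", 2: "Bob"}
print(run_pipeline(orders, customers))  # Bob's 10, then Ann's 30; cust_id 3 has no match
```

The bounded queues (`maxsize=16`) keep a fast stage from racing too far ahead of a slow one. Note that scan and join overlap naturally, while sort can only finish after the last row arrives.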

[Figure: Information Pipelining (continued)]

Sequential Query Example
• Two tables with 20 million rows each, queried on a uniprocessor machine
• Scanning, joining and sorting: the query takes 12 minutes
• Add partitioning: the same query takes 3 minutes
• Add pipelining: 12 such queries can be run in 12 minutes
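The figures in this example can be turned into speedup and throughput numbers. The sketch below uses only the slide's values (12 min, 3 min, 12 queries in 12 min); the last line additionally assumes, hypothetically, that each pipelined query still takes about 3 minutes, and applies Little's law (queries in flight = throughput × time per query) to estimate how many queries overlap.

```python
def speedup(sequential_minutes, parallel_minutes):
    """How many times faster a single query finishes."""
    return sequential_minutes / parallel_minutes


def throughput(queries, minutes):
    """Queries completed per minute."""
    return queries / minutes


def queries_in_flight(queries_per_minute, minutes_per_query):
    """Little's law: average number of queries in progress at once."""
    return queries_per_minute * minutes_per_query


print(speedup(12, 3))          # partitioning: 4.0x faster per query
print(throughput(1, 12))       # uniprocessor: about 0.083 queries/min
print(throughput(1, 3))        # partitioning only: about 0.33 queries/min
print(throughput(12, 12))      # partitioning + pipelining: 1.0 query/min
# Assumption: each pipelined query still takes ~3 minutes end to end.
print(queries_in_flight(throughput(12, 12), 3))  # 3.0 queries overlapping
```

Partitioning mainly shortens each query (a 4x speedup here), while pipelining mainly raises throughput: 1 query per minute versus 1 every 12 minutes on the uniprocessor.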

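Kinds of Parallel Architecture
• Shared-Everything (all processors share the same memory and disks)
  • Hardware
  • Software
• Shared-Disk (each processor has its own memory, but all share the disks)
  • Hardware
  • Software
• Shared-Nothing (each node has its own processor, memory and disks)
  • Hardware
  • Software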
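In a shared-nothing system, each node can only read its own data, so a join works best when both tables are partitioned on the join key: matching rows then land on the same node and each node joins locally. The sketch below assumes that co-partitioning scheme and a local hash join; `partition_by_key`, `local_hash_join` and `shared_nothing_join` are hypothetical names, and threads stand in for independent nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_by_key(rows, key, n_nodes):
    """Assign each row to a node by hashing the join key."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[hash(row[key]) % n_nodes].append(row)
    return nodes


def local_hash_join(left, right, key):
    """Build a hash table on one side, probe it with the other; uses one node's data only."""
    table = {}
    for r in right:
        table.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in table.get(l[key], [])]


def shared_nothing_join(left_rows, right_rows, key, n_nodes=3):
    """Co-partition both tables on the join key, then join each node's pieces locally.

    Assumption: threads stand in for nodes; no node reads another node's partition.
    """
    left_parts = partition_by_key(left_rows, key, n_nodes)
    right_parts = partition_by_key(right_rows, key, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        results = pool.map(local_hash_join, left_parts, right_parts, [key] * n_nodes)
    return [row for node_result in results for row in node_result]


orders = [{"cust_id": i % 5, "order_id": i} for i in range(20)]
customers = [{"cust_id": c, "name": f"c{c}"} for c in range(4)]  # no customer 4
print(len(shared_nothing_join(orders, customers, "cust_id")))  # 16 matching rows
```

If the tables were not co-partitioned, rows would first have to be shipped between nodes over the network, which is where the "data shipping" capability listed earlier comes in.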

Conclusion
• Pros
  • Allows you to process more information
  • Provides faster query processing
• Cons
  • Expensive hardware and software
  • Much higher maintenance effort
• Is a parallel database right for your organization?
