A 100 Terabyte Database System for $500k • Brewster Kahle • President, Alexa Internet • Director, Internet Archive
“Hillis’s Law” • Price follows Volume • Corollary: Reliability follows Volume • Corollary: Availability follows Volume • For Reliability and Availability … buy inexpensive components
“Database” defined • Queryable, updatable, persistent data collection • This example: • 10 billion WWW pages from 1996 to the present • Metadata associated with those pages and websites • Various auxiliary tables and programs • Queryable: retrieval of records based on fuzzy matches on date, URL, and other attributes
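One way to read “fuzzy match on date” is “return the stored capture closest to the requested time.” The sketch below assumes that reading, along with a hypothetical in-memory `CAPTURES` table and 14-digit `YYYYMMDDhhmmss` timestamps. None of these details come from the slides.

```python
from bisect import bisect_left
from datetime import datetime
from typing import Dict, List, Optional

STAMP_FORMAT = "%Y%m%d%H%M%S"  # assumed 14-digit capture timestamps

# Hypothetical capture table: URL -> capture timestamps, kept sorted.
CAPTURES: Dict[str, List[str]] = {
    "http://example.com/": ["19961019000000", "19990101120000", "20030615083000"],
}

def closest_capture(url: str, wanted: str) -> Optional[str]:
    """Return the capture timestamp nearest to `wanted`, or None for an unknown URL."""
    stamps = CAPTURES.get(url)
    if not stamps:
        return None
    target = datetime.strptime(wanted, STAMP_FORMAT)
    # Fixed-width digit strings sort chronologically, so binary search works on them directly.
    i = bisect_left(stamps, wanted)
    # The nearest capture is either the one just before or the one at/after the insertion point.
    candidates = stamps[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda s: abs(datetime.strptime(s, STAMP_FORMAT) - target))
```

Because the timestamps sort in date order, `bisect_left` locates the neighborhood in O(log n), so only two candidates need to be compared.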
Approach to Large-Scale Computation • Need: scale, reliability, flexibility, evolution, low risk • Solution: • Commodity hardware • Commodity operating systems • Commodity software • Commodity programmers
Hardware • Homogeneous machines allow quick response through reallocation • HP desktop machines: 320 MB RAM, 3U high, four 100 GB IDE drives • $4k/TB (street), 2.5 processors/TB, 1 GB RAM/TB • 3 weeks from ordering to operational
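The per-terabyte figures follow directly from the per-machine configuration. The quick check below assumes decimal units (1 TB = 1,000 GB) and one processor per machine. The slide states neither of these.

```python
# Per-machine figures taken from the slide.
DRIVES_PER_MACHINE = 4
GB_PER_DRIVE = 100
RAM_MB_PER_MACHINE = 320
DOLLARS_PER_TB = 4_000    # street price quoted on the slide
CPUS_PER_MACHINE = 1      # assumption: one processor per desktop box

gb_per_machine = DRIVES_PER_MACHINE * GB_PER_DRIVE          # 400 GB of raw disk
machines_per_tb = 1000 / gb_per_machine                     # 2.5 machines per decimal TB
processors_per_tb = machines_per_tb * CPUS_PER_MACHINE      # 2.5, matching the slide
ram_mb_per_tb = machines_per_tb * RAM_MB_PER_MACHINE        # 800 MB, roughly the slide's 1 GB
cost_per_machine = DOLLARS_PER_TB * gb_per_machine / 1000   # dollars per box

print(f"{machines_per_tb} machines/TB, {ram_mb_per_tb:.0f} MB RAM/TB, ${cost_per_machine:,.0f}/machine")
# prints "2.5 machines/TB, 800 MB RAM/TB, $1,600/machine"
```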
Networking • HP ProCurve 100BASE-T switches • About $40/port (street) • Load balancing by DNS round-robin, Cisco, or a program • Network booting, so the OS is reinstalled on every boot • T3 to the Internet for $300/megabit/month
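DNS round-robin spreads clients across front ends by rotating the order of the address records returned for a single name, since most clients try the first address they receive. The toy model below is only a sketch. `RoundRobinZone` is a hypothetical class, and the documentation-range addresses stand in for real front ends.

```python
from collections import deque
from typing import List

class RoundRobinZone:
    """Toy model of DNS round-robin: one name, several A records, rotated per query."""

    def __init__(self, name: str, addresses: List[str]) -> None:
        self.name = name
        self._records = deque(addresses)

    def resolve(self, name: str) -> List[str]:
        if name != self.name:
            raise KeyError(name)
        answer = list(self._records)
        self._records.rotate(-1)  # the next query's answer starts at the next address
        return answer

zone = RoundRobinZone("www.example.org", ["192.0.2.10", "192.0.2.11", "192.0.2.12"])
# Address a client would pick (the first one) on six successive queries.
first_choices = [zone.resolve("www.example.org")[0] for _ in range(6)]
```

Real resolvers cache answers for the record's TTL, so DNS-based balancing is coarse. For that reason it is often combined with other balancing methods.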
Disk as Tape • Tape is unreliable, specialized, slow, low-density, not improving fast, and expensive • Using removable hard drives in place of tape has been successful • When a “tape” is needed, the drive is put in a machine and it is online; there is no need to copy from tape before use • Portable, durable, fast, dense, with media cost equal to raw tape. Longevity unknown, but suspected to be good. Think “HAL” rather than StorageTek (idea by Jim Gray of Microsoft)
Backup: 3 scenarios • Disaster recovery: preservation through replication • Hardware faults: different solutions for different situations • Clusters • Load balancing • Replication • Tolerate machine/disk outages • (Avoided RAID and expensive, low-volume solutions) • Programmer error: slow replication, timestamped duplicates
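For programmer error, the goal is that a bad write never silently replaces the only good copy. The following is a minimal sketch that assumes files on a local filesystem. `snapshot` writes a new timestamped duplicate instead of overwriting an existing one. `replicate_if_settled` models “slow replication” by copying a file only after it has stayed unchanged for a minimum age. Both function names and the age-threshold policy are illustrative and do not describe the Archive's actual tooling.

```python
import shutil
import tempfile
import time
from pathlib import Path
from typing import Optional

def snapshot(src: Path, backup_dir: Path) -> Path:
    """Copy src to backup_dir under a UTC-timestamped name, never overwriting an older copy."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    dest = backup_dir / f"{src.name}.{stamp}"
    n = 0
    while dest.exists():  # two snapshots within one second get a numeric suffix
        n += 1
        dest = backup_dir / f"{src.name}.{stamp}.{n}"
    shutil.copy2(src, dest)  # copies contents and metadata such as mtime
    return dest

def replicate_if_settled(src: Path, backup_dir: Path, min_age_s: float,
                         now: Optional[float] = None) -> Optional[Path]:
    """Slow replication: snapshot src only if it has been unmodified for min_age_s seconds."""
    now = time.time() if now is None else now
    if now - src.stat().st_mtime < min_age_s:
        return None  # too fresh; leave time to notice a bad edit before it spreads
    return snapshot(src, backup_dir)

# Usage in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "pages.cdx"
    backups = Path(tmp) / "backups"
    data.write_text("original contents\n")
    first = snapshot(data, backups)
    data.write_text("contents after a bad edit\n")
    second = snapshot(data, backups)
    recovered = first.read_text()  # the pre-edit copy is still available
    distinct = first != second
    skipped = replicate_if_settled(data, backups, min_age_s=3600)
```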
Operating System Choices • Need: supportable, clusterable, improving, good support • Commodity, remote operation of hundreds of nodes, free, source code (for documentation and in-house fixes) • Reality of evolution • Integrated Solaris/x86, FreeBSD, and Linux • Solaris does not support IDE well • FreeBSD does not thread well • Linux does not do NFS well, but has momentum • Linux is now our lead OS
Parallel Execution Model • Data mining with a command-line interface • Controlling machines with 2 TB of free space dispatch commands and data to the parallel machines • Use flat files • Build explicit indexes • Use “sort” for data mining; use binary search for random access • P2 “grep pdf *.cdx | cut -fDATE | sort” -c “sort -m | uniq -c” -p $ARCHIVE • Non-programmers become parallel data miners in less than 2 weeks
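The P2 line appears to run one pipeline on every machine (`grep pdf *.cdx | cut -fDATE | sort`) and then a combining pipeline on the controller (`sort -m | uniq -c`). This reading of P2's options is our interpretation, because the slide does not document the tool. The Python sketch below has the same scatter/gather shape. Its in-memory shards are hypothetical stand-ins for each machine's `*.cdx` files, and it assumes the date is the second whitespace-separated field. Note that `cut -f` itself splits on tabs by default.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from typing import Iterable, List, Tuple

# Hypothetical shards: each inner list stands in for the *.cdx lines on one machine.
SHARDS: List[List[str]] = [
    ["example.com/a.pdf 2001 application/pdf", "example.com/ 2001 text/html"],
    ["example.org/b.pdf 1999 application/pdf", "example.org/c.pdf 2001 application/pdf"],
]
DATE_FIELD = 1  # assumed position of the date column (stands in for `cut -fDATE`)

def remote_stage(lines: Iterable[str]) -> List[str]:
    """Per-machine stage, like `grep pdf | cut -fDATE | sort`."""
    return sorted(line.split()[DATE_FIELD] for line in lines if "pdf" in line)

def controller_stage(sorted_streams: Iterable[List[str]]) -> List[Tuple[int, str]]:
    """Controller stage, like `sort -m | uniq -c`: merge sorted inputs, count equal runs."""
    merged = heapq.merge(*sorted_streams)
    return [(sum(1 for _ in run), key) for key, run in groupby(merged)]

# Fan the per-machine stage out in parallel, then combine on the controller.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(remote_stage, SHARDS))
counts = controller_stage(partials)
print(counts)  # [(1, '1999'), (2, '2001')]
```

Each machine returns output that is already sorted. The controller therefore only needs to merge the streams (`heapq.merge`, like `sort -m`) and count adjacent runs (`groupby`, like `uniq -c`). It never re-sorts the full result.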
Performance • 500 queries/second on a 100 GB database • Queries are on one key and use about 10 tables • On 6 computers: 2 database machines, 4 front ends • $20,000 (would be less today; these machines are older and have 4 GB RAM) • 10 queries/second on a 100 TB database • Index is on 16 computers, data is on 200 computers • 2 query types • $16,000 for index machines, $400k for all machines • General queries vary in speed • The “unit” is the $500 PC for added speed or capacity
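Splitting the 100 TB system into an index tier and a data tier suggests a two-step lookup. A small set of index machines finds where a record lives, and then one data machine serves it. This sketch assumes the index is range-partitioned by key and that each shard is a sorted flat list of `(key, data machine, offset)` entries. The shard boundaries, entries, and layout are all hypothetical. Both steps are binary searches, which matches the “binary search for random access” approach from the parallel execution slide.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

Entry = Tuple[str, int, int]  # (key, data machine number, byte offset): hypothetical layout

# Hypothetical range-partitioned index; shard i holds keys >= SHARD_STARTS[i].
SHARD_STARTS: List[str] = ["", "h", "p"]
INDEX_SHARDS: List[List[Entry]] = [
    [("apple.com", 17, 0), ("cnn.com", 3, 4096)],
    [("ibm.com", 120, 512), ("nasa.gov", 42, 0)],
    [("pbs.org", 7, 2048), ("yahoo.com", 199, 8192)],
]

def index_shard_for(key: str) -> int:
    """Pick the index machine whose key range covers `key`."""
    return bisect_right(SHARD_STARTS, key) - 1

def lookup(key: str) -> Optional[Tuple[int, int]]:
    """Binary-search one sorted index shard; return (data machine, offset) or None."""
    shard = INDEX_SHARDS[index_shard_for(key)]
    keys = [entry[0] for entry in shard]  # a real index would keep this column precomputed
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        _, machine, offset = shard[i]
        return machine, offset
    return None

print(lookup("nasa.gov"))  # (42, 0)
```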
Suggestions • Reconsider purchases from: • Oracle • EMC • Sun, IBM, HP, Compaq, Dell • Veritas • Legato • Exodus • Your ISP • Cisco • Our systems scale up well, are reliable, and are flexible because they are inexpensive.