540 likes | 666 Vues
In today's digital landscape, effectively managing data across millions of connected devices poses unique challenges. This presentation by James Hamilton discusses essential components for a robust client-tier architecture, addresses potential database issues, and explores the implications of administrative and development costs. As user numbers rapidly grow—from an estimated 131 million in 1998 to 515 million connected devices—understanding the interconnection of TVs, appliances, and more is crucial. Covering standards like IEEE 1394 and Universal Plug and Play, Hamilton provides insights on evolving device interconnectivity infrastructure that should inform future data management strategies.
E N D
Data Management in a Highly Connected World James Hamilton JamesRH@microsoft.com Microsoft SQL Server March 3, 2000
Agenda • Client Tier • Number of devices • Device interconnect fabric • Standard programming infrastructure • Client tier database issues • Resource requirements • Implementation language • Administrative cost implications • Development cost implications • Middle Tier • Server Tier • Summary
How Many Clients? • 1998 US WWW users (IDC) • US: 51M; World wide: 131M • 2001 estimates: • World Wide: 319M users • 515M connected devices • ½ billion connected Clients • Conservative estimate based upon conventional device counts
Other Device Types • TVs, VCRs, stoves, thermostats, microwaves, CD players, computers, garage door openers, lights, sprinklers, appliances, driveway de-icers, security systems, refrigerators, health monitoring, etc. • Sony evangelizing IEEE 1394 Interconnect • http://www.sel.sony.com/semi/iee1394wp.html • Microsoft & consortium evangelizing Universal Plug & Play • www.upnp.org • WAP: Wireless Application Protocol • http://www.wap.net/
Device Interconnect Infrastructure • Power line control • X10: http://www.x10.org • Sunbeam Thalia: • http://www.thaliaproducts.com/
Why Connect These Devices? • TV guide & auto VCR programming • CD label info & song list download • Sharing data & resources • Set clocks (flashing 12:00) • Fire and burglar alarms • Persist thermometer settings • Feedback & data sharing based systems: • Temperature control & power blind interaction • Occupancy directed heating and lighting
Device Communication Implications • The need is there • Infrastructure is going in: • Wireless • Power line communications • Unused twisted pair (phone) bandwidth • Connectable devices & infrastructure arriving & being deployed • On order of billions of client devices
Device interconnect Example Den Windows NT Server Ethernet Hub 56k bps line Ethernet Backbone Deck Filtration Plant 130 Gallon F/W Aquarium Living Room 660 Gallon Marine Aquarium X10 Backbone 130 Gallon F/W Aquarium Home Sprinklers Bedroom
Improvements For Example • Cooperation of lighting, A/C and power blind systems • Alarms and remote notification for failures in: • Circulations pump • Heating & cooling • Salinity & other water chemistry changes • Filtration system • Feedback directed systems
Palmtop Resource Trends • Palmtops I’ve purchased through the years • All about same cost & physical size PalmtopRAM Moore’s Law 100 Casio E105 (32M) 32M HP 200LX (2M) 10 HP 100LX (1M) Everex A20 (4m) HP 95LX (0.5M) Sharp IQ8300 (0.25M) Sharp IQ7000 (0.125M) 1 0.1 1990 1992 1994 1996 1998 2000 2002
O/S Memory Requirements • Windows Memory requirements over time Desktop RAM Moore’s Law 128m 100 Windows 2000(64M) Windows98 (16M) 10 Windows95 (4M) WFW 3.1 (3M) Windows 3.0 (2M) 1 Windows 2.0 (512K) Windows 1.0 (256K) 0.1 1985 1987 1989 1991 1993 1995 1997 1999
Smartcard Resource Trends 300 M 1 M Memory Size (Bits) 10 K You are here 3 K 1990 1992 1996 1998 2000 2002 2004 Source: Denis Roberson PIN/Card -Tech/ NCR
Devices Smaller Than PDAs • Qualcomm PDQ • 2 MB total memory • Same mem curve as PDAs…just 2 to 5 years behind • Nokia 9000il • 8 MB total Memory
Resource Trend Implications • Device resources at constant cost are growing at super-Moore rates • Same but 2 to 3 yrs behind desktop system growth • Same is true of each class of devices • Telephones trail PDAs but again grow at the same rate • Memory growth is not the problem • However devices always smaller than desktops • Devices more specialized so resource consumption less … can still run standard vertical app slice
Standard Infrastructure at Client • Clearly specialized user interface S/W needed • But we have the memory resources to support: • Standard communications stack (TCP/IP) • Standard O/S software • Standard data management S/W with query • Transparent replication • Symmetric multi-tiered infrastructure S/W: • Leverage best development environments • No need to rewrite millions of redundant lines of code • More heavily used & tested so less bugs • Better productivity in programming to richer platform • A full DBMS at client both practical & useful
Client-Side Database Issues • “Honey I shrunk the database” (SIGMOD99): • DB Footprint • Implementation Language • Both issues either largely irrelevant or soon to be: • Resource availability trends support standard infrastructure S/W • Dominant costs: admin, operations & user training, and programming • Vertical slice of standard apps rather than full custom infrastructure
DB Implementation Language • Special DB implementation language (Java) argument: • centers on auto-installation of S/W infrastructure • Auto-install is absolutely vital, but independent of implementation language • Auto-install not enough: client should be a cache of recently used S/W and data • Full DBMS at client • Client-side cache of recently accessed data • Optimizer selected access path choice: • driven by accuracy & currency requirements • balanced against connectivity state & communications costs
Admin Costs Still Dominate • 60’s large system mentality still prevails: • Optimizing precious machine resources is false economy • Admin & education costs more important • TCO education from the PC world repeated • Each app requires admin and user training…much cheaper to roll out 1 infrastructure across multiple form factors • Sony PlayStation has 3Mb RAM & Flash • Nokia 9000IL phone has 8Mb RAM • Trending towards 64M palmtop in 2001 • Vertical app slice resource reqmt can be met
Dev Costs Over Memory Costs • Specialty RTOS weak dev environments • Quality & quantity of apps driven by: • Dev environment quality • Availability of trained programmers • Requirement for custom client development & configuration greatly reduces deployment speed • Same apps have wide range of device form factors • Symmetric client/server execution environ. • DB components and data treated uniformly • Both replicated to client as needed
Client Side Summary • On order of billions connected client devices • Most are non-conventional computing devices • All devices include standard DB components • Standard physical & logical device interconnect standards will emerge • DB implementation language irrelevant • Device DB resource consumption much less important than ease of: • Installation • Administration • Programming • Symmetric client/server execution environment
Agenda • Client Tier • Middle Tier • High Availability via redundant data & metadata • Fault Isolation domains • XML • Mid-tier Caching • Server Tier • Summary
Server Availability: Heisenbugs • Industry good at finding functional errors • Multi-user & application interactions hard: • Sequences of statistically unlikely events • Heisenbugs (http://research.microsoft.com/~gray/talks) • Testing for these is exponentially expensive • Server stack is nearing 100 MLOC • Long testing and beta cycles delay software release • System size & complexity growth inevitable: • Re-try operation (Microsoft Exchange) • Re-run operation against redundant data copy (Tandem) • Fail fast design approach is robust but only acceptable with redundant access to redundant copies of data
The Inktomi Lesson • Inktomi web search engine (Brewer --SIGMOD’98) • Quickly evolving software: • Memory leaks, race conditions, etc. considered normal • Don’t attempt to test & beta until quality high • System availability of paramount importance • Individual node availability unimportant • Shared nothing cluster • Exploit ability to fail individual nodes: • Automatic reboots avoid memory leaks • Automatic restart of failed nodes • Fail fast: fail & restart when redundant checks fail • Replace failed hardware weekly (mostly disks) • Dark machine room • No panic midnight calls to admins • Mask failures rather than futile attempt to avoid
Apply to High Value TP Data? • Inktomi model: • Scales to 100’s of nodes • S/W evolves quickly • Low testing costs and no-beta requirement • Exploits ability to lose individual node without impacting system availability • Ability to temporarily lose some data W/O significantly impacting query quality • Can’t loose data availability in most TP systems • Redundant data allows node loss w/o data availability lost • Inktomi model with redundant data & metadata a potential solution
Redundant Data & Metadata • TP Point access to data nearly solved problem • TP systems scale with user number, people on planet, or business size • All trending at sub-Moore rates • Data analysis systems growing far faster than Moore’s Law: • Greg’s law: 2x every 9 to 12 (SIGMOD98—Patterson) • Seriously super-Moore implying that no single system can scale sufficiently: clusters are the only solution • Storage trending to free with access speed limiting factor • Detailed data distribution statistics need to be maintained • Improve access speed & availability using redundant data (indexes, materialized views, etc.) • Async update for stats, indexes, mat views • Data paths choice based upon need currency & accuracy
Affordable Availability • Web-enabled direct access model driving high availability requirements: • recent high profile failures at eTrade and Charles Schwab • Web model enabling competition in information access • Drives much faster server side software innovation which negatively impacts quality • “Dark machine room” approach requires auto-admin and data redundancy • Inktomi model (Erik Brewer–SIGMOD’98) • 42% of system failures admin error (Gray) • Paging admin at 2am to fix problem is dangerous
Connection Model/Architecture • Redundant data & metadata • Shared nothing • Single system image • Symmetric server nodes • Any client connects to any server • All nodes SAN-connected Client Server Node Server Cloud
Compilation & Execution Model Client • Query execution on many subthreads synchronized by root thread Server Thread Lex analyze Parse Normalize Optimize Code generate Server Cloud Query execute
Lose node: • Recompile • Re-execute Node Loss/Rejoin • Execution in progress Client • Rejoin: • Node local recovery • Rejoin cluster • Recover global data at rejoining node • Rejoin cluster Server Cloud
Redundant Data Update Model • Updates are standard parallel query plans • Optimizer manages redundant access paths • Query plan responsible for access plan management: • No significant new technology • Similar to materialized view & index updates today Client Server Cloud
Fault Isolation Domains • Trade single-node perf for redundant data checks: • Complex error recovery more likely to be wrong than original forward processing code • Many redundant checks are compiled out of “retail versions” when shipped • Fail fast rather than attempting to repair: • Bring down node for mem-based data structure faults • Don’t patch inconsistent data … copies keep system available • If anything goes wrong “fire” the node and continue: • Attempt node restart • Auto-reinstall O/S, DB and recreate DB partition • Mark node “dead” for later replacement
Data Structure Matters • Most internet content is unstructured text • restricted to simple Boolean search techniques • Docs have structure, just not explicit • Yahoo hand categorizes content • indexing limited & human involvement doesn’t scale well • XML is a good mix of simplicity, flexibility, & potential richness • Structure description language of internet • DBMSs need to support as first class datatype • Too few librarians in world • so all information must be self-describing
Relational to XML • SELECT … FOR XML • FOR XML RAW (return an XML rowset) • FOR XML AUTO (exploit RI, name matching, etc.) • FOR XML EXPLICIT (maximal control) • Annotated Schema • Mapping between XML and relational schema expressed in XML • Templates • Encapsulated parameterized query • XSL/T support • XPATH support • Direct URL access (SQL “owned” virtual root) • SELECT … FOR XML • Annotated schema • Templates
XML to Relational • XML bulk load • Templates and Annotated Schema • SQL server hosted XML tree • Directly insert document into SQL Server hosted XML tree • Select from server hosted XML tree rowset & insert into SQL tables • XML Data type support • Hierarchical full text search
XML: Example http://SRV1/nwind?sql=SELECT+DISTINCT+ContactTitle+FROM+Customers+WHERE+ContactTitle+LIKE+'Sa%25'+ORDER+bY+ContactTitle+FOR+XML+AUTO Result set: <Customers ContactTitle="Sales Agent"/> <Customers ContactTitle="Sales Associate"/> <Customers ContactTitle="Sales Manager"/> <Customers ContactTitle="Sales Representative"/>
Mid-Tier Cache Requirements • Non-proprietary multi-lingual programming • Symmetric mid-tier & server programming model • Non-connected, stateless programming model • High scale thread pool based • Efficient main memory DB support • Full query over local cache • Query over just cached data, or • Query over full corpus (server interaction reqd) • Ability to handle network partitions & server failure • Support for life-time attributed data: • Transactional (possibly multi-server) • Near real time • Every N time units • Read only
Agenda • Client Tier • Middle Tier • Server Tier • Affordable computing by the slice • Everything online • Disk are actually getting slower • Processing moves to storage • Approximate answers quickly • Semi-structured storage support • Administrative issues • Summary
Server-Side Changes • Server databases more functionally rich than often required • Trend reversal: • Less at the server-tier with richer mid-tier • Focus at back-end shifts to: • Reliability, Availability, and Scalability • Reducing administrative costs • Server side trends: • Scalability over single-node performance • Everything online • Affordable availability in high scale systems
Compaq/Microsoft TPC-C Benchmark • tpmC These are Top 5 benchmarks as of Feb 17, 2000. 227,079 152,207 135,815 135,815 135,461 $98 $55 $53 $20 $19 ProLiant 8500 Cluster Windows 2000 SQL Server 2000 $4,341,603. $ 19.12 tpmC ProLiant 8500 Cluster Windows 2000 SQL Server 2000 $2,880,431. $ 18.93 tpmC IBM RS/6000 S80 AIX 4.3.3 Oracle v 8.1.6 $7,156,910. $52.70/tpmC Escala EPC2400 AIX 4.3.3 Oracle v8.1.6 $7,462,215 $54.94 tpmC Enterprise 6500 Solaris 2.6 Oracle 8i v 8.1.6 $13,153,324. $97.10/tpmC NOTE: All TPC-C results reported as of February 17, 2000
Computing by the Slice Source: TPC report executive summary
Just Save Everything • Able to store all Info produced on earth (Lesk): • Paper sources: less than 160 TB • Cinema: less than 166 TB • Images: 520,000 TB • Broadcasting: 80,000 TB • Sound: 60 TB • Telephony: 4,000,000 TB • These data yield 5,000 petabytes • Others estimate upwards of 12,000 petabytes • World wide 1998 storage production: 13,000 petabytes • No need to manage deletion of old data • Most data never accessed by a human • Access aggregations & analysis, not point fetch • More storage than data allows for greater redundancy: • indexes, materialized views, statistics, & other metadata
Disk are Becoming Black Holes • Seagate Cheetah 73 • Fast: 10k RPM, 5.6 ms access, 16 MB cache • But Very large: 73.4 GB • Result? Black hole: 2.4 accesses/sec/gb • Large data caches required • Employ redundant access paths
Processing Moves Towards Storage • Trends: • I/O bus bandwidth is bottleneck • Switched serial nets support very high bandwidth • Processor/memory interface is bottleneck • Growing CPU/DRAM perf gap leading to most CPU cycles in stalls • Combine CPU, serial network, memory, & disk in single package • E.g. David Patterson ISTORE project
Processing Moves Towards Storage • Each disk forms part of multi-thousand node cluster • Redundant data masks failure • RAID-like approach • Each cyberbrick commodity H/W and S/W • O/S, database, and other server software • Each “slice” plugged in & personality set • E.g. database or SAP app server) • No other configuration required • On failure of S/W or H/W, redundant nodes pick up workload • Replace failed components at leisure • Predictive failure models
Approximate Answers Quickly • DB systems focus on absolute correctness • As size grows, correct answer increasingly expensive • Text search systems depend upon quick approx answer • Approx answer with statistical confidence bound: • Steadily improve result until user satisfied • “Ripple Joins for Online Aggregation”(Hellerstein-SIGMOD99) • Allows rapid exploration of large search spaces: • Conventional full accuracy only when needed • Run query on incomplete mid-tier cache?