100 likes | 247 Vues
This exploration delves into pressing challenges in information processing, including availability, security, and weakly structured data management. Despite advances since earlier studies, the sector sees little progress in addressing software and administrative downtimes, which continue to dominate. The growing complexity of security systems complicates management, while unstructured data storage remains inadequate. Additionally, multi-tiered applications are often handcrafted, lacking industry-standard solutions. Research into self-healing systems and enhanced data integration is essential for future advancements.
E N D
Key Challenges in Information Processing James Hamilton JamesRH@microsoft.com Microsoft SQL Server 2002.03.01
Unsolved Challenges • Availability shows only incremental progress • Security broken & too hard to manage • Weakly structured data poorly supported or exploited • Writing Multi-tiered apps too hard • Data intensive mid-tiers need more DB help • Scalability over perf & big-iron
Availability:Largely unsolved problem • 1985 Tandem study (Gray): • Administration: 42% downtime • Software: 25% downtime • Hardware 18% downtime • 1990 Tandem Study (Gray): • Software 62% • Administration: 15% • Most studies have admin contribution much higher • Observations: • H/W downtime contribution trending to zero • Software & admin costs dominate & growing • We’re still looking at 10 to 15 year-old research
Availability:Cost in dollars/hour • Brokerage operations $6,450,000 • Credit card authorization $2,600,000 • Ebay (1 outage 22 hours) $225,000 • Amazon.com $180,000 • Package shipping services $150,000 • Home shopping channel $113,000 • Catalog sales center $90,000 • Airline reservation center $89,000 • Cellular service activation $41,000 • On-line network fees $25,000 • ATM service fees $14,000 From Dave Patterson Talk at HPTS 2001 -- Sources: InternetWeek 4/3/2000 + Fibre Channel: A Comprehensive Introduction, R. Kembel 2000, p.8. ”... survey done by Contingency Planning Research."
Availability: Admin stillthe problem • Administrators expensive • Admin dominate H/W & S/W costs (5x or more) • Administrators make mistakes • Admin #1 or #2 cause of downtime • Big problem yet little research focus: • Still few data points available: • Most systems houses won’t publish ... need research • No benchmarks: • Benchmarks drive industry & systems research • Goal: Server appliance model: • Auto-tuning, pluggable server-side resources • IBM SMART, Microsoft index tuning wizard, etc. • Dave Patterson, Aaron Brown, Armando Fox, ... • More help needed
Availability:the S/W is broken • Even server-side software is BIG: • Windows2000: over 50 mloc • DB: 1.5+ mloc • SAP: 37 mloc (4,200 S/W engineers) • Tester to Developer ratios above 1:1 • Quality per unit line only incrementally improving • Current massive testing investment not solving problem • New approach needed: • Assume S/W failure inevitable • Redundant, self-healing systems right approach • Tandem process-pair work good but getting fairly old ... progress?
Security:Securing systems too hard • “Less than 0.0025% of corp revenue invested in security” – Richard Clarke, Special security advisor to president • Data loss, intentional data & systems corruption • Clearly under-reported problem • S/W Vulnerabilities rampant: • Buffer overruns, stack smashing, code insertion, SQL insertion, elevation of privs, ... • Programmers being more careful doesn’t solve problem • Most systems miss-configured: • Security systems too complex & hard to admin • Research needed: Autonomous threat detection • better tools to detect, correct, & prevent S/W security vulnerabilities • Monitor all measurable system metrics: • Detecting new threats & miss-configurations • Track execution profiles: detect changes: drive alerts, auto-config, reports to vendor, upgrade s/w,...
Unstructured Data:Mostly not stored in DB • All data has some schema but not always fully known nor affordable to pre-declare: • Most data in unstructured stores with text search • DB community is losing • Much research work on XML focused upon: • Mapping XML to relational scheamas • leverages existing relational IQ but not as flexible • New, non-relational (native XML) stores • Storing natively doesn’t leverage DB investment • Mostly mid-tier data integration servers • Research potential: • Native stores leveraging existing infrastructure esp. cost-based optimizers, storage engines, & utilities • IR work progressing but little integration into DB • Integrating IR work into DB W/O required schema, ability to exploit if there, ability to discover/infer if not
Multi-tiered apps:we’re not helping • Many high scale multi-tiered apps still hand crafted • Needed: Object access layer, data cache, queuing, query compiler & optimizer, data directed routing, security, ... • Problem not adequately solved by industry • Integration with server-tier DB advantages: • ACID relaxation driven by attributes on apps or data • Relaxed models with auto-cache population & mgmt • Query parsing for data directed routing • Want to parse once & accept same lang as backend • Exploit optimizer: model full mid-tier to back-end costs • Where to run joins, functions, aggs, etc. • Need security integration W/O fully provisioning backend • Data intensive mid-tiers are a DB & TP problem: • Solve with DB tech & integrate with backend DB • Componentized DB for mid-tier use one approach
Scalability: perf not the problem • Focus still on performance rather than scalability: • Clusters only “nearly” work • Must buy biggest iron & get most from it • Research goal: Server appliances • Gray’s servers by the brick • brick includes disk, memory, & CPU resources • Only admin actions required: • Add brick to, or defect from, cluster • Data redundancy (potentially) on geo-scale: • adapts to access patterns & available bandwidth • If zero-admin clusters actually worked & scaled: • performance would be a secondary issue • The admin problem would nearly go away • The S/W quality problem greatly simplified • Hiesenbugs solved via retry and redundancy • Would shift investment dollars from H/W & admin to S/W (where it belongs )