In this insightful discussion, Michael Stonebraker, an adjunct professor at MIT, presents a critical look at the current state of IT and software development, highlighting the high failure rates of projects and the repetitive mistakes that plague the industry. He emphasizes the need for higher-level design environments, better software usability, and the importance of addressing meaningful industrial problems. Stonebraker offers practical recommendations and points out significant challenges faced by companies like Cisco and Akamai. This talk is a call to action for educators and practitioners to bridge the gap between academia and industry.
So What To Do Next? Michael Stonebraker Adjunct Professor Massachusetts Institute of Technology (stonebraker@lcs.mit.edu)
Where To Find Problems • State of affairs • Interesting industrial problems • Mike’s picks • My whine on XML • Grand challenges
State of Affairs • IT failure rate • Software half-life • No knobs
State of Affairs • ~50-75% of IT projects fail • if we built bridges, our profession would be fired • and the same mistakes are repeated over and over (excessive ambition, rolling specs, bad design, failure to load a large data set early)
What To Do? • We typically don’t teach this stuff • probably because we don’t (can’t) spend any time in industry to figure it out Action item: at the very least read a couple of Robert L. Glass’s books
State of Affairs • Hardware “half-life” is 18 months • Software half-life is 18 years (or more)!
What To Do? • Much higher level design environments • we are stuck at the general purpose programming level (conceivable benefit limited) • workflow and other higher level graphical notations probably a good idea
What To Do? • special purpose languages nice (why are report writers shunned?) • higher level versions of SQL and XQuery • See Informix Visionary for a cool example
State of Affairs • Commercial products are way too hard to use • takes people in white lab coats to get them up and keep them up • Full employment act for DBAs forever
What To Do? • “No knobs” • only buttons are “go” and “stop” • all tuning automatic • index selection is one of the minor ones (buffer pool size, partitioning, log buffer pool size, …) • Error reporting stinks
Interesting Industrial Problems Should Focus Research • BBC • OZ entertainment • Cisco • Akamai • Fidelity My suggestion: NSF should require a letter of support from a CIO with each grant proposal.
Interesting Problems -- BBC • Digitize 50 years of British television creativity • want to serve it up on demand • especially British soccer games • media is wearing out • Random access to 1 Petabyte (or so) • By the unwashed 200 million on the internet
CNN Variation • On-line digital news editing by 300 news directors • who want to find Monica Lewinsky • and 30 seconds of footage on suffering in Bosnia
What To Do? • Content outlives support for the content format • Automatic content indexing • cannot afford a librarian • Global scale distributed system • Staging and caching • high locality of reference
What To Do? • Query model meets visualization systems • unwashed will not learn XQuery • Rights management • incredibly sticky issue in whole area
Interesting Problem - OZ Entertainment • New theme park near Kansas City • “no lines” • no lost kids • virtual theme park as teaser
What To Do? • Large scale GIS • update intensive! • Large scale triggering problem • alert me if there is a cancellation at X and I am within 300 yards
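The "within 300 yards" trigger can be sketched as a distance test between an event location and each visitor's last reported position. This is a minimal sketch, not a design from the talk: the names `distance_m` and `visitors_to_alert`, the (latitude, longitude) tuples, and the linear scan are all assumptions for illustration. An update-intensive park would need a spatial index rather than a scan over every visitor.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius, metres
METERS_PER_YARD = 0.9144       # exact by definition

def distance_m(a, b):
    """Great-circle (haversine) distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def visitors_to_alert(event_pos, visitor_positions, radius_yards=300):
    """Return the ids of visitors whose last known position is within the radius of the event.

    visitor_positions is a hypothetical dict: visitor id -> (lat, lon).
    """
    radius_m = radius_yards * METERS_PER_YARD
    return sorted(
        visitor for visitor, pos in visitor_positions.items()
        if distance_m(event_pos, pos) <= radius_m
    )
```

Usage: call `visitors_to_alert(cancellation_pos, positions)` whenever a cancellation is recorded. The hard part the slide points at is the reverse direction, re-evaluating many standing triggers on every position update.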
Interesting Problem - Cisco Systems • Supply chain of 60K suppliers for custom goods • Want to query the transitive closure of this supply chain • can I make 10 more routers next week?
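To make "transitive closure of the supply chain" concrete, here is a minimal sketch that assumes the direct-supplier relation fits in one in-memory dictionary. The name `transitive_suppliers` and the `direct` mapping are hypothetical. In the setting the talk describes, no single site holds this mapping, so the sketch shows only the shape of the query, not how to run it across a federation.

```python
from collections import deque

def transitive_suppliers(direct, root):
    """Return every supplier reachable from root by following direct-supplier links.

    direct is a hypothetical dict: company -> iterable of its direct suppliers.
    Cycles are tolerated because each supplier is visited at most once.
    """
    seen = set()
    queue = deque([root])
    while queue:
        company = queue.popleft()
        for supplier in direct.get(company, ()):
            if supplier != root and supplier not in seen:
                seen.add(supplier)
                queue.append(supplier)
    return seen
```

Answering "can I make 10 more routers next week?" would additionally require each reachable supplier's capacity and lead time, which this sketch does not model.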
What To Do? • Huge federated system • central metadata a non-starter • no single DBA • global query optimizer a non-starter • Adapters for 1M (or so) legacy systems • how to write them semi-automatically?
Interesting Problem - Akamai • Billing is 95/5 • 5 minute intervals • pay for bandwidth of 95th percentile • 300 Gbytes a day (compressed) of click stream data Biggest warehouses on the planet will soon be click stream data!
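The 95/5 rule can be sketched in a few lines: take one bandwidth sample per 5-minute interval, sort them, and bill at the 95th percentile, so the top 5% of intervals are ignored. This sketch assumes the nearest-rank percentile convention and a hypothetical function name `billable_rate`; actual billing contracts may define the percentile differently.

```python
def billable_rate(samples, percentile=95):
    """Return the nearest-rank percentile of per-interval bandwidth samples.

    samples: one bandwidth measurement per 5-minute interval (any consistent unit).
    percentile: integer in 1..100.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # 1-based nearest rank = ceil(percentile * n / 100), done in integer arithmetic
    rank = (percentile * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

With 288 intervals in a day, the rank is 274, so the 14 busiest intervals do not affect the bill. Short bursts are free; sustained load is not.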
Click Stream Data • Customers want to mine their click stream • And Akamai only has a portion of it • i.e. huge distributed data base • Query is “tell me something interesting” • e.g. why are 95% of the shopping carts abandoned? • and not a pile of statistics
Interesting Problem - Fidelity • Financial portal for high net worth individuals • must connect to several hundred Fidelity systems • Customers want to know fairly complex things • e.g. rank my money manager against all value managers for 1, 3 and 5 years
What to Do? • Voice to NL to structured data • voice to NL works in focused verticals (weather, airline schedules) • but this is a pretty broad app • NL to structured data requires some work • put in the joins • look up vocabulary in the DBMS
What to Do? • How to join unstructured data to structured data • tell me the news stories about all stocks which have increased in value more than 10% today
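The news-and-stocks query can be sketched as a filter on the structured side followed by a text match on the unstructured side. Everything here is an assumption for illustration: the name `stories_for_gainers`, quotes given as (open, last) pairs, and stories matched by the ticker symbol appearing as a whole word. Real stories often mention a company by name rather than by ticker, which is exactly where the join gets hard.

```python
import re

def stories_for_gainers(quotes, stories, threshold=0.10):
    """Map each stock that gained more than threshold today to the stories mentioning it.

    quotes: hypothetical dict, ticker -> (open_price, last_price).
    stories: list of story texts.
    """
    # Structured side: select the tickers whose gain exceeds the threshold.
    gainers = {
        ticker for ticker, (open_price, last_price) in quotes.items()
        if open_price > 0 and (last_price - open_price) / open_price > threshold
    }
    # Unstructured side: naive whole-word match of the ticker in the story text.
    result = {}
    for ticker in sorted(gainers):
        pattern = re.compile(r"\b" + re.escape(ticker) + r"\b")
        result[ticker] = [story for story in stories if pattern.search(story)]
    return result
```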
Mike’s Picks • Too much middleware • Akamai for structured data
Interesting Problem - Middleware • Average enterprise has • one (or more) app servers • one (or more) EAI packages • one (or more) ETL packages • one (or more) portal products • one (or more) application packages • and maybe someday a federated DBMS
All of these systems • Contain transformation engines • And often do function activation (app service) • And often have adapters to legacy systems Huge overlap in functionality!!
What to Do? • Consolidate weaker paradigms under stronger ones • e.g. federated DBMS subsumes ETL • OR DBMS subsumes app service Middleware becomes DBMS-centric!
Interesting Problem - Caching • Akamai et al. cache HTML • closer to the browser that wants it • Would be nice to cache structured data • need to cache the application that uses the data • and the data
What to Do? • Materialized views are a predefined solution • Nice to have a more dynamic one • Cache (query, answer) pairs?
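One way to read "cache (query, answer) pairs" is a lookup table keyed by query text, with entries dropped when a table they read is updated. The sketch below makes several assumptions that are not in the talk: the class name `QueryCache`, a caller-supplied `run_query` function, the caller declaring which tables each query reads, and exact-match lookup after whitespace normalization. It does not answer a query from the cached result of a different but overlapping query, which is the harder and more interesting case.

```python
class QueryCache:
    """Cache answers by query text; invalidate per table on update."""

    def __init__(self, run_query):
        self._run_query = run_query   # function: query text -> answer
        self._answers = {}            # normalized query -> cached answer
        self._by_table = {}           # table name -> set of normalized queries

    @staticmethod
    def _normalize(query):
        # Collapse whitespace only; case is kept because string literals matter.
        return " ".join(query.split())

    def get(self, query, tables):
        """Return the cached answer, running the query on a miss."""
        key = self._normalize(query)
        if key not in self._answers:
            self._answers[key] = self._run_query(query)
            for table in tables:
                self._by_table.setdefault(table, set()).add(key)
        return self._answers[key]

    def invalidate(self, table):
        """Drop every cached answer that was declared to read this table."""
        for key in self._by_table.pop(table, set()):
            self._answers.pop(key, None)
```

Unlike a materialized view, nothing is defined in advance: the cache fills with whatever queries actually arrive.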
History Lesson (Codd) • Putting semantics into data order is bad • restricts storage options • hidden meaning bad • Hierarchical representations for data are bad • rewrite the queries when representation changes (data independence) • Complexity is bad
My Spin on XML (XMLSchema) • As a storage format, XML is good for documents not data • Codd’s thinking has not been repealed (order, hierarchy, complexity) • no binary format • in line tags are inefficient • SGML run amok….
My Spin on XML • As an “on the wire” notation, XML is ok for data • but don’t try to move too much stuff • and don’t try to move it too fast • Remember why client-server systems introduced binary data transfer!
XQuery For Data • Won’t store data in XML • Necessary to design something that is easy to translate into SQL • Alternate syntax for OR SQL • which is much cleaner (// is a user defined function in Informix)
XML Summary • Focus attention on XMLSchema as a document description system not a data description system • Focus XQuery on documents not data W3C use cases do not do this!
OR DBMS • XML is merely this year’s data type • Next year it will be WML or ... • OR is still not finished • query optimization • data base design • physical storage layout
Grand Challenge #1 • Preponderance of web accessible data is structured • much more than “facts and figures” • Construct a system to access “the rest of” the web
What To Do • GUI problem (NL or Vis) • Query notation problem • Discovery problem • how do you “scrape” a structured data web site to figure out the meaning of its data? • Federation problem
Grand Challenge #2 • Everything of material importance is geo-positioned (lojacked) • Construct the mother of all GIS systems • complete automation of supply chains • “where is my wife” (or the closest restroom)
What To Do • Most of the issues in GC #1 • The mother of all triggering problems • The mother of all security/privacy problems