
Computing in the Clouds


Presentation Transcript


    1. Computing in the Clouds, by Aaron Weiss

    3. Cloud Computing: the next big thing. But what is it? Web-based applications (thin client); utility computing, a grid that charges for processing time; distributed or parallel computing designed to scale for efficiency. Also called on-demand computing, software as a service, and the Internet as platform.

    4. Data Centers: Decades ago, computing power lived in mainframes in computer rooms. Personal computers changed that. Now network data centers with centralized computing are back in vogue, but no longer as a hub-and-spoke. Although Google is famous for innovating web search, Google's architecture is as much a revolution: instead of a few expensive servers, use many cheap servers (roughly half a million servers in about 12 locations).

    5. With a thin, wide network, derive more from the scale of the whole than from any one part; there is no hub. The cloud is robust and self-healing. It uses too much power, hence the cheaper power solutions we've talked about earlier in class. Heavy utilization of virtualization: a single server runs multiple OS instances to minimize CPU idle time.

    6. CloudOS (VMware and Cisco): instead of each server running its own copy of an OS (the current Google model), have a single OS that treats everything in the data center as just another resource, with network channels to coordinate events, making the cloud a more cohesive entity.

    7. The entire user interface resides in a single window; provide all the facilities of an OS inside a browser. The program must continue running even as the number of users grows, and the communication model is many-to-many.

    8. To move applications to the cloud, developers must master multiple languages and operating environments. In many cloud applications the back-end process relies on a relational DB, so part of the code is in SQL; the client side is in JavaScript or embedded within HTML documents; and the server application in between is written in a scripting language (a sketch of this split follows below).
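A minimal sketch of that three-tier split, assuming a local SQLite database stands in for the relational back end and a hypothetical /search endpoint that serves a JavaScript front end; this is illustrative only, not code from the article.

```python
# Middle-tier scripting layer: parse the request, run SQL, return JSON to the
# client-side JavaScript. SQLite stands in for the relational back end.
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

DB = sqlite3.connect(":memory:", check_same_thread=False)
DB.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT)")
DB.executemany("INSERT INTO docs (title) VALUES (?)",
               [("cloud computing",), ("utility grids",)])

class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Server-side scripting: extract the query parameter, run the SQL part.
        query = parse_qs(urlparse(self.path).query).get("q", [""])[0]
        rows = DB.execute(
            "SELECT id, title FROM docs WHERE title LIKE ?",
            (f"%{query}%",),
        ).fetchall()
        body = json.dumps([{"id": r[0], "title": r[1]} for r in rows]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)  # consumed by client-side JavaScript in the browser

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), SearchHandler).serve_forever()
```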

    9. Distributed Computing: The speed of the cloud depends on delegation. Break work up into subtasks, e.g., retrieving the results of a search DB query: parse results, construct result sets, format results, etc. If the tasks are small enough they can run simultaneously; dependencies make this complex. Distributed computing is not new (SETI@home, Folding@home). Hadoop, from the Apache Foundation, removes the need to create specialized custom software and distributes petabyte-scale projects across thousands of nodes (see the MapReduce-style sketch below).
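A minimal MapReduce-style sketch in the spirit of Hadoop, using the familiar word-count example; Hadoop would run the map and reduce phases on many nodes, while here both run locally for illustration.

```python
# Word count expressed as independent map tasks plus a grouping reduce step.
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Each mapper emits (key, value) pairs independently, so documents can be
    # processed in parallel on different nodes.
    return [(word.lower(), 1) for word in document.split()]

def reduce_phase(pairs):
    # The framework groups pairs by key; each reducer sums one key's values.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

documents = ["clouds scale out", "clouds scale up"]
all_pairs = chain.from_iterable(map_phase(d) for d in documents)
print(reduce_phase(all_pairs))   # {'clouds': 2, 'scale': 2, 'out': 1, 'up': 1}
```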

    10. A Utility Grid: In the past you paid for the cost of the cycles you used. Today most organizations create their own data centers, but these cost money to run and use 99% of their capacity only 10% of the time. In Web services there are lots of hosting providers, but they typically do not replicate distributed computing.

    11. Amazon, Google, etc. should scale up their data centers and create business models to support third-party use. Amazon EC2 opened for fee-based public use in October 2007. Customers create a virtual image of their software environment and create an instance of a machine in Amazon's cloud, which appears to the user as a dedicated server. Customers choose the configuration and can create or destroy instances at will: if there is a surge in visitors, spin up additional instances on demand; if traffic slows down, terminate the extra instances (sketch below). Amazon charges $0.10 per instance-hour, based on compute units regardless of the underlying hardware; data transfer costs $0.10 to $0.18 per GB.
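A sketch of the create-and-destroy-at-will model using the modern boto3 SDK (EC2's original 2007 API differed); the AMI ID is a placeholder, the instance type is illustrative, and the $0.10/instance-hour and $0.10 to $0.18/GB figures are the article's 2007 rates, not current pricing.

```python
# Launch extra instances when traffic surges, terminate them when it subsides.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.run_instances(ImageId="ami-xxxxxxxx",   # customer's virtual image (placeholder)
                         InstanceType="m1.small",  # illustrative, 2007-era size
                         MinCount=1, MaxCount=3)
ids = [i["InstanceId"] for i in resp["Instances"]]

# When the surge passes, release the extra capacity.
ec2.terminate_instances(InstanceIds=ids)

# Back-of-the-envelope cost at the article's 2007 rates:
hours, instances, gb_out = 24, 3, 50
print(hours * instances * 0.10 + gb_out * 0.18)   # 16.2 dollars
```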

    12. Google and IBM are applying a similar cloud utility model to CS education, providing CS students access to a distributed computing environment. In the future, businesses will not need to invest in a data center.

    13. Software as a Service: Move all processing power to the cloud and carry only an ultralight input device. This is already happening: e-mail moved to the Internet, then to the Web; Google Docs; implications for Microsoft and its model of software as a purchasable local application; Windows Live (Microsoft's cloud); Adobe's web-based Photoshop.

    14. The cloud is a paradigm shift and a disruptive force. Google and Apple will pair: a lightweight mobile device by Apple tapping into Google's cloud. But remember the failed thin clients of the past: Larry Ellison in the '90s had trouble creating cost-effective thin clients, and it is difficult to produce a sufficiently powerful thin client at a low enough cost. Yet non-thin clients can fail too, and their software needs care.

    15. Networks will need to be robust, but in the U.S. broadband quality is poor and broadband advances slowly, a bottleneck for clouds. Privacy? What if a third party has your data and the government subpoenas them? Do you even know? Can you lose access to your information if you don't pay the bill? Vendor lock-in: you may need a certain client to access a cloud operator, not open like the Internet today.

    16. Partly Cloudy: A new name for the same familiar computing models? It is new because it integrates the models of centralized computing, utility computing, distributed computing, and software as a service. Power shifts from the processing unit to the network: processors become commodities, and the network connects it all.

    17. Cloud computing is leaving relational databases behind. Joab Jackson, September 2008, Government Computer News.

    18. “One thing you won’t find underlying a cloud initiative is a relational database. And this is no accident: relational databases are ill-suited for use within cloud computing environments.” Geir Magnusson, VP at 10Gen, an on-demand platform service provider.

    19. DBs specifically designed to work in cloud computing: Google's BigTable, Amazon's SimpleDB, 10Gen's Mongo, AppJet's AppJetDB, Oracle's open-source Berkeley DB, and Drizzle (MySQL for the Web).

    20. Characteristics of Cloud DBs: They run in distributed environments; none are transactional in nature; they sacrifice advanced querying capability for faster performance; and they are queried using object calls instead of SQL (example below).
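A sketch of the object-call query style using MongoDB (10Gen's Mongo, listed above) via the pymongo driver; it assumes a local mongod is running. The equivalent SQL would be roughly SELECT * FROM articles WHERE topic = 'cloud'.

```python
# Query with dictionary/object predicates instead of an SQL string.
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
articles = client["demo"]["articles"]

# Insert schemaless documents rather than rows in a fixed schema.
articles.insert_one({"title": "Computing in the Clouds", "topic": "cloud"})

# The "query" is an object call with a predicate object, not SQL.
for doc in articles.find({"topic": "cloud"}):
    print(doc["title"])
```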

    21. Very large relational databases like Oracle are implemented in data centers with DB material spread across different locations. Executing complex queries over vast locations can slow response time, and it is difficult to design and maintain an architecture to replicate data. Instead, data is targeted in a clustered fashion.

    22. The Claremont Report on Database Research, SIGMOD 2008

    23. What is it? In May 2008, prominent DB researchers, architects, users, and pundits met in Berkeley, CA, at the Claremont Resort, the seventh such meeting in 20 years. The report is based on their discussion of new directions in DBs.

    24. A turning point in DB research: new opportunities for technical advances, impact on society, etc. 1. Big Data: not only traditional enterprises, but also e-science, digital entertainment, natural language processing, and social network analysis. Design new custom data management solutions from simpler components.

    25. 2. Data analysis as a profit center: Barriers between the IT department and business units are dropping; data is the business. Data capture, integration, etc. are keys to efficiency and profit. BI vendors are a $10B market (front-end only). Also needed: better analytics and more sophisticated analysis, since non-technical decision makers want data.

    26. 3. Ubiquity of structured and unstructured data: Structured data is extracted from text, software logs, sensors, and deep web crawls; semi-structured data comes from blogs, Web 2.0 communities, and instant messaging. Publish and curate structured data; develop techniques to extract useful data, enable deeper explorations, and connect datasets.

    27. 4. Expanded developer demands: Adoption of relational DBMSs and query languages has grown (MySQL, PostgreSQL, Ruby on Rails), but there is less interest in SQL, and developers view the DBMS as too much to learn relative to other open-source components. New programming models for data management are needed.

    28. 5. Architectural shifts in computing: The computing substrates for data management are shifting. Macro: the rise of cloud computing democratizes access to parallel clusters. Micro: the shift from increasing chip clock speed to increasing the number of cores and threads, changes in the memory hierarchy, power consumption, and new data management technologies.

    29. Research Opportunities: The impact of DB research has not evolved beyond traditional DBs. Reformation: reform data-centric ideas for new applications and architectures. Synthesis: data integration, information extraction, data privacy. Some topics are not mentioned here because they are still part of significant ongoing effort, which must continue: uncertain data, data privacy and security, e-science, human-centric interactions, social networks, etc.

    30. DB Engines: A big market, but relational DBs have well-known limitations. Peak performance: OLTP with lots of small, concurrent transactions (debit/credit workloads) and OLAP with a few read-mostly, large join and aggregation queries (both sketched below). Bad for: text indexing, serving web pages, media delivery.
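A small sketch of the two workloads relational engines are tuned for, using SQLite as a stand-in: a short debit/credit transaction (OLTP) and a read-mostly aggregation query (OLAP).

```python
# OLTP vs. OLAP in miniature on an in-memory SQLite database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])

# OLTP: many short, concurrent debit/credit transactions.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 2")

# OLAP: few, large, read-mostly aggregation queries.
total, = conn.execute("SELECT SUM(balance) FROM accounts").fetchone()
print(total)   # 150.0
```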

    31. DB engine technology could be useful in the sciences and in Web 2.0 applications, but not as current bundled DB systems: with petabytes of storage and thousands of processors, current DBs cannot scale, and they need schema evolution, versioning, etc. There are currently many DB engine startup companies.

    32. 1. Broaden the range of multi-purpose DBs; 2. Design special-purpose DBs. Topics in the DB engine area: systems for clusters of many processors; exploiting remote RAM and Flash as persistent storage; continuous query optimization and data layout; compression and encryption integrated with data layout and optimization; embracing non-relational DB models; trading off consistency/availability for performance; designing power-aware DBMSs.

    33. Declarative programming for emerging platforms: Programmer productivity is important; non-experts must be able to write robust code. Data-centric programming techniques: MapReduce (language and data parallelism), declarative languages (Datalog), enterprise application programming (Ruby on Rails, LINQ). A small imperative-versus-declarative contrast follows below.
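A small illustration, under the assumption that a toy aggregation suffices: the same per-key total written as an explicit loop and as a declarative SQL/LINQ-style expression whose evaluation order an engine would be free to choose or parallelize.

```python
# Imperative vs. declarative formulation of the same per-user aggregation.
orders = [("alice", 3), ("bob", 5), ("alice", 2)]

# Imperative: the programmer spells out the iteration and state updates.
totals = {}
for user, qty in orders:
    totals[user] = totals.get(user, 0) + qty

# Declarative: state what is wanted; grouping and order are left to the engine.
declarative = {user: sum(q for u, q in orders if u == user)
               for user in {u for u, _ in orders}}

assert totals == declarative == {"alice": 5, "bob": 5}
```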

    34. New challenges: programming across multiple machines. Data independence is valuable; make no assumptions about where data is stored. XQuery for declarative programming? We also need language design, efficient compilers, and code optimization across parallel processors and the vertical distribution of tiers, plus more expressive languages, attractive syntax, development tools, etc. Data management becomes not only a storage service but a programming paradigm.

    35. Interplay of Structured and Unstructured Data: Data behind forms (the Deep Web), data items in HTML, and data in Web 2.0 services (photo and video sites). The transition is from traditional DBs to managing structured, semi-structured, and unstructured data in enterprises and on the web: the challenge of managing dataspaces.

    36. On the web: vertical search engines and domain-independent technology for crawling. Within the enterprise: discover relationships between structured and unstructured data.

    37. Extract structure and meaning from unstructured and semi-structured data. Information extraction technology pulls entities and relationships from unstructured text (toy sketch below). Needed: ways to apply and manage predictions from independent extractors, and algorithms to determine the correctness of extractions. Join forces with the IR and ML communities.
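A toy sketch of information extraction: pulling (subject, relation, object) triples from unstructured text with a single hand-written pattern. Real extractors use statistical models and attach confidence scores, which is where the correctness question above comes in.

```python
# Hand-written pattern extractor for two relation verbs.
import re

PATTERN = re.compile(
    r"(?P<subj>[A-Z]\w+) (?P<rel>acquired|founded) (?P<obj>[A-Z]\w+)")

def extract(text):
    # One triple per sentence, at most, for simplicity.
    triples = []
    for sentence in text.split("."):
        m = PATTERN.search(sentence)
        if m:
            triples.append((m.group("subj"), m.group("rel"), m.group("obj")))
    return triples

print(extract("Google acquired YouTube. Hurley founded YouTube."))
# [('Google', 'acquired', 'YouTube'), ('Hurley', 'founded', 'YouTube')]
```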

    38. Better DB technology is needed to manage data in context: discover implicit relationships, maintain context through storage and computation, query and derive insight from heterogeneous data, answer keyword queries over heterogeneous data sources, and analyze data to extract semantics. We cannot assume that semantic mappings exist or that the domain is known.

    39. Develop algorithms to provide best-effort services on loosely integrated data, paying as you go as semantic relationships are discovered. Develop index structures to support querying hybrid data, along with new notions of correctness and consistency.

    40. Innovate on creating data collections: ad-hoc communities collaborate, schemas will be dynamic, and consensus guides users. We need visualization tools that make it easy to create data; data created with such tools may also be easier to extract information from.

    41. Cloud Data Services: Infrastructures providing software and computing facilities as a service. Efficient for applications: they limit up-front capital expenses and reduce the cost of ownership over time. Services are hosted in a data center on shared commodity hardware for computation and storage.

    42. Cloud services available today: application services (salesforce.com), storage services (Amazon S3), compute services (Google App Engine, Amazon EC2), and data services (Amazon SimpleDB, SQL Server Data Services, Google's Datastore).

    43. Cloud data services offer APIs more restricted than traditional DBs: minimalist query languages and limited consistency (sketched below) in exchange for more predictable service; it would be difficult to provide a full-function SQL data service. Manageability is important in cloud environments: limited human intervention, high workloads, and a variety of shared infrastructures.
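A hypothetical sketch (not any vendor's actual API) of how restricted such an interface can be: put/get by key plus a single-attribute equality filter, with no joins, no ad-hoc SQL, and only eventual-consistency guarantees.

```python
# Minimalist data-service interface, hypothetical and for illustration only.
class MinimalDataService:
    def __init__(self):
        self._items = {}          # item_name -> {attribute: value}

    def put(self, item_name, attributes):
        self._items[item_name] = dict(attributes)

    def get(self, item_name):
        # In a real eventually consistent store, reads may lag recent writes.
        return self._items.get(item_name)

    def select(self, attribute, value):
        # The only "query": equality on one attribute, no joins or aggregates.
        return [name for name, attrs in self._items.items()
                if attrs.get(attribute) == value]

svc = MinimalDataService()
svc.put("doc1", {"topic": "cloud", "year": "2008"})
print(svc.select("topic", "cloud"))   # ['doc1']
```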

    44. There is no DBA or system admin; management is done automatically by the platform. Workloads vary widely, and it is economical to use more resources for short bursts. Service tuning depends on the kind of virtualization: hardware virtual machines as the programming interface (EC2), or multi-tenant hosting of many independent schemas in a single managed DBMS (salesforce.com).

    45. The need for manageability calls for adaptive online techniques and new architectures and APIs, departing from SQL and transaction semantics where possible. SQL DBs cannot scale to thousands of nodes: do we need different transactional implementation techniques, or different storage semantics?

    46. Query processing and optimization: we cannot exhaustively search the plan space with thousands of sites (see the back-of-the-envelope count below), and more work is needed to understand scaling realities. Data security and privacy: there are no longer physical boundaries of machines or networks.
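A quick worked count of why exhaustive plan search breaks down: even restricted to left-deep join orders there are n! candidates, and that explodes long before distribution across thousands of sites is even considered.

```python
# Size of the left-deep join-order space for n relations.
from math import factorial

for n in (5, 10, 20):
    print(n, "relations ->", factorial(n), "left-deep join orders")
# 5 relations -> 120 left-deep join orders
# 10 relations -> 3628800 left-deep join orders
# 20 relations -> 2432902008176640000 left-deep join orders
```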

    47. New scenarios: specialized services with pre-loaded data sets (stock prices, weather), combining data from private and public domains, reaching across clouds (scientific grids), and federated cloud architectures.

    48. Mobile applications and virtual worlds: manage massive amounts of diverse user-created data, synthesize it intelligently, and provide real-time services. The mobile space has large user bases and the emergence of mobile search and social networks: deliver timely information to users depending on location, preferences, social circles, extraneous factors, and the context in which they operate, synthesizing user input and behavior to determine location and intent.

    49. Virtual worlds (Second Life) began as simulations for multiple users and now blur the distinction with the real world: co-spaces span both virtual and physical worlds, where events in the physical world are captured by sensors and materialized in the virtual one, and events in the virtual world can affect the physical one. This requires processing heterogeneous data streams, balancing privacy against sharing personal real-time information, large-scale parallel programs for virtual actors, and efficient, power-sensitive storage and data processing.

    50. Moving Forward: The DB research community doubled in size over the last decade. Its increasing technical scope makes it difficult to keep track of the field, the review load for papers is growing, and the quality of reviews is decreasing over time. We need more technical books, blogs, wikis, and open-source software development in DBs. Competition: system components for cloud computing and large-scale information extraction.
