
Nathan Kohn BU MET enzyme@bu

Thinking Big in Small Spaces One Hadoop Two Hadoop (Big Data & 21st Century Analytics in the Classroom). Stanislav Seltser BU MET sseltser@bu.edu. Nathan Kohn BU MET enzyme@bu.edu. 6 Billion Flickr Photos. 900 Million Facebook Users. 72 Hours a Minute YouTube.



Presentation Transcript


  1. Thinking Big in Small Spaces One Hadoop Two Hadoop (Big Data & 21st Century Analytics in the Classroom). Stanislav Seltser, BU MET, sseltser@bu.edu. Nathan Kohn, BU MET, enzyme@bu.edu

  2. Big Data is Everywhere. 6 Billion Flickr Photos; 900 Million Facebook Users; 72 Hours a Minute YouTube; 28 Million Wikipedia Pages. “…growing at 50 percent a year…” “…data a new class of economic asset, like currency or gold.”

  3. Big Learning. GPUs; Multicore; Clusters; Clouds; Supercomputers. BU Undergraduates; BU Faculty; BU Graduate students. How will we design and implement Big learning systems?

  4. Graphs are Everywhere. Collaborative Filtering; Social Network; Netflix; User; Movie; Probabilistic Analysis; Text Analysis; Wiki; Docs; Words

  5. Big Data & Linear Regression

  6. Stochastic Gradient Descent

  7. Serial vs Parallel SGD

  8. Big Data Landscape: Apps, Infrastructure, Data Semantics

  9. Landscape

  10. Grad Student Response #1. How Big is Big? How is BigData measured? As per my understanding, the term big data doesn’t refer directly to the size of the data itself. What the term might mean is that the demand for data (storage/transfer/analysis) has surpassed several parameters that relational databases cannot control or handle (too big to handle). How it is measured, I really don’t know. Server storage keeps increasing and increasing (5TB, 10TB, 50TB, 100TB…) and RDBMSs like Oracle seem to be keeping up with it, but then again I don’t know exactly what measure is being used. Is Big Data relevant to you professionally? Indeed it is, even though I am not using it or practicing it daily. I am really interested in learning it. Is Big Data relevant to you personally? Very relevant, and it is a topic that drove me into pursuing a master’s degree.

  11. Grad Student Response #2. How Big is Big? How is BigData measured? Big data is a term for large data sets that are too complex to compute by traditional data management processes and tools. Its points and data types are dependent on and measured by the parameters set forth by each organization. Where does BigData come from? Big data can come from various sources that can be categorized as internal or external contributors. What is BigData good for? BigData is good for complex and large data sets that exist within relational databases and may require object-oriented programming. Would you like to see Big Data incorporated in your courses? Yes, I think that we exist in a period in which we are inundated by social media, numbers, photographs and other forms of data, which require us to be well versed in storage, maintenance, and interface design so that we are better able to parse through the Big Data that we encounter on a daily basis.

  12. Undergrad Student #1. Is Big Data relevant to you personally? Yes. As my current major is Business Application Development, I can see myself gaining a lot of opportunities to deal with not only the technologies for building user interfaces in the future but also the technologies for storing user information, and the techniques used to understand those data could be another opportunity for the business. Would you like to see Big Data incorporated in your courses? Yes. I would like to see our course include some of the techniques that corporations use nowadays to understand the relation between their data and the problems they need to address, such as how they decide which part of their big data provides them with the most helpful information for their problem, and how they explain the meaning of their data analysis based on the result, such as how they can decide the result is accurate and meaningful enough to allow them to take an action. Do you have any questions about Big Data? Big data is a pretty interesting and useful topic. It would be nice to have more background information to help our understanding.

  13. Undergrad Student #2. How Big is Big? How is BigData measured? The survey is asking rather easy conceptual questions about big data. Big data is easy to understand at that level: we finally have the technology to store, retrieve (cheap memory), and analyze (with proper languages) data on magnitudes that were impossible before. Instead of just a phone book type of data, people can gather every relevant or even possibly relevant piece of information about anything (often but not limited to customers of a business). I have read articles about how some companies (credit card mostly, if I remember correctly) can tell if a woman is pregnant before she even knows herself. Or they can predict divorce rates a year in advance quite reliably. All this from their spending habits and deviations from those habits. While all this is fascinating, I don't have any real interest in learning at the conceptual level like this. If big data is to be relevant in a class, it needs to show HOW all this is done. Teach the language, teach the search and statistical algorithms, or even the methods people use to collect big data (the penta+bytes aren't being entered by hand). Students in classes or lectures on big data should come away with some practical knowledge on the subject, otherwise we're just applying a name to something people generally understand: organizations collect and analyze as much data as they can, and recent technology has made that amount of data staggeringly large. The key- and buzz-words are nice to sound like an expert, but the how-to is generally more important.

  14. Student Response #4. How Big is Big? How is BigData measured? Big data is a term developed recently to describe the trend of an exponentially increasing amount of data stored by organizations for business uses. Very often these big data might be extremely big, such as 16 petabytes. These data are measured by the memory space they occupy. Thus, 16 petabytes of big data approximately occupies 10^15 bytes of memory. Where does BigData come from? Big Data could come from different sources, such as emails, social-networking sites, sensors on the web, sensors installed on other tracking devices, or line-of-business applications. Is Big Data relevant to you professionally? Yes. In my previous work as a market researcher, we always needed to gather information and analyze it for business decision making. The technologies for gathering big data and the techniques used to analyze and filter data are also considered extremely helpful for the career.

  15. Data Warehouse Course Student Comments: Very informative, content-rich course; covers the latest technologies, trends, and skills of data warehousing, data management, and data analysis. I would recommend including this course in the required courses for the MS in CIS with concentration in Database Management and BI Program. Relevance to job opportunities and cutting-edge technologies. This is probably the most useful course I have taken at Boston University. I have used every bit of what this professor taught every night at work. I have made contributions to my employer, a data mining company, in ways that had never been done before as a result of this course. I have for the first time in my 8-year career planned, designed, and augmented a Data Warehouse from scratch. I have configured an analysis server and reported using MDX queries. This professor has been helpful in many ways. He has guided me through some Data Warehouse design projects at work. Moreover, he has been available to work with me and others after class and on weekdays.

  16. Road map

  17. A: Archeology. To help archaeologists find answers to questions hidden in thousands of images and text files generated from field sites around the world, Professor Mark Eramian et al. have been awarded $548,000 through the Digging into Data Challenge, National Endowment for the Humanities.

  18. B: Biology. Recently, a researcher wanted to ascertain whether a search against GQ-Pat could provide novel insight into his work related to a specific gene, the cAMP Responsive Element Modulator. Reporting to the VP of R&D: apply data mining and machine learning techniques to develop better search and content discovery in the field of patents; invent new ways to index tens of millions of documents with semantic information.

  19. Z: Zymurgy (hint: beer). QUIZ?

  20. Quiz:

  21. Stanislav Seltser, BU MET, sseltser@bu.edu. Nathan Kohn, BU MET, enzyme@bu.edu
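
Slides 5 to 7 name linear regression, stochastic gradient descent, and serial versus parallel SGD, but the transcript carries no algorithm for them. The following is a minimal sketch of what those titles could refer to, not the presenters' material. Everything in it is an assumption made for illustration: the function names, the learning rate, the synthetic data on the line y = 3x + 1, and the choice of parameter averaging as the parallel scheme (one of several ways to parallelize SGD). The workers are simulated one after another in a single process.

```python
import random


def sgd_linear_regression(points, lr=0.01, epochs=300, seed=0):
    """Fit y ~ w*x + b with stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    data = list(points)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)              # visit the points in a fresh random order
        for x, y in data:
            err = (w * x + b) - y      # prediction error on one point
            w -= lr * err * x          # gradient of 0.5*err**2 with respect to w
            b -= lr * err              # gradient of 0.5*err**2 with respect to b
    return w, b


def parallel_sgd_by_averaging(points, workers=4, **kwargs):
    """Simulated parallel SGD: each worker fits its own shard of the data,
    then the per-worker parameters are averaged (an assumed scheme)."""
    data = list(points)
    shards = [data[i::workers] for i in range(workers)]
    fits = [sgd_linear_regression(shard, **kwargs) for shard in shards]
    w = sum(f[0] for f in fits) / workers
    b = sum(f[1] for f in fits) / workers
    return w, b


# Hypothetical noise-free data on the line y = 3x + 1.
points = [(i / 10, 3 * (i / 10) + 1) for i in range(40)]

serial_w, serial_b = sgd_linear_regression(points)
parallel_w, parallel_b = parallel_sgd_by_averaging(points)
print("serial:  ", serial_w, serial_b)
print("parallel:", parallel_w, parallel_b)
```

The serial version updates one shared pair of parameters point by point. The averaged version lets each worker run independently on a quarter of the data, which is the property that makes it easy to distribute, at the cost of each worker seeing fewer examples.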
