1 / 36

CC5212-1 Procesamiento Masivo de Datos Otoño 2016 Lecture 12: Conclusion

CC5212-1 Procesamiento Masivo de Datos Otoño 2016 Lecture 12: Conclusion. Aidan Hogan aidhog@gmail.com. FULL-CIRCLE. The value of data …. Twitter architecture. Google architecture. Generalise concepts to …. Working with large datasets. Value/danger of distribution. Frameworks.

dianecasey
Télécharger la présentation

CC5212-1 Procesamiento Masivo de Datos Otoño 2016 Lecture 12: Conclusion

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CC5212-1ProcesamientoMasivo de DatosOtoño2016Lecture 12: Conclusion Aidan Hogan aidhog@gmail.com

  2. FULL-CIRCLE

  3. The value of data …

  4. Twitter architecture

  5. Google architecture

  6. Generalise concepts to …

  7. Working with large datasets

  8. Value/danger of distribution

  9. Frameworks • For Distrib. Processing • For Distrib. Storage

  10. The Big Data Buzz-word

  11. Distributed Systems

  12. GFS (HDFS) / MapReduce (Hadoop)

  13. Information Retrieval

  14. NoSQL and Querying

  15. EXAM …

  16. Course Marking • 45% for Weekly Labs (~3% a lab!) [Easy] • 20% for Small Class Project [Medium] • 35% for Final Exam [Hard]

  17. Final Exam (35%) • Goal: test your understanding of concepts • Coding covered by labs/project • No syntax writing questions! • but there will be design and syntax reading questions • Max. three hours • Not marking you on English • You can write in Spanish • Four questions (marked on best three) … • Old exams available on Material Docente (u-cursos) • There will be overlap, but not 100% overlap

  18. The following is not a legally abiding agreement. It is just a helpful guide for what’s important. -- My laywer, 2016

  19. Q1: Distributed Systems

  20. Q1: Distributed Systems (Slides) Slides: Lecture 2: Distributed Systems I Lecture 3: Distributed Systems II Names per the homepage: http://aidanhogan.com/teaching/cc5212-1-2016/

  21. Q1: Distributed Systems (Topics) Possible Topics: • Advantages/disadvantages of a distributed system • Five distributed system design goals • Distributed architectures (P2P vs. C–S, Fat/Thin, n-Tier, etc.) • Java RMI (high-level) • Eight fallacies of distributed computing • Consensus basics (fail-stop vs. Byzantine, synchronous vs. asynchronous, goals) • Consensus protocols (2PC, 3PC, Paxos) • CAP theorem may appear in Q4, but will not appear in Q1

  22. GFS (HDFS) / MapReduce (Hadoop)

  23. Q2: GFS (HDFS) / MapReduce (Hadoop) Slides: Lecture 4: DFS & MapReduceI Lecture 6: MapReduce III: Pig (Lecture 5 was whiteboard only, practicing MapReduce design. This may also be useful when preparing!) Names per the homepage: http://aidanhogan.com/teaching/cc5212-1-2016/

  24. Q2: GFS (HDFS) / MapReduce (Hadoop) Possible Topics: • Google File System (reads, writes, fault-tolerance) • MapReduce(incl. design question) • HDFS/Hadoop (architecture) • Pig (high-level, e.g., explain what a given script does)

  25. Information Retrieval

  26. Q3: Information Retrieval (Slides) Slides: Lecture 7: Information RetrievalI Lecture 8: Information RetrievalII (Lecture 9 was whiteboard only, practicing PageRank. This may also be useful when preparing!) Names per the homepage: http://aidanhogan.com/teaching/cc5212-1-2016/

  27. Q3: Information Retrieval (Topics) Possible Topics: • Crawling (high-level multi-threading, (D)DoS, robots.txt, sitemap, distribution, bow-tie) • Inverted indexes (data structure, normalisation, Heap’s law, Zipf’s law, Elias encoding, etc.) • Ranking (relevance vs. importance, TF-IDF, Vector Space Model, etc.) • PageRank (concept, random surfer, calculation)

  28. Bring a Calculator! (that’s not a phone!)

  29. NoSQL and Querying

  30. Q4: NoSQL and Querying (Slides) Slides: Lecture 10: NoSQL I Lecture 11: NoSQL II Lecture 3: Distributed Systems II (slides on CAP) Names per the homepage: http://aidanhogan.com/teaching/cc5212-1-2016/

  31. Q4: NoSQL and Querying (Topics) Possible Topics: • CAP theorem (<- note out of order) • The Database Landscape • Key–Value stores (data model, operations, distribution, consistent hashing, replication, Dynamo, Merkle trees) • Document stores (high-level) • Tabular/column-families (data model, Bigtable, sorting, tablets, column families, SSTables, writes, reads, compactions, hierarchy, bloom filters) • Graph databases (high-level) • Cassandra (high-level)

  32. Final Exam (35%) • Goal: test your understanding of concepts • Coding covered by labs/project • No syntax writing questions! • but there will be design and syntax reading questions • Max. three hours • Not marking you on English • You can write in Spanish • Four questions (marked on best three) … • Old exams available on Material Docente (u-cursos) • There will be overlap, but not 100% overlap

  33. SOME ADVERTISING

  34. Ayudantes wanted • This course next year • (I take ayudantes in this course once) • “Diplomado” version of this course • End of November / Start of December • Will prioritise based on course mark

  35. Next semester … La Web de Datos • Web of Data / Semantic Web / Linked Data

More Related