
PROJECT


katy

Presentation Transcript


  1. PROJECT

  2. Topics
     • Theoretical:
        • Error Performance Analysis for Partitioned Sketch Data Structures
     • Survey:
        • Security and Privacy for Big Data: A Survey and Future Directions
     • Experiments:
        • Citizen Behavior of 7-21 Storm in Beijing, 2012
        • Music Knowledge Mining
        • Hadoop for Video Streaming on the Web
        • MapReduce Jobs For Video Conversion
     • Your proposed one…

  3. 1. Error Performance Analysis for Partitioned Sketch Data Structures
     • We have already discussed the time complexity (in terms of update time)
     • TASK:
        • What about the error performance?
        • How should the depth of each sketch be allocated optimally (Zipfian)?
     • Start by studying how the Count-Min (CM) sketch analyzes its error performance (Theorem 1 and related results)
        • http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf
     • Learn about P(d)-CU
        • http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6574663

  4. How to determine this?

  5. Result • Analysis (e.g., mathematical derivations) • Some initial simulation (correctness)
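A possible starting point for the initial simulation in topic 1 is to check a plain Count-Min sketch against its own error bound before moving on to partitioned variants. With width w = ceil(e/eps) and depth d = ceil(ln(1/delta)), the CM analysis says an estimate never falls below the true count and exceeds it by more than eps times the stream length with probability at most delta. The sketch below is only an illustration under assumptions: integer keys, a Carter-Wegman style hash family, and hypothetical helper names (`zipf_stream`, `simulate`). It does not implement P(d)-CU or any partitioning scheme.

```python
import math
import random
from collections import Counter


class CountMinSketch:
    """Count-Min sketch sized from (eps, delta); keys are assumed to be integers."""

    PRIME = (1 << 61) - 1  # Mersenne prime used as the hash modulus

    def __init__(self, eps, delta, seed=0):
        self.width = math.ceil(math.e / eps)            # w = ceil(e / eps)
        self.depth = math.ceil(math.log(1.0 / delta))   # d = ceil(ln(1 / delta))
        rng = random.Random(seed)
        # One (a, b) pair per row for h(x) = ((a * x + b) mod p) mod w.
        self.coeffs = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(self.depth)]
        self.rows = [[0] * self.width for _ in range(self.depth)]

    def _bucket(self, row, key):
        a, b = self.coeffs[row]
        return ((a * key + b) % self.PRIME) % self.width

    def update(self, key, count=1):
        for r in range(self.depth):
            self.rows[r][self._bucket(r, key)] += count

    def query(self, key):
        # The minimum over rows is the least-contaminated counter.
        return min(self.rows[r][self._bucket(r, key)] for r in range(self.depth))


def zipf_stream(num_keys, length, skew=1.0, seed=1):
    """Draw `length` keys from {0..num_keys-1} with probability proportional to 1/rank^skew."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** skew) for rank in range(1, num_keys + 1)]
    return rng.choices(range(num_keys), weights=weights, k=length)


def simulate(eps=0.01, delta=0.01, num_keys=1000, length=20000):
    """Return per-key overestimation errors, the bound eps * N, and the number of violations."""
    stream = zipf_stream(num_keys, length)
    truth = Counter(stream)
    sketch = CountMinSketch(eps, delta)
    for key in stream:
        sketch.update(key)
    errors = {key: sketch.query(key) - count for key, count in truth.items()}
    bound = eps * length
    violations = sum(1 for err in errors.values() if err > bound)
    return errors, bound, violations
```

Calling `simulate()` with different `eps`, `delta`, and `skew` values gives a quick empirical picture of how width and depth trade off, which is the same question the depth-allocation task asks for each partition.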

  6. 2. Survey
     • Write a good survey in English on
        • Security and Privacy for Big Data: A Survey and Future Directions
     • Cite at least 40 references (IEEE Xplore and ACM Digital Library)
     • Paper organization
        • Classify the works into categories, from different angles
        • Extensive comparisons
        • Identify future directions (i.e., what are the missing pieces?)

  7. Some Materials
     • http://www-03.ibm.com/security/solution/intelligence-big-data/
     • https://ssl.www8.hp.com/ww/en/secure/pdf/4aa4-4051enw.pdf
     • http://www.emc.com/collateral/industry-overview/big-data-fuels-intelligence-driven-security-io.pdf
     • http://www.isaca.org/Groups/Professional-English/big-data/GroupDocuments/Big_Data_Top_Ten_v1.pdf
     • http://www.trendmicro.com/cloud-content/us/pdfs/business/white-papers/wp_addressing-big-data-security-challenges.pdf
     • http://scholarlycommons.law.northwestern.edu/njtip/vol11/iss5/1/
     • Think about:
        • Storage
        • Analysis
        • Applications
        • Cloud, Internet-of-Things

  8. 3. Analyze Citizen Behavior During the 7-21 Storm in Beijing, 2012
     • The Power of Social Networks and Public Crowd
        • http://v.youku.com/v_show/id_XNDM5NjY1Mzc2.html
     • Use social network APIs such as Sina Weibo's
        • open.weibo.com/wiki
     • Use keyword search to retrieve all related data
        • #望京人赴机场免费救援# ("Wangjing residents offer free airport rescue"), #双闪车队# ("hazard-light convoy") (100+)
        • 菠菜X6, @望京网 (Wangjing Net)
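Once posts have been retrieved, a first analysis step could be to keep only those carrying the topic hashtags above and count them per hour. The sketch below assumes each post is already available as a dictionary with a `text` string and a `created_at` `datetime`; that record format and the function names are hypothetical. The retrieval call itself is left out because the exact endpoint and permissions depend on the open.weibo.com documentation.

```python
import re
from collections import Counter
from datetime import datetime

# Weibo topics are written between a pair of '#' signs, e.g. #双闪车队#.
TOPIC_PATTERN = re.compile(r"#([^#]+)#")

# Topic keywords taken from the slide.
KEYWORDS = {"望京人赴机场免费救援", "双闪车队"}


def extract_topics(text):
    """Return all #topic# names found in a post's text."""
    return [topic.strip() for topic in TOPIC_PATTERN.findall(text)]


def hourly_counts(posts, keywords=KEYWORDS):
    """Count posts per hour that mention at least one of the given topics."""
    counts = Counter()
    for post in posts:
        if keywords.intersection(extract_topics(post["text"])):
            hour = post["created_at"].replace(minute=0, second=0, microsecond=0)
            counts[hour] += 1
    return counts
```

Plotting the resulting hourly counts against the storm timeline is one simple way to see when the volunteer effort started and peaked.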

  9. 4. Music Knowledge Mining
     • Million Song Dataset
        • http://labrosa.ee.columbia.edu/millionsong
     • For example: calculate music density
        • http://musicmachinery.com/2011/09/04/how-to-process-a-million-songs-in-20-minutes/
     • YOUR TASK: Predict which songs a user will listen to
        • http://www.kaggle.com/c/msdchallenge
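As a rough illustration of the density example, the sketch below treats density as the number of audio segments per second of a track, which is how the linked post appears to define it, and ranks tracks in a map/reduce style. It is not the post's code: the `(track_id, duration_seconds, num_segments)` record format and the function names are assumptions, and reading the dataset's HDF5 files is left out.

```python
import heapq


def density(duration_seconds, num_segments):
    """Segments per second, or None when the duration is unusable."""
    if duration_seconds <= 0:
        return None
    return num_segments / duration_seconds


def map_track(record):
    """Map step: emit (density, track_id) for one track record."""
    track_id, duration_seconds, num_segments = record
    value = density(duration_seconds, num_segments)
    if value is not None:
        yield value, track_id


def densest(records, k=3):
    """Reduce step: keep the k tracks with the highest density."""
    pairs = (pair for record in records for pair in map_track(record))
    return heapq.nlargest(k, pairs)
```

The same map and reduce functions could later be moved into a Hadoop Streaming or similar job, since each track is processed independently.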

  10. 5. Video Streaming on the Web
     • Store your video as chunks in HDFS
     • Case: a user suddenly jumps to a specific part of the video
        • Seek in the file to position the cursor at a specific location
     • HDFS can only be accessed through a Hadoop client, and an Apache web server is not one.
     • Apache/FUSE: all file system operations (directory browsing, file opening, and content access) are enabled over HDFS content through the FUSE interface.
        • http://internetmemory.org/en/index.php/synapse/using_hadoop_for_video_streaming/
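The seek case can be pictured as follows: when HDFS is mounted through FUSE, a video is an ordinary file path, and a jump in the player typically reaches the server as an HTTP `Range: bytes=start-end` request. The sketch below parses such a header and reads just that byte span with a normal file seek. It is only an illustration; a web server such as Apache normally handles range requests for static files itself, and any mount path used with it (for example `/mnt/hdfs/videos/demo.flv`) is hypothetical.

```python
import re

# Only the simple single-range form "bytes=start-end" or "bytes=start-" is handled.
RANGE_PATTERN = re.compile(r"^bytes=(\d+)-(\d*)$")


def parse_range(header, file_size):
    """Return an inclusive (start, end) byte range, or None if unsupported or out of bounds."""
    match = RANGE_PATTERN.match(header.strip())
    if not match:
        return None
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else file_size - 1
    end = min(end, file_size - 1)
    if start > end:
        return None
    return start, end


def read_range(path, start, end, chunk_size=64 * 1024):
    """Yield the bytes start..end (inclusive) of a file in chunks."""
    with open(path, "rb") as video:
        video.seek(start)  # position the cursor at the requested offset
        remaining = end - start + 1
        while remaining > 0:
            chunk = video.read(min(chunk_size, remaining))
            if not chunk:
                break
            remaining -= len(chunk)
            yield chunk
```

Because `read_range` only uses `open`, `seek`, and `read`, it works the same on a local file and on a FUSE-mounted path, which is the point of putting FUSE between Apache and HDFS.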

  11. Result
     • A demo
        • Choose at least one video format (e.g., flv)
        • A client to play video
        • A web server (Apache with FUSE)
        • HDFS to store your videos

  12. 6. MapReduce For Video Conversion
     • Convert a huge number of video files from one format to another
     • Use the open-source video converter FFmpeg (http://ffmpeg.org/download.html)
     • Data is stored on HDFS
     • Create an app that does this (running on Google AppEngine)
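One way to picture the conversion job is as a map-only Hadoop Streaming task in which each input line names one video on HDFS, and the mapper copies it to local disk, runs FFmpeg, and uploads the result. The sketch below follows that idea under stated assumptions: the one-path-per-line input format, the `/tmp` work directory, and the function names are hypothetical, and the `ffmpeg` and `hadoop` binaries are assumed to be installed on every worker node.

```python
import os
import subprocess


def plan_conversion(hdfs_input, target_ext, hdfs_output_dir, work_dir="/tmp"):
    """Return the three commands needed to convert one HDFS video file."""
    name = os.path.basename(hdfs_input)
    stem, _ = os.path.splitext(name)
    local_in = os.path.join(work_dir, name)
    local_out = os.path.join(work_dir, stem + "." + target_ext)
    hdfs_out = hdfs_output_dir.rstrip("/") + "/" + stem + "." + target_ext
    return [
        ["hadoop", "fs", "-get", hdfs_input, local_in],   # HDFS -> local disk
        ["ffmpeg", "-y", "-i", local_in, local_out],      # format chosen by file extension
        ["hadoop", "fs", "-put", local_out, hdfs_out],    # local disk -> HDFS
    ]


def run_mapper(lines, target_ext, hdfs_output_dir, runner=subprocess.check_call):
    """Convert every HDFS path found in `lines`; `runner` executes one command."""
    for line in lines:
        hdfs_input = line.strip()
        if not hdfs_input:
            continue
        for command in plan_conversion(hdfs_input, target_ext, hdfs_output_dir):
            runner(command)
        # Emit a key/value line so the job output records what was converted.
        print(hdfs_input + "\tconverted")
```

Used as a streaming mapper, the script would call `run_mapper(sys.stdin, "mp4", "/videos/converted")` with output settings of your choice. Passing a different `runner` makes it possible to check the planned commands without touching Hadoop or FFmpeg.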

  13. Mechanism
     • Work in groups of 3-5 students with clear roles
     • Email me (ase_bit@yahoo.com) by this Friday (Nov 22) with:
        • Team leader, team members
        • Topic
     • Deadline: 28 December 2013!
     • Deliverable: project report in Chinese
        • Introduction (motivation, WHY?)
        • Related Work (what others have done)
        • Your proposal (HOW?)
        • Performance Evaluation
        • Conclusion
     • Presentation

  14. Suggested Arrangement
     • Week 1: Define your roles and start the literature research
     • Weeks 2 and 3: Propose solutions
     • Weeks 4 and 5: Implement and obtain results
     • Week 6: Write the report
