Dive into the world of Big Data Analytics to explore its financial impact, structured vs. unstructured data, user insights, relevant technologies like Hadoop and MongoDB, coding examples, and the future of analytics. Discover the power of examining large data sets, uncovering patterns, and optimizing operational efficiency.
Big Data Analytics: A Presentation by Meg Monsen, Michael Leonard, and Eric Zeng
Agenda • Big Data Analytics and its Objectives • Financial Impact • Structured vs Unstructured Data • Users of Big Data • Relevant Technologies (Hadoop, MongoDB) • Coding Examples • Future of Analytics
What is Big Data and why does it matter? • Defining Big Data Analytics • Examining large sets of data • Discovering patterns and trends • Data warehouses alone are insufficient • Purposes • Uncovering hidden needs of customers • Improving operational efficiency
Big Data & Operational Efficiency • “By using big data for operations analysis, organizations can gain real-time visibility into operations, customer experience, transactions and behavior.” – IBM • Core Objectives • Gain • Analyze • Apply • Optimize
Financial Impact of Big Data • High cost of poor data quality • $3.1 trillion to the US economy annually • 10-25% of US business revenues • Opportunities for qualified analysts • Business Analyst: $66,000 • Data Analyst: $60,000 • Data Scientist: $113,000
Dimensions of Big Data • Essential Characteristics: • Volume - Data quantity • Velocity - Data speed • Variety - Data types
Structured vs. Unstructured Data Structured Data • Represented as text • Transactional data, formal reports, accounting records of sales and costs • Relational databases / data warehouse • SQL Unstructured Data • May be textual or non-textual • Mobile usage, click stream activity, social media responses, genomic data • No structured database / data lake • NoSQL (Not only SQL), SQL Batch Queries
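The contrast on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the `sales` table, the sample events, and all field names are made up for the example, and JSON records are strictly speaking semi-structured, standing in here for data that has no fixed schema.

```python
import json
import sqlite3

# Structured data: a fixed schema in a relational table, queried with SQL.
# The table and its rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (item, amount) VALUES (?, ?)",
    [("widget", 19.99), ("gadget", 5.00), ("widget", 19.99)],
)
widget_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE item = ?", ("widget",)
).fetchone()[0]
conn.close()

# Schema-less data: every record carries its own shape, so the code
# has to inspect each one instead of relying on fixed columns.
raw_events = [
    '{"type": "click", "page": "/home", "user": "u1"}',
    '{"type": "post", "text": "Love the new widget!", "tags": ["widget"]}',
    '{"type": "click", "page": "/cart"}',
]
events = [json.loads(line) for line in raw_events]

clicks_by_page = {}
for event in events:
    if event.get("type") == "click":
        page = event["page"]
        clicks_by_page[page] = clicks_by_page.get(page, 0) + 1

print(widget_total)
print(clicks_by_page)
```

The SQL side can rely on every row having an `item` and an `amount`; the event side cannot, which is why it checks `type` before reading `page`.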
Illustrative Example Inventory Analyst Insurance Actuary
Interpretations Structured Data Big Data Analytics Big Data Analytics Structured Data
Users of Big Data • Device manufacturers, ERP providers, and consulting firms comprise 7 of the top 10 users of Big Data • Based on a survey of large corporations conducted by Dell in 2014… • 55% now follow a Big Data strategy • 60% of Big Data projects involve a cloud • 32% involve real-time or near real-time processing • 22% use a data lake • 20% of projects are carried out by outside consultants
Hadoop • Free, Java-based programming framework • Distributed storage and processing of large data sets • Originated from the Google File System paper published in October 2003 • Development continued under the Apache Software Foundation • Named after the toy elephant of Doug Cutting’s son (hence the logo)
When to Use (and Not Use) Hadoop YES! • Analytics • Search • Data Retention • Log File processing • Analysis of Text, Image, Audio, and Video Content • Recommendation systems, such as those on e-commerce websites NO! • Low-latency or near real-time data access • Large number of small files to process • Multiple-writer scenarios requiring arbitrary writes between files
Hadoop Framework • Hadoop Common: Contains all the libraries and utilities • Hadoop Distributed File System (HDFS): Storage with high bandwidth • Hadoop YARN: Resource-management platform • Hadoop MapReduce: Programming model for data processing
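To make the MapReduce programming model concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop or any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real cluster would run the map and reduce steps in parallel across machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big ideas"]))
```

Because each call to `map_phase` looks at only one line and each call to `reduce_phase` looks at only one key, both steps can be spread over many workers, which is the property the framework exploits.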
MongoDB = “The database for giant ideas” • Cross-platform document-oriented database • Open-source • Founded in 2007; written to handle specific problems encountered at DoubleClick • Classified as a NoSQL database
MongoDB Example Also, we can practice! http://www.w3resource.com/mongodb-exercises/#PracticeOnline
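For readers who prefer to experiment locally, the following is a hedged sketch using the `pymongo` driver. It assumes `pymongo` is installed and a MongoDB server is listening on `localhost:27017`; the database name `demo_db`, the collection name `analysts`, and the helper names are invented for illustration. The sample documents reuse the salary figures quoted earlier in the deck.

```python
# Sample documents: note that they do not all share the same fields,
# which a document-oriented database allows.
ANALYSTS = [
    {"title": "Business Analyst", "salary": 66000},
    {"title": "Data Analyst", "salary": 60000},
    {"title": "Data Scientist", "salary": 113000, "skills": ["Hadoop", "MongoDB"]},
]


def salary_filter(min_salary):
    """Build a MongoDB query document matching salary >= min_salary."""
    return {"salary": {"$gte": min_salary}}


def demo(uri="mongodb://localhost:27017"):
    """Insert the sample documents and query them (needs a running server)."""
    # Third-party driver, imported here so the rest of the file
    # can be loaded without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    collection = client["demo_db"]["analysts"]  # hypothetical names
    collection.delete_many({})  # start from an empty collection
    collection.insert_many([dict(doc) for doc in ANALYSTS])
    cursor = collection.find(salary_filter(65000), {"_id": 0, "title": 1})
    titles = [doc["title"] for doc in cursor]
    client.close()
    return titles
```

Calling `demo()` against a running server should return the titles of the documents whose salary is at least 65,000. The query itself is just a nested dictionary, which is the main thing to notice when coming from SQL.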