BIG DATA MBUS 626-01 - G4 Zoe Mayhook Bailee Neyland Crystal Side Michael Stuber
Big Data: Finding that Diamond in the Rough • The most common interpretation of big data is the systematic analysis of huge volumes of data to find patterns and behaviors that are not readily apparent. • Big data has rapidly created an entire sub-industry that generated $11.59 billion in 2012, according to the research community Wikibon. • Wikibon predicts the big data market will be worth $47 billion by 2017.
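To put the two Wikibon figures in perspective, the implied compound annual growth rate can be worked out directly. This is a minimal sketch that assumes smooth year-over-year compounding between 2012 and 2017, which the source does not claim; the function name `implied_cagr` is hypothetical.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two endpoint values.

    Assumes constant compounding each year (an illustrative simplification).
    """
    return (end_value / start_value) ** (1 / years) - 1

# Figures quoted on the slide: $11.59 billion (2012) and $47 billion (2017).
rate = implied_cagr(11.59, 47.0, 2017 - 2012)
print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption the market would need to grow by roughly 32% per year to reach the 2017 forecast.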
Big Data Defined “A massive volume of both structured and unstructured data that is so large that it’s difficult to process using traditional databases and software techniques.” 3-V Model: volume, velocity, variety
Big Data Continued… • What makes data big? • Origin • Growth
Major Sources of Big Data • Social Media • Server Logs • Web/clickstream • Machine/sensor • Geolocation
Important factors to consider • Big data needs to be mediated by the human touch and common sense • Human beings and human-oriented decisions must play a fundamental role in any big data strategy or companies risk alienating their customers and damaging their brands.
Who is using Big Data? LEADERS:
• Amazon
  • Uses big data to drive innovation, with scalable services for data collection, storage, integration, analytics and collaboration
  • Handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers
• Walmart
  • Handles more than one million customer transactions every hour
  • Uses big data to reach customers, or friends of customers, who have mentioned a product online, informing them about that exact product and including a discount
• Netflix
  • Uses big data to more accurately predict the consumer behaviors of its subscribers and potential subscribers
Who else is using Big Data? And who should? • Startups • Healthcare
Big Data Companies
• Cloudera: a leader in Apache Hadoop-based software and services; offers a data platform that enables enterprises and organizations to look at all their data and ask bigger questions.
• MapR: delivers an enterprise-grade Hadoop platform that supports a broad set of mission-critical and real-time production uses.
• Splunk: founded to make machine data accessible, usable and valuable to everyone. By monitoring and analyzing everything from customer clickstreams and transactions to network activity and call records, Splunk turns machine data into insights for any business.
• Palantir: delivers big data technology to improve crisis response.
Up-and-Coming Big Data Companies
Big Data Technologies • Wide-scale digitization of information has created many new sources of data • Traditional approaches to managing data don’t support volume, velocity, and variety • New approaches are needed: NoSQL databases; MapReduce & Hadoop
NoSQL • “Not Only SQL” (SQL = Structured Query Language) • Data can be unstructured • Data is often organized in key-value pairs • Values can be anything from images, songs, and documents to lists or traditional data types • Examples include Cassandra & Redis
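The key-value idea on this slide can be sketched with a toy in-memory store. This is an illustration only, not how Cassandra or Redis are implemented or accessed; the class `KeyValueStore`, its methods, and the sample keys and values are all hypothetical.

```python
class KeyValueStore:
    """Toy in-memory key-value store (illustrative, not a real NoSQL client)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # No schema is enforced: any Python object can be stored as a value.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


store = KeyValueStore()
store.put("user:42:name", "Ada")                        # traditional data type
store.put("user:42:playlist", ["song-1", "song-2"])     # list
store.put("user:42:avatar", b"\x89PNG")                 # raw bytes, e.g. an image
store.put("user:42:profile", {"followers": 128})        # document-like value

print(store.get("user:42:playlist"))
print(store.get("user:42:missing", "not found"))
```

Every lookup goes through a single key, which is what lets systems of this kind skip a fixed table schema.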
MapReduce & Hadoop • All processing is done on key/value pairs • The basic approach is to transform input records into intermediate key/value pairs (map), then aggregate the values that share a key (reduce) • Many algorithms can be implemented within the MapReduce architecture • Hadoop and similar MapReduce systems provide task management and a distributed file system to spread jobs across hundreds (or thousands) of commodity servers
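The map and reduce steps can be sketched as a single-process word count. This is a minimal illustration of the programming model, not Hadoop's API: everything runs on one machine, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (key, value) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster, map and reduce calls would run in parallel on many servers.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "Big ideas need big data"])
print(counts)
```

Because each map call sees only one record and each reduce call sees only one key, the work can be split across machines without the functions needing to know about each other.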
Mini Case Discussion The San Leandro, California Police Department uses cameras mounted on squad cars to routinely photograph license plates while patrolling the area. Millions of these pictures are passed on to the Northern California Regional Intelligence Center and analyzed using big data software developed by Palantir. 1. What are the benefits of photographing, saving, and analyzing license plate information? 2. What do you find most concerning?
Ethics for Big Data • Big data itself is ethically neutral • Its uses might not align with how we feel, but should align with core values • Ethical inquiry should take place due to the sheer volume, variety and velocity of big data
Framework for Big Data Ethics
• Identity: What is the relationship between offline and online identity?
• Privacy: Who should control access to data?
• Ownership: Who owns data, and can rights be transferred?
• Reputation: Can we determine what data is trustworthy?
Alignment of Methodology
• Inquiry: Discussion of core values
• Analysis: Review current practices, and assess how well they align with core values
• Articulation: Explicit, written expression of alignment and misalignment between values and practices
• Action: Tactical plan to close alignment gaps
Ethical Guidelines - Proposals
• Radical Transparency
  • Explain what data is being collected and how it will be used
• Simplicity by Design
  • Allow users to adjust any privacy settings to determine what they want shared or not
  • Privacy policies should be simple and understandable
• Preparation and Security
  • Define what information and data you need, and what information you can do without
  • Develop a crisis strategy in case the company's systems get hacked
• Make Privacy Part of the DNA
  • Hire a chief privacy officer or chief data officer
  • Address privacy at all levels of the organization
Benefits of Adopting Big Data Ethics • Reduction in risk of unintended consequences • Faster consumer adoption (reducing fear of unknown) • Increased pace of innovation • Reduced friction from legislation
Big data is about “building new analytic applications based on new types of data, in order to better serve your customers and drive a better competitive advantage.” …Thank you