
Intel® Distribution for Apache Hadoop*

Intel® Distribution for Apache Hadoop*. Ram Lakshminarayan, Asia Pac – BDM Datacenter.


Presentation Transcript


  1. Intel® Distribution for Apache Hadoop* Ram Lakshminarayan Asia Pac – BDM Datacenter

  2. From the dawn of civilization until 2003, humans created 5 exabytes of information. Now we create that same amount of information in two days! In 2012, the digital universe of data will expand to 2.72 zettabytes (ZB), and it is predicted to double every two years after that.
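
The doubling claim on this slide can be turned into a rough projection. The sketch below assumes smooth exponential growth from the slide's 2012 figure of 2.72 ZB with a two-year doubling period; the function name and the smooth-growth model are illustrative assumptions, not something the presentation specifies.

```python
def projected_zettabytes(year, base_year=2012, base_zb=2.72, doubling_years=2.0):
    """Project the size of the digital universe in zettabytes.

    Assumes smooth exponential growth that doubles every `doubling_years`,
    starting from `base_zb` in `base_year` (figures taken from the slide).
    """
    return base_zb * 2 ** ((year - base_year) / doubling_years)


if __name__ == "__main__":
    for year in (2012, 2014, 2016, 2020):
        print(year, round(projected_zettabytes(year), 2), "ZB")
```

Under these assumptions the volume grows sixteenfold over eight years, since eight years contain four doubling periods.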

  3. What is Big Data? Datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze.* Characterized by volume, variety, value and velocity: • Volume: massive scale & growth • Variety: many different forms • Velocity: near-realtime processing • Value: predictive analytics. Chart axes: Volume vs. Time; series: Structured (relational) data, Unstructured (multi-structured) data. Example: Intelligent Transportation System (Shanghai), unstructured (multi-structured) data • Logs/records: 9TB/day • Image: 900TB/day • Video: 3PB/day • Near-realtime image/video processing needed • Near-realtime queries required • Deep, complex analysis for traffic prediction, criminal detection, … *"Big data: The next frontier for innovation, competition, and productivity", McKinsey Global Institute

  4. Big Data usage across industries • Government • Healthcare • Education • National, Public and Cyber Security • Telecommunication • Manufacturing • Financial Services • Retail

  5. Big Data opportunity, a vertical industry view. Source: Gartner

  6. Hadoop Introduction • Hadoop is: • A flexible, extensible open source framework • Hadoop includes: • Storage (HDFS) • NoSQL database (HBase) • Distributed compute (MapReduce) • Plus more utilities. Sources: http://blog.spec-india.com, http://www.bodhtree.com
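
To make the "distributed compute (MapReduce)" item more concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases using a word count. It is plain Python and does not use any Hadoop API; the function names are illustrative assumptions, and a real Hadoop job would run these phases in parallel across cluster nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

The split into independent map and reduce functions is what lets a framework distribute the work: each map call sees only one line, and each reduce call sees only one key.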

  7. What is in it for us? Big Data Building Blocks • Compute: Intel® Xeon® Product Family E3-E5-E7, Intel® Atom™, Intel® Xeon Phi™ • Network: Intel® Ethernet Controllers, Intel® Ethernet Adapters, Intel® Ethernet Switch Silicon, Intel® True Scale Fabric • Storage: Intelligent Storage¹, Scale-out Storage¹, Scale-up Storage¹, Intel® SSD 710 series, DC S3700 (SATA), Intel® SSD 910 series (PCIe) • Software & Technologies: Intel® Distribution for Apache Hadoop, Intel® Data Center Manager, Intel® Node Manager, Intel® Expressway Service Gateway, Intel® Cache Acceleration Software, Intel’s Lustre, Intel® VT and Intel® TXT, Intel® AES-NI. Attributes: Responsive • Energy Efficient • Secure • High Availability. Intel’s Foundational Technologies Offer Advanced Solutions for Big Data Analytics. Choice. ¹Xeon-based storage systems are available in a wide range of configuration options from the industry’s leading storage vendors

  8. Intel’s Role in Big Data • Accelerating big data analytics through faster and more effective CPU, storage, I/O and network platforms. • Driving innovation in big data applications by providing an optimized software stack and services. • Fostering the growth of the big data ecosystem through broad collaboration with partners. • Investing in solution research and services for big data.

  9. Intel® Distribution for Apache Hadoop: What did we launch…? Intel Supported Distribution Subscription • Focus on near real-time analytics w/ HBase & Hive enhancements • Access control, encryption, secure data movement • Job throughput efficiency for HDFS • Dynamic replication for HDFS & HBase • Intel optimized total solution architecture: distro, storage, network, compute. Stack: Intel® Manager for Hadoop* Software (Deployment, Configuration, Monitoring, Alerting and Security) • Sqoop (Data Exchange) • ZooKeeper* (Coordination) • HBase* (Columnar Storage) • Pig* (Scripting) • R-connector • Hive* (SQL-Like Query) • Mahout* (Data Mining) • Oozie* (Workflow) • MapReduce (Distributed Processing Framework) • Flume (Log Collector) • HDFS* (Hadoop Distributed File System). 5X Performance for Real-time jobs (Open Source vs. Optimized Intel IA/Distro) • HBase as the data store. Query all CDR in month • Inserting 10000 records/second/server • Read from disk: >400 queries/second/server

  10. Intel® Manager for Apache Hadoop • Quick cluster/node deployment • Tab navigation between components • Single-pane configuration for MapReduce fair or capacity scheduling • Tuning controls for HBase data • Guided wizards, tasks, workflows. Compatible with Intel or other popular distributions. [Diagram: Node, Node, Node]

  11. Driving The Key Pillars for Big Data. Providing cross-stack optimizations using Hadoop as lead vehicle and open source as adoption driver. Ensuring scale-out architectures work best on Intel platforms. Diagram labels: NETWORK • STORAGE • COMPUTE • Cloud Enablement • Security • Management • Performance • Access Control List at cell level • API AuthN • Data Movement • File based encryption • MapReduce Jobs • HDFS Cross Data Center Replication • Distributed Tables Across Data Centers • Archival for cold data on HDFS • Snapshots • Caching & Non-volatile Memory • Throughput • Flash Storage • Hot file replication • OS Kernel caching • Infiniband • AES-NI Encryption • Intel IA Architecture • SSE Instruction Sets

  12. Intel Platform Benefits for Big Data. TeraSort for 1TB data: >4 hours to ~7 minutes. Baseline (>4 hours): Intel® Xeon 5690, 7200 HDD, 1GbE adapters. Upgrade steps shown: Intel® Xeon® E5-2690 processor • Intel® SSD 520 Series • Intel® 10GbE Adapters • Deploy Intel Distribution for Apache Hadoop*, ending at ~7 mins. Step improvements labeled on the chart: ~50% improved, ~80% improved, ~50% improved, ~40% improved. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: Intel internal testing. For more information go to: intel.com/performance
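
Taking the slide's two end points at face value (more than 4 hours before, about 7 minutes after), the implied end-to-end speedup can be estimated as a simple ratio. The sketch below is only that arithmetic; the constant names are illustrative, and because the slide says ">4 hours" the result is a lower-bound estimate rather than a measured figure.

```python
def speedup(before_minutes: float, after_minutes: float) -> float:
    """Return how many times faster the 'after' run is than the 'before' run."""
    if after_minutes <= 0:
        raise ValueError("after_minutes must be positive")
    return before_minutes / after_minutes


BASELINE_MINUTES = 4 * 60  # slide says ">4 hours", so this is a lower bound
OPTIMIZED_MINUTES = 7      # slide says "~7 mins"

ESTIMATED_SPEEDUP = speedup(BASELINE_MINUTES, OPTIMIZED_MINUTES)

if __name__ == "__main__":
    print(f"Estimated speedup: at least ~{ESTIMATED_SPEEDUP:.0f}x")
```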

  13. Government - Smart Traffic: Intelligent Transport System. Hadoop for Predictive Analytics • Crime prevention, info sharing • Predictive traffic analytics • Machine-generated data: • Embedded HBase client in camera for real-time inserts of structured/unstructured data • 30000+ camera data collection points • 2 billion HBase records • Petabytes of traffic data • Terabytes of images • 1 week of data mining • Results: • Automated queries for traffic violation • Crime prevention: ID fake licenses in <1 minute • Traffic routing. Diagram labels: Regional Data Collection • App Servers • Distributed Processing Across District Nodes • Derived Analytics Services • Crime Prevention • Citizen Traffic Services

  14. Telco - China Mobile Group Guangdong: Hadoop & Xeon optimized Big Data storage & analytics • Challenge: Deliver real-time access to Call Data Records (CDR) for billing self-service • Solution: Chose Hadoop + Xeon over RDBMS to remove data access bottlenecks, increase storage, and scale the system • Benefits: Lower TCO, 30x performance increase, stable operation, analytics on subscriber usage for targeted promotions • Data Characteristics: • 30TB billing data/month • Real-time retrieval of 30 days of CDRs • 300k records/second, 800k insert speed/sec • 15 analytics queries • 133 server nodes
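
As a rough sanity check on the scale described in this slide, the cluster-wide figures can be divided across the 133 server nodes. The sketch below assumes the load is spread evenly and ignores HDFS replication; both are simplifying assumptions that the slide does not state, and the names are illustrative.

```python
def per_node(total: float, nodes: int) -> float:
    """Split a cluster-wide total evenly across nodes (assumed even load)."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return total / nodes


SERVER_NODES = 133             # from the slide
BILLING_TB_PER_MONTH = 30      # from the slide
RECORDS_PER_SECOND = 300_000   # from the slide

TB_PER_NODE_PER_MONTH = per_node(BILLING_TB_PER_MONTH, SERVER_NODES)
RECORDS_PER_NODE_PER_SECOND = per_node(RECORDS_PER_SECOND, SERVER_NODES)

if __name__ == "__main__":
    print(f"{TB_PER_NODE_PER_MONTH:.3f} TB of new billing data per node per month")
    print(f"{RECORDS_PER_NODE_PER_SECOND:.0f} records per node per second")
```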
