
Data Science Tutorial | What is Data Science? | Data Science For Beginners | Edureka

** Data Science Certification using R: https://www.edureka.co/data-science **
In this PPT on Data Science Tutorial, you’ll get an in-depth understanding of Data Science and learn how it is used in the real world to solve data-driven problems. This session covers the following topics:
Need for Data Science
Walmart Use Case
What is Data Science?
Who is a Data Scientist?
Data Science – Skill Set
Data Science Job Roles
Data Life Cycle
Introduction to Machine Learning
K-Means Use Case
K-Means Algorithm
Hands-On
Data Science Certification

Blog Series: http://bit.ly/data-science-blogs

Data Science Training Playlist: http://bit.ly/data-science-playlist

Follow us to never miss an update in the future.

Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka

EdurekaIN

Presentation Transcript


  1. Agenda: 1. Need for Data Science 2. Walmart Use Case 3. What is Data Science? 4. Who is a Data Scientist? 5. Data Science – Skill Set 6. Data Science Job Roles 7. Data Life Cycle 8. Introduction to Machine Learning 9. K-Means Use Case 10. K-Means Algorithm 11. Hands-On 12. Data Science Certification. DATA SCIENCE CERTIFICATION TRAINING www.edureka.co/data-science

  2. Need For Data Science

  3. Data Sources: Evolution of Technology (Telephone → Mobile, Desktop → Cloud, Car → Smart Car), IoT, Social Media, Other factors

  4. Data Sources: Evolution of Technology, IoT, Social Media, Other factors

  5. Data Sources: Evolution of Technology, IoT, Social Media, Other factors. 1,736,111 pictures; 347,222 tweets; 204,000,000 emails; 300 hours of video uploaded; 4,166,667 likes & 200,000 photos

  6. Data Sources: Evolution of Technology, IoT, Social Media, Other factors

  7. Walmart Use Case

  8. Data Analysis At Walmart: Halloween and cookie sales. Data scientists at Walmart found a connection between Halloween and the sales of cookies.

  9. Data Analysis At Walmart: Hurricanes and strawberry Pop-Tarts. Data scientists at Walmart found that sales of strawberry Pop-Tarts increased sevenfold before a hurricane.

  10. Data Analysis At Walmart: Social media and cake pops. Walmart is leveraging social media data to find out about trending products so that they can be introduced to Walmart stores across the world.

  11. What Is Data Science?

  12. What is Data Science? “Torture the data, and it will confess to anything.” ~ Ronald Coase, Nobel Prize in Economics. Data Science is the process of extracting knowledge and insights from data by using scientific methods. Scientific methods: Programming + Statistics + Business

  13. Who Is A Data Scientist?

  14. Who Is A Data Scientist? Mathematics, Business, Technology

  15. Data Science – Skill Set: Data extraction & processing; Programming languages; Data wrangling & exploration; Statistics; Big Data processing frameworks; Data visualisation; Machine Learning

  16. Data Science Job Roles

  17. Data Science Job Roles: Data Scientist; Data Analyst; Data Architect; Data Engineer; Database Administrator; Data & Analytics Manager; Statistician; Business Analyst

  18. Data Science Life Cycle

  19. Data Life Cycle (Data Science at the centre): Business requirements → Data acquisition → Data processing → Data exploration → Modelling → Deployment

  20. Data Life Cycle (Business requirements): Understand the problem; identify central objectives; identify variables that need to be predicted.

  21. Data Life Cycle (Data acquisition): What data do I need for my project? What are the data sources? How can I obtain the data? What is the most efficient way to store and access all of it?

  22. Data Life Cycle (Data processing): Transform data into the desired format. Data cleaning: • Missing values • Corrupted data • Remove unnecessary data
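
The cleaning steps on this slide can be illustrated with a minimal sketch using only the Python standard library. The record layout, the column names (`id`, `age`, `income`, `note`) and the helper `clean_records` are hypothetical and chosen only for illustration. "Corrupted" is assumed here to mean "cannot be parsed as a finite number".

```python
import math

def clean_records(records, required, keep):
    """Drop rows with missing or corrupted required values; keep only wanted columns."""
    cleaned = []
    for row in records:
        values = [row.get(col) for col in required]
        # Missing values: the key is absent, or the value is None or an empty string.
        if any(v is None or v == "" for v in values):
            continue
        # Corrupted data (assumption): a required value that is not a finite number.
        try:
            numbers = [float(v) for v in values]
        except (TypeError, ValueError):
            continue
        if not all(math.isfinite(n) for n in numbers):
            continue
        # Remove unnecessary data: keep only the requested columns.
        out = {col: row[col] for col in keep if col in row}
        # Transform into the desired format: store required columns as floats.
        for col, n in zip(required, numbers):
            if col in out:
                out[col] = n
        cleaned.append(out)
    return cleaned

# Hypothetical raw data for illustration only.
raw = [
    {"id": 1, "age": "34", "income": "52000", "note": "ok"},
    {"id": 2, "age": "", "income": "41000", "note": "age missing"},
    {"id": 3, "age": "abc", "income": "39000", "note": "age corrupted"},
    {"id": 4, "age": "29", "income": "61000", "note": "ok"},
]
clean = clean_records(raw, required=["age", "income"], keep=["id", "age", "income"])
print(clean)
```

Dropping incomplete rows is only one option. Depending on the project, missing values might instead be imputed.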

  23. Data Life Cycle (Data exploration): Understand the patterns in the data; retrieve useful insights; form hypotheses.

  24. Data Life Cycle (Modelling): Determine optimal data features for the machine-learning model; create a model that predicts the target most accurately; evaluate & test the efficiency of the model.
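
One common way to "evaluate & test" a model is to hold out part of the data and measure accuracy on it. The sketch below assumes a toy one-feature dataset and a simple threshold classifier. The helpers `train_test_split`, `fit_threshold` and `accuracy` are hypothetical names written for this example and are not taken from the presentation.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predictions, targets):
    """Fraction of predictions that match the targets."""
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)

def fit_threshold(train):
    """Pick the training x value that, used as a cut-off, best fits the training labels."""
    def train_accuracy(t):
        return accuracy([int(x >= t) for x, _ in train], [y for _, y in train])
    return max((x for x, _ in train), key=train_accuracy)

# Toy data (assumption): the label is 1 exactly when x >= 10.
rows = [(x, int(x >= 10)) for x in range(20)]
train, test = train_test_split(rows)
threshold = fit_threshold(train)
train_acc = accuracy([int(x >= threshold) for x, _ in train], [y for _, y in train])
test_acc = accuracy([int(x >= threshold) for x, _ in test], [y for _, y in test])
print(threshold, train_acc, test_acc)
```

The test accuracy can be lower than the training accuracy because the learned cut-off depends on which rows ended up in the training set.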

  25. Data Life Cycle (Deployment): Check the deployment environment for dependency issues; deploy the model in a pre-production/test environment; monitor the performance.

  26. Introduction To Machine Learning

  27. What Is Machine Learning? Machine learning is a subset of artificial intelligence (AI) that gives machines the ability to learn automatically and improve from experience without being explicitly programmed. [Diagram labels: Data (Cherry, Apple, Orange: “They look the same!”), Algorithm]

  28. Types Of Machine Learning: Supervised Learning; Unsupervised Learning; Reinforcement Learning

  29. K-Means Use Case

  30. Brain Tumour Detection Using K-Means: K-Means clustering is an unsupervised learning algorithm used to partition a dataset into k clusters in which each data point belongs to the cluster with the nearest mean. Brain tumour segmentation applies the k-means algorithm to detect the range and shape of a tumour in brain MR images.
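
The "nearest mean" definition is usually written as minimising the within-cluster sum of squared distances. The notation below is an assumption introduced for illustration and does not appear on the slides: $x_i$ are the data points, $C_1,\dots,C_k$ the clusters, and $\mu_j$ the mean of cluster $C_j$.

```latex
\underset{C_1,\dots,C_k}{\arg\min}\;
\sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2},
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```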

  31. K-Means Algorithm

  32. K-Means Algorithm (steps: Initialization → Cluster assignment → Move centroid → Optimization → Convergence). Initialization: ➢ Randomly initialize k points called the cluster centroids. Here, k = 2. ➢ The value of k (number of clusters) can be determined by the elbow curve.
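
A minimal sketch of the initialization step follows. The slide only says the centroids are initialized randomly. Drawing them from the data points themselves is one common choice and is an assumption here, as are the helper name `init_centroids` and the sample points.

```python
import random

def init_centroids(points, k, seed=None):
    """Pick k distinct data points at random to serve as the initial centroids."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)  # a fixed seed makes the choice reproducible
    return rng.sample(points, k)

# Hypothetical 2-D data: two visually separate groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (5.0, 7.0), (6.0, 6.5), (5.5, 7.5)]
centroids = init_centroids(points, k=2, seed=42)
print(centroids)
```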

  33. K-Means Algorithm (Cluster assignment): ➢ Compute the distance between each data point and each initialized cluster centroid. ➢ Depending on the minimum distance, the data points are divided into two groups. [Figure labels: cluster centroids 1 and 2; Euclidean distance]
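
The assignment step can be sketched as follows, using Euclidean distance as named on the slide. The function name `assign_clusters`, the sample points and the two starting centroids are assumptions made for this example.

```python
import math

def assign_clusters(points, centroids):
    """Return, for each point, the index of the nearest centroid (Euclidean distance)."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical data and centroids for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (5.0, 7.0), (6.0, 6.5), (5.5, 7.5)]
centroids = [(1.0, 1.0), (5.0, 7.0)]
labels = assign_clusters(points, centroids)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

If a point is equally close to two centroids, this sketch assigns it to the one with the lower index.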

  34. K-Means Algorithm (Move centroid): ➢ Compute the mean of the red dots & reposition the red cluster centroid to this mean. ➢ Compute the mean of the green dots & reposition the green cluster centroid to this mean.
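
The centroid update can be sketched as follows. Cluster 0 stands in for the red dots and cluster 1 for the green dots. The function name `move_centroids` and the data are assumptions, and raising an error for an empty cluster is just one possible policy.

```python
def move_centroids(points, labels, k):
    """Return the coordinate-wise mean of the points assigned to each of the k clusters."""
    centroids = []
    for j in range(k):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            raise ValueError(f"cluster {j} has no points")
        dims = len(members[0])
        centroids.append(
            tuple(sum(p[d] for p in members) / len(members) for d in range(dims))
        )
    return centroids

# Hypothetical data: the first three points are "red" (0), the last three "green" (1).
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (5.0, 7.0), (6.0, 6.5), (5.5, 7.5)]
labels = [0, 0, 0, 1, 1, 1]
new_centroids = move_centroids(points, labels, k=2)
print(new_centroids)  # → [(1.5, 1.5), (5.5, 7.0)]
```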

  35–38. K-Means Algorithm (Optimization): ➢ Repeat the previous two steps iteratively until the cluster centroids stop changing their positions.

  39. K-Means Algorithm (Convergence): ➢ Finally, the k-means clustering algorithm converges. ➢ It divides the data points into two clusters, clearly visible in red and green.
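
Putting the steps together, the whole loop might look like the sketch below. It is an illustrative pure-Python version, not the code used in the session. The names `kmeans` and `wcss`, the `max_iter` cap and the sample data are assumptions. The `wcss` helper computes the within-cluster sum of squares, the quantity typically plotted against k to draw an elbow curve.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means: returns (centroids, labels) once the centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialization from the data (assumption)
    for _ in range(max_iter):
        # Cluster assignment: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Move centroid: mean of each cluster's members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty cluster: leave centroid in place
        # Convergence: stop when no centroid changed position.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

def wcss(points, centroids, labels):
    """Within-cluster sum of squared distances."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))

# Hypothetical data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (5.0, 7.0), (6.0, 6.5), (5.5, 7.5)]
centroids, labels = kmeans(points, k=2)
print(sorted(centroids), labels)

# Values for an elbow curve: WCSS for k = 1..4.
for k in range(1, 5):
    c, l = kmeans(points, k)
    print(k, round(wcss(points, c, l), 3))
```

On this toy data the WCSS drops sharply from k = 1 to k = 2 and only slightly afterwards, which is the "elbow" suggesting k = 2.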

  40. K-Means Algorithm: ➢ Data matrix ➢ Distance/dissimilarity matrix
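
To make the two terms concrete: a data matrix holds one row per observation, and the distance (dissimilarity) matrix holds the pairwise distances between those rows. The sketch below assumes Euclidean distance and made-up values. The function name `distance_matrix` is hypothetical.

```python
import math

def distance_matrix(data):
    """Pairwise Euclidean distances between the rows of a data matrix."""
    n = len(data)
    return [[math.dist(data[i], data[j]) for j in range(n)] for i in range(n)]

# Hypothetical data matrix: 3 observations (rows) with 2 features (columns).
data_matrix = [
    (0.0, 0.0),
    (3.0, 4.0),
    (6.0, 8.0),
]
dm = distance_matrix(data_matrix)
for row in dm:
    print(row)
```

The result is symmetric with zeros on the diagonal, since each observation is at distance zero from itself.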

  41. Hands-On

  42. Data Science Certification

  43. Edureka’s Data Science Certification

  44. Edureka’s Data Science Certification: Data extraction, wrangling & exploration; Introduction to Data Science; Unsupervised Learning; Classification techniques; Introduction to Machine Learning; Statistical Inference; Recommender engine; Deep Learning; Time series; Text Mining

  45. Data Warehouse: ➢ A data warehouse is like a relational database designed for analytical needs. ➢ It functions on the basis of OLAP (Online Analytical Processing). ➢ It is a central location where consolidated data from multiple locations (databases) is stored.
