Data Analyst Interview Questions And Answers | Data Analyst Interview Questions | Simplilearn

Data analyst is one of the trending jobs of the 21st century. This video covers the important questions that will help you crack a data analyst interview. It has a set of basic questions related to the data analytics field, along with a collection of beginner, intermediate and advanced level questions based on MS Excel, SQL, Tableau, and Python. It will enrich your theoretical and practical knowledge of data analytics. Let's get started.

Why become a Data Analyst?
By 2020, the World Economic Forum forecasts that data analysts will be in demand due to increasing data collection and usage. Organizations view data analysis as one of the most crucial future specialties due to the value that can be derived from data. Data is more abundant and accessible than ever in today’s business environment. In fact, 2.5 quintillion bytes of data are created each day. With an ever-increasing skill gap in data analytics, the value of data analysts continues to grow, creating new job and career advancement opportunities.

Professionals who enter the Data Science field will have their pick of jobs and enjoy lucrative salaries. According to an IBM report, data and analytics jobs are predicted to increase by 15 percent to 2.72 million jobs by 2020, with the most significant demand for data analysts in finance, insurance, and information technology. Data analysts earned an average pay of $67,377 in 2019 according to Glassdoor.

Who should take up this course?
Aspiring professionals of any educational background with an analytical frame of mind are best suited to pursue the Data Analyst Master’s Program, including:
1. IT professionals
2. Banking and finance professionals
3. Marketing managers
4. Sales professionals
5. Supply chain network managers
6. Beginners in the data analytics domain
7. Students in UG/PG programs

Learn more at: https://bit.ly/2SECA5r

Simplilearn

Presentation Transcript


  1. What’s in it for you? Data Analytics Interview

  2. 1 What is the difference between Data Mining and Data Profiling?

  3. 2 Define the term Data Wrangling in data analytics? Data Wrangling is the process of cleaning, structuring and enriching raw data into a desired usable format for better decision making. Steps named on the slide: • Discover • Structure • Clean • Enrich • Validate • Analyze

  5. 3 What are the common problems that data analysts encounter during analysis? • Handling duplicate and missing values • Collecting the right, meaningful data at the right time • Making data secure and dealing with compliance issues • Handling data purging and storage problems

  6. 4 What are the various steps involved in any analytics project? 1. Understand the problem 2. Data collection 3. Data cleaning 4. Data exploration and analysis 5. Interpret the results

  7. 5 Which technical tools have you used for analysis and presentation purposes? As a data analyst, you are expected to know the tools shown on the slide for analysis and presentation purposes

  8. 6 What are the best practices for data cleaning? • Make a data cleaning plan by understanding where the common errors take place, and keep communications open • Identify and remove duplicates before working with the data; this leads to a more effective data analysis process • Focus on the accuracy of the data: maintain the value types of the data, provide mandatory constraints and set cross-field validation • Standardise the data at the point of entry so that it is less chaotic and fewer errors are made on entry

  9. 7 How can you handle missing values in a dataset? • Listwise deletion: an entire record is excluded from analysis if any single value is missing • Average imputation: use the average value of the responses from the other participants to fill in the missing value • Regression substitution: use multiple-regression analysis to estimate a missing value • Multiple imputation: create plausible values for the missing data based on correlations, incorporating random errors in the predictions, and then average the simulated datasets
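The first two methods above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the `records` data and the `impute_mean` helper are hypothetical and not part of the original slides.

```python
from statistics import mean

# Hypothetical survey records; None marks a missing value.
records = [
    {"id": 1, "age": 25, "score": 80},
    {"id": 2, "age": None, "score": 90},
    {"id": 3, "age": 35, "score": None},
    {"id": 4, "age": 30, "score": 70},
]

# Listwise deletion: drop any record that has at least one missing value.
complete = [r for r in records if all(v is not None for v in r.values())]

def impute_mean(rows, column):
    """Average imputation: fill missing values with the mean of the observed ones."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

imputed = impute_mean(impute_mean(records, "age"), "score")

print([r["id"] for r in complete])
print(imputed)
```

Listwise deletion keeps only the fully observed records, so it can shrink a small dataset quickly; average imputation keeps every record but reduces the variance of the imputed column.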

  10. 8 What do you understand by the term Normal Distribution? Normal Distribution is a type of continuous probability distribution that is symmetric about the mean; in a graph, it appears as a bell curve • The mean, median and mode are equal • All of them are located at the centre of the distribution • 68% of the data lies within 1 standard deviation of the mean • 95% of the data falls within 2 standard deviations of the mean • 99.7% of the data lies within 3 standard deviations of the mean [Figure: bell curve with the horizontal axis marked from -3 to 3 standard deviations]
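The 68%, 95% and 99.7% figures can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is an assumption introduced here for illustration.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)  # standard normal distribution

def share_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

The printed values round to the percentages quoted on the slide.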

  11. 9 What is Time Series analysis? Time Series analysis is a statistical method that deals with an ordered sequence of values of a variable at equally spaced time intervals [Figures: time series data on Covid-19 cases; time series graph]

  12. 10 How is joining different from blending in Tableau?

  13. 11 How is overfitting different from underfitting?

  14. 12 What is the correct syntax for reshape() function in NumPy? • array.reshape(shape) • reshape(shape, array) • reshape(array, shape) • reshape(shape)
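As a quick check of the options above, NumPy offers both a method form and a module-level function form. The transcript does not capture which option the slide marks as the answer, so this sketch only shows the two calls that NumPy accepts; the array contents are made up for illustration.

```python
import numpy as np

arr = np.arange(6)                  # six elements: 0 through 5

method_form = arr.reshape((2, 3))      # method on the array: array.reshape(shape)
function_form = np.reshape(arr, (2, 3))  # module-level function: reshape(array, shape)

print(method_form)
print(function_form)
```

Both calls return a 2 x 3 view of the same data; the new shape must contain the same number of elements as the original array.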

  16. 13 What is the difference between COUNT, COUNTA, COUNTBLANK and COUNTIF in Excel? The COUNT function returns the count of numeric cells in a range [Slide shows the Sales table and the output]

  17. 13 What is the difference between COUNT, COUNTA, COUNTBLANK and COUNTIF in Excel? The COUNTA function returns the count of non-blank cells in a range [Slide shows the Sales table and the output]

  18. 13 What is the difference between COUNT, COUNTA, COUNTBLANK and COUNTIF in Excel? The COUNTBLANK function returns the count of blank cells in a range [Slide shows the Sales table] Output: 3

  19. 13 What is the difference between COUNT, COUNTA, COUNTBLANK and COUNTIF in Excel? The COUNTIF function returns the count of values that satisfy a given condition [Slide shows the Sales table and the output]
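To make the four definitions concrete, here is a rough Python analogue rather than Excel itself. The `cells` list is hypothetical (the Sales table from the slide is not captured), `None` stands in for a blank cell, and the condition used for the COUNTIF analogue is an arbitrary example.

```python
# Hypothetical column of cells; None stands for a blank cell.
cells = [120, "pending", None, 75.5, 300, None, "N/A"]

def is_number(value):
    # Treat ints and floats as numeric cells (booleans excluded).
    return isinstance(value, (int, float)) and not isinstance(value, bool)

count = sum(1 for c in cells if is_number(c))               # like COUNT
counta = sum(1 for c in cells if c is not None)             # like COUNTA
countblank = sum(1 for c in cells if c is None)             # like COUNTBLANK
countif = sum(1 for c in cells if is_number(c) and c > 100)  # like COUNTIF(range, ">100")

print(count, counta, countblank, countif)
```

Note that the numeric count plus the text count equals the non-blank count, and the non-blank count plus the blank count equals the size of the range.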

  20. 14 Explain how VLOOKUP works in Excel? VLOOKUP is used when you need to find things in a table or a range by row. Syntax: VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup]) • lookup_value: the value you want to look up • table_array: the range where the lookup value is located • col_index_num: the column number in the range that contains the return value • range_lookup: specify TRUE for an approximate match or FALSE for an exact match of the lookup value

  21. 14 Explain how VLOOKUP works in Excel? VLOOKUP is used when you need to find things in a table or a range by row. Given a table from which you want to fetch the values of certain columns, example lookups are: • Find the age of Prince by looking up his name • Find the height of Angela by looking up her name

  22. 15 How do you subset or filter data in SQL? To subset or filter data in SQL, we use the WHERE and HAVING clauses. Example: filter the Movies table for movies that were directed by Brad Bird: SELECT * FROM Movies WHERE Director = 'Brad Bird';

  23. 15 How do you subset or filter data in SQL? To subset or filter data in SQL, we use the WHERE and HAVING clauses. Example: filter the Movies table for directors whose movies have an average duration greater than 115 minutes: SELECT Director, SUM(Duration) AS total_duration, AVG(Duration) AS avg_duration FROM Movies GROUP BY Director HAVING AVG(Duration) > 115;
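Both queries can be tried end to end with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the Movies rows below (titles, the second director and all durations) are invented for illustration, since the slide's table is not captured in the transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (Title TEXT, Director TEXT, Duration INTEGER)")
conn.executemany(
    "INSERT INTO Movies VALUES (?, ?, ?)",
    [
        ("Movie A", "Brad Bird", 116),   # hypothetical rows
        ("Movie B", "Brad Bird", 120),
        ("Movie C", "Director X", 90),
        ("Movie D", "Director X", 110),
    ],
)

# WHERE filters individual rows.
where_rows = conn.execute(
    "SELECT * FROM Movies WHERE Director = 'Brad Bird'"
).fetchall()

# HAVING filters groups produced by GROUP BY.
having_rows = conn.execute(
    """
    SELECT Director, SUM(Duration) AS total_duration, AVG(Duration) AS avg_duration
    FROM Movies
    GROUP BY Director
    HAVING AVG(Duration) > 115
    """
).fetchall()

print(where_rows)
print(having_rows)
```

With this data, the WHERE query returns the two Brad Bird rows, while the HAVING query returns one summary row per director whose average duration exceeds 115 minutes.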

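  24. 16 What is the difference between WHERE and HAVING clause in SQL? WHERE filters individual rows before any grouping takes place; HAVING filters groups after GROUP BY and can use aggregate functions. WHERE syntax: SELECT column1, column2, ... FROM table_name WHERE condition; HAVING syntax: SELECT column_name(s) FROM table_name WHERE condition GROUP BY column_name(s) HAVING condition ORDER BY column_name(s);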

  25. 17 What is the correct syntax for reshape() function in NumPy? • array.reshape(shape) • reshape(shape, array) • reshape(array, shape) • reshape(shape)

  27. 18 What are the different ways to create a dataframe in Pandas? • By initializing a list • By initializing a dictionary
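A minimal sketch of the two approaches follows. The names and ages are hypothetical sample data, not taken from the slides.

```python
import pandas as pd

# From a list of rows, with the column names supplied separately.
from_list = pd.DataFrame([["Asha", 31], ["Ben", 27]], columns=["Name", "Age"])

# From a dictionary that maps each column name to a list of values.
from_dict = pd.DataFrame({"Name": ["Asha", "Ben"], "Age": [31, 27]})

print(from_list)
print(from_dict)
```

The two constructions produce the same table; the dictionary form is often more convenient when the data is already organised by column.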

  28. 19 Write the Python code to create an employees dataframe from the “emp.csv” file and display the head and summary of it. To create a DataFrame in Python, you need to import the Pandas library and use the read_csv function to load the .csv file. The slide's example then displays the head of the dataset and a summary of the dataset
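The slide's own code is not captured in the transcript, so the following is only a sketch of one way to do it. To keep it runnable without the real file, an in-memory string with hypothetical columns and rows stands in for emp.csv; whether the slide used `describe()` or `info()` for the summary is not known, so both are shown.

```python
import io
import pandas as pd

# Stand-in for emp.csv; the columns and rows are hypothetical.
csv_text = """Name,Department,Age
Asha,Finance,31
Ben,IT,27
Chen,Sales,45
"""

# With the real file this would be: employees = pd.read_csv("emp.csv")
employees = pd.read_csv(io.StringIO(csv_text))

print(employees.head())      # first five rows (here, all three)
print(employees.describe())  # summary statistics for the numeric columns
employees.info()             # column types and non-null counts
```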

  29. 20 How will you select the Department and Age columns from an Employees dataframe? [Slide shows the code to select Department and Age from the dataframe]
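One way to select the two columns is sketched below. The `employees` data is hypothetical, and the slide's exact code is not captured, so this may differ from what the presenter showed.

```python
import pandas as pd

# Hypothetical Employees dataframe.
employees = pd.DataFrame(
    {
        "Name": ["Asha", "Ben", "Chen"],
        "Department": ["Finance", "IT", "Sales"],
        "Age": [31, 27, 45],
    }
)

# Passing a list of column names returns a dataframe with just those columns.
subset = employees[["Department", "Age"]]

# Equivalent label-based selection with .loc.
same_subset = employees.loc[:, ["Department", "Age"]]

print(subset)
```

The double brackets matter: `employees["Age"]` returns a single Series, while `employees[["Department", "Age"]]` returns a dataframe.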

  30. 21 What are the criteria to say whether a developed data model is good or not? • A good model should be intuitive, insightful and self-explanatory • The model developed should be easily consumed by the clients for actionable and profitable results • A good model should easily adapt to changes according to business requirements • If the data gets updated, the model should be able to scale according to the new data

  31. 22 What is the significance of Exploratory data analysis? Exploratory data analysis is an important step in any data analysis process • Exploratory data analysis (EDA) helps to understand the data better • It helps you obtain confidence in your data to a point where you’re ready to engage a machine learning algorithm •  It allows you to refine your selection of feature variables that will be used later for model building • You can discover hidden trends and insights from the data

  32. 23 How do you treat outliers in a dataset? An outlier is a data point that is distant from other similar points. Outliers may be due to variability in the measurement or may indicate experimental errors • Drop the outlier records • Cap the outlier values • Assign a new value • Try a new transformation
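Dropping and capping can be sketched as follows. The slide does not say how outliers are detected; the 1.5 x IQR fences used here are one common convention chosen for illustration, and the `values` list is made up.

```python
from statistics import quantiles

# Hypothetical measurements with one obvious outlier.
values = [10, 12, 11, 13, 12, 95, 11, 10]

# Quartiles, then fences at 1.5 * IQR beyond them (an assumed convention).
q1, _, q3 = quantiles(values, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Option 1: drop the outlier records.
dropped = [v for v in values if lower <= v <= upper]

# Option 2: cap the outliers at the fence values.
capped = [min(max(v, lower), upper) for v in values]

print(dropped)
print(capped)
```

Dropping shortens the dataset, while capping keeps every record and only pulls extreme values back to the fence.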

  33. 24 Explain descriptive, predictive, and prescriptive analytics. • Descriptive: provides insights into the past to answer “what has happened”. Uses data aggregation and data mining techniques. Example: an ice cream company can analyze how much ice cream was sold, which flavours were sold, and whether more or less ice cream was sold than the day before • Predictive: understands the future to answer “what could happen”. Uses statistical models and forecasting techniques. Example: predicts the sale of ice creams during summer, spring and rainy days • Prescriptive: suggests various courses of action to answer “what should you do”. Uses optimization and simulation algorithms to advise possible outcomes. Example: lower prices to increase the sale of ice creams, or produce more or less of a certain flavour of ice cream

  34. 25 What are the different types of sampling techniques used by data analysts? Sampling is a statistical method to select a subset of data from an entire dataset (population) to estimate the characteristics of the whole population. 1. Simple random sampling 2. Systematic sampling 3. Cluster sampling 4. Stratified sampling 5. Judgmental or purposive sampling
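The four probability-based techniques listed above can be sketched with Python's `random` module. The population of 100 unit IDs, the two strata and the ten clusters are all hypothetical; judgmental sampling is omitted because it depends on an expert's choice rather than a random draw.

```python
import random

rng = random.Random(42)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))   # hypothetical population of 100 unit IDs

# Simple random sampling: every unit has the same chance of selection.
simple = rng.sample(population, 10)

# Systematic sampling: random start, then every k-th unit.
k = len(population) // 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sampling: draw separately from each stratum.
strata = {"low": population[:50], "high": population[50:]}
stratified = [u for units in strata.values() for u in rng.sample(units, 5)]

# Cluster sampling: split into clusters, then keep whole clusters chosen at random.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
cluster_sample = [u for c in rng.sample(clusters, 2) for u in c]

print(simple, systematic, stratified, cluster_sample, sep="\n")
```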

  35. 26 What are the different types of Hypothesis testing? Hypothesis testing is the procedure used by statisticians and scientists to accept or reject statistical hypotheses • Null hypothesis: it states that there is no relation between the predictor and outcome variables in the population. It is denoted by H0. Example: there is no association between a patient’s BMI and diabetes • Alternative hypothesis: it states that there is some relation between the predictor and outcome variables in the population. It is denoted by H1. Example: there could be an association between a patient’s BMI and diabetes

  36. 27 Describe univariate, bivariate, and multivariate analysis. Univariate Analysis It is the simplest form of data analysis where the data being analysed contains only one variable Example – Studying the heights of players in NBA • Univariate analysis can be described using: • Central Tendency • Dispersion • Quartiles • Bar charts • Histograms • Pie charts • Frequency distribution tables

  37. 27 Describe univariate, bivariate, and multivariate analysis. Bivariate Analysis It involves analysis of two variables to find causes, relationships and correlations between the variables Example – Analysing sale of ice creams based on the temperature outside • Bivariate analysis can be explained using: • Correlation coefficients • Linear regression • Logistic regression • Scatter plots • Box plots

  38. 27 Describe univariate, bivariate, and multivariate analysis. Multivariate Analysis It involves analysis of three or more variables to understand the relationship of each variable with the other variables Example – Analysing Revenue based on expenditure • Multivariate analysis can be performed using: • Multiple regression • Factor analysis • Classification & regression trees • Cluster analysis • Principal component analysis • Clustering bar chart • Dual axis charts

  39. 28 What function would you use to get the current date and time in Excel? In Excel, you can use the TODAY() and NOW() functions: TODAY() returns the current date, and NOW() returns the current date and time

  40. 29 Using the SUMIFS function in Excel, find the total quantity sold by sales representatives whose names start with A and where the cost of each item they have sold is greater than 10 [Slide shows the Sales table and the output]

  41. 30 Is the below query correct? If not, how will you rectify it? SQL query: SELECT custid, YEAR(order_date) AS order_year FROM Order WHERE order_year >= 2016; The above query is incorrect because an alias defined in the SELECT list cannot be used while filtering data in the WHERE clause: in standard SQL, WHERE is evaluated before the SELECT list, so order_year is not yet defined. Solution: SELECT custid, YEAR(order_date) AS order_year FROM Order WHERE YEAR(order_date) >= 2016;

  42. 31 How are Union, Intersect and Except used in SQL? The Union operator is used to combine the results of 2 or more SELECT statements. Syntax: SELECT column_name(s) FROM table1 UNION SELECT column_name(s) FROM table2; [Diagram: Region 1, Region 2]

  43. 31 How are Union, Intersect and Except used in SQL? The Intersect operator returns the common records that are the results of 2 or more SELECT statements. Syntax: SELECT column_name(s) FROM table1 INTERSECT SELECT column_name(s) FROM table2; [Diagram: Region 1, Region 2]

  44. 31 How are Union, Intersect and Except used in SQL? The Except operator returns the records from the first SELECT statement that do not appear in the results of the second SELECT statement. Syntax: SELECT column_name(s) FROM table1 EXCEPT SELECT column_name(s) FROM table2; [Diagram: Region 1, Region 2]
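The three set operators can be compared side by side using Python's built-in `sqlite3` module, which supports UNION, INTERSECT and EXCEPT. The table names `region1` and `region2` echo the slide's diagram labels, but the column name and the city values are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE region1 (city TEXT);
    CREATE TABLE region2 (city TEXT);
    INSERT INTO region1 VALUES ('Austin'), ('Boston'), ('Chicago');
    INSERT INTO region2 VALUES ('Boston'), ('Chicago'), ('Denver');
    """
)

def run(operator):
    """Combine the two SELECT statements with the given set operator."""
    sql = f"SELECT city FROM region1 {operator} SELECT city FROM region2 ORDER BY city"
    return [row[0] for row in conn.execute(sql)]

union_rows = run("UNION")          # every distinct city from either table
intersect_rows = run("INTERSECT")  # cities present in both tables
except_rows = run("EXCEPT")        # cities in region1 that are not in region2

print(union_rows, intersect_rows, except_rows, sep="\n")
```

Note that EXCEPT is not symmetric: swapping the two SELECT statements would return the cities found only in `region2`.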

  45. 32 Using the product_price table, write a SQL query to find the record with the fourth highest market price. Step 1: select top 4 * from product_price order by mkt_price desc. Step 2: select the top 1 row from the above result, taken in ascending order of mkt_price
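The two-step idea above can be run as a single nested query. The slide uses SQL Server's TOP syntax; this sketch swaps in SQLite's LIMIT so that it runs with Python's built-in `sqlite3` module. Apart from the table name and `mkt_price`, the column names and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_price (product TEXT, mkt_price REAL)")
conn.executemany(
    "INSERT INTO product_price VALUES (?, ?)",
    [("A", 50), ("B", 400), ("C", 120), ("D", 300), ("E", 75), ("F", 210)],
)

fourth = conn.execute(
    """
    SELECT product, mkt_price
    FROM (SELECT * FROM product_price ORDER BY mkt_price DESC LIMIT 4)  -- top 4 prices
    ORDER BY mkt_price ASC                                              -- lowest of those 4
    LIMIT 1
    """
).fetchone()

print(fourth)
```

The inner query keeps the four highest prices and the outer query picks the smallest of them, which is the fourth highest overall. Ties in `mkt_price` would need an explicit tie-breaking rule.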

  46. 33 From the product_price table, find the total and average market price for each currency where the average market price is greater than 100 and the currency is INR or AUD [Slide shows the Product Price table, the SQL query and the output]
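The slide's query is not captured in the transcript, so the following is one possible way to write it, run through Python's built-in `sqlite3` module. The `currency` column name and every row below are assumptions; only the table name and `mkt_price` come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_price (product TEXT, currency TEXT, mkt_price REAL)")
conn.executemany(
    "INSERT INTO product_price VALUES (?, ?, ?)",
    [
        ("A", "INR", 150), ("B", "INR", 250),   # hypothetical rows
        ("C", "AUD", 60), ("D", "AUD", 80),
        ("E", "USD", 500),
    ],
)

rows = conn.execute(
    """
    SELECT currency, SUM(mkt_price) AS total_price, AVG(mkt_price) AS avg_price
    FROM product_price
    WHERE currency IN ('INR', 'AUD')   -- row-level filter on currency
    GROUP BY currency
    HAVING AVG(mkt_price) > 100        -- group-level filter on the average
    """
).fetchall()

print(rows)
```

The currency condition belongs in WHERE because it applies to individual rows, while the condition on the average has to go in HAVING because it applies to each group.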

  47. 34 This question will test your knowledge of Tableau: exploring the different features of Tableau and creating a suitable graph to solve a business problem

  48. 34 Using the Sample Superstore dataset, create a view to analyse the sales, profits and quantity sold across different subcategories of items present under each category • Load the Sample - Superstore dataset • Drag Category and Sub-category on to Rows and Sales on to Columns • This results in a horizontal bar chart

  49. 34 Using the Sample Superstore dataset, create a view to analyse the sales, profits and quantity sold across different subcategories of items present under each category • Drag Profit on to Colour and Quantity on to Label • Sort the Sales axis in descending order of sum of sales within each sub-category

  50. 34 Using the Sample Superstore dataset, create a view to analyse the sales, profits and quantity sold across different subcategories of items present under each category. Chairs under the Furniture category had the highest sales and profit, while Tables had the lowest profit. For Office Supplies, the sub-category Binders made the highest profit even though Storage had the highest sales. Under the Technology category, Copiers made the highest profit though it had the least amount of sales
