The top tool is R, and Python is one of the fastest-growing tools. Companies with data analytics needs use R, Python, and SAS. The choice between R and Python depends on a number of factors, such as the nature of the business and its budget. Enroll for a free demo.
http://aspireit.net/ SAS Authorized Training Partner
Big Data Challenges and Analysis-Driven Data

Become a SAS® Certified Data Scientist

Get your data science certification and make yourself stand out, whether you're looking to change jobs, get a promotion or sharpen your current skills.

The data science certification program comprises the focus areas of both the SAS Certified Big Data Professional and the SAS Certified Advanced Analytics Professional programs:
- Machine learning and predictive modeling techniques.
- Critical SAS programming skills.
- Accessing, transforming and manipulating data.
- How to apply these techniques to distributed and in-memory big data sets.
- Improving data quality for reporting and analytics.
- Pattern detection.
- Experimentation in business.
- Fundamentals of statistics and analytics.
- Optimization techniques.
- Working with Hadoop, Hive, Pig and SAS.
- Time series forecasting.
- Exploring and visualizing data.
- Essential communication skills.
You need to manage big data and perform advanced analytics. The SAS Certified Data Science Professional program includes all five learning modules:
1) Big Data Preparation, Statistics and Visual Exploration
2) Big Data Programming and Loading
3) Predictive Modeling
4) Advanced Predictive Modeling
5) Text Analytics, Time Series, Experimentation and Optimization

Big Data Preparation, Statistics and Visual Exploration

A) Reading external data files

The way you write your DATA step depends on where your data are stored. Your raw data may be in one of two locations: part of the job stream, or stored in an external file. An external file is managed by your operating system and not by SAS. SAS can read and write many types of external files, and your DATA step code manages the processing of these files. For example, if a program is submitted under Windows and the raw data are stored in the file c:\readdata\runnersapril.dat, the DATA step to read the external file could be written as follows. The INFILE statement identifies the external file containing the raw data.

Example 1.2 Reading Data Lines from an External File

    data runners;
      /* INFILE points the DATA step at the external raw data file. */
      infile 'c:\readdata\runnersapril.dat';
      /* Read one character field (name) and five numeric fields per line. */
      input name $ age runtime1 runtime2 runtime3 runtime4;
    run;
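For comparison, the first location mentioned above, raw data that is part of the job stream, can be read with a DATALINES statement instead of INFILE. The following is a minimal sketch. The runner names and times are made-up values for illustration and do not come from runnersapril.dat.

```sas
/* Sketch: read raw data that is part of the job stream (in-stream data). */
/* The data lines below are hypothetical values used only for illustration. */
data runners;
  input name $ age runtime1 runtime2 runtime3 runtime4;
  datalines;
Scott 15 23.3 21.5 22.0 21.9
Mark 13 25.2 24.1 23.7 24.0
;
run;

/* Print the resulting data set to check that each field was read. */
proc print data=runners;
run;
```

The INPUT statement is the same in both cases. Only the source of the data lines changes.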
B) Storing and processing data

I. Stored Process Report

A SAS stored process report is a cached version of the results of a stored process. You define a stored process report by pointing it at a stored process. When you run the stored process report, it returns the results of that stored process. If the stored process has been run within a certain timeframe, the cached results are returned. If the stored process was run too long ago, it is run again to update the results, and the updated results are returned. You also define an expiration policy for the stored process report, which determines how long the results are kept. If the results have expired, the stored process is run again the next time the report is accessed, so that the results are refreshed.
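The expiration behaviour described above can be pictured as a simple age check. The DATA step below is only an illustrative sketch of that decision, not how SAS implements stored process reports. The variable names, the timestamps and the one-hour limit are all assumptions.

```sas
/* Illustrative sketch only: decide whether cached results have expired. */
/* All names and values here are hypothetical. */
data _null_;
  last_run = '01JAN2024:08:00:00'dt;  /* when the stored process last ran */
  accessed = '01JAN2024:09:30:00'dt;  /* when the report is accessed      */
  max_age  = 3600;                    /* expiration policy, in seconds    */

  age = accessed - last_run;          /* 5400 seconds in this example     */

  if age > max_age then
    put 'Results expired: run the stored process again and cache the new results.';
  else
    put 'Results still valid: return the cached results.';
run;
```

With these values the log shows the "expired" message, because 5400 seconds is more than the 3600-second limit.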
II. WRITING OUTPUT DIRECTLY TO BROWSER

If you create a stored process using the defaults in the Enterprise Guide wizard, which is the most common way to do so, the wizard adds the %STPBEGIN and %STPEND macro calls. These set up ODS for whatever context you are using the stored process in. However, you can turn them off and take full control of what is returned when the stored process is called. You turn them off by unchecking the “Stored process macros” item in the SAS® Code part of the wizard.

You can return content to the caller of the stored process by writing to the fileref _webout. When the stored process is called through the stored process web application, writing to _webout writes content directly into the web browser. This is how we can write custom HTML from a stored process.

For example, a stored process can write HTML to the web browser that creates a menu, allowing the user to make a selection and then call another stored process. A PROC SQL step creates HTML tags holding the values of product_line so that they can be selected from a list. A DATA step then writes to _webout, which writes the lines to the
browser. The HTML is a simple FORM that lets the user select a value for product_line and then runs a stored process, passing that value to it.

[Figure: the menu as displayed.]
[Figure: the results after the stored process has run.]

C) Combining Hadoop and SAS

WHAT IS HADOOP?

Hadoop is an open-source Apache project that was developed to solve the big data problem. How do you know you have a big data problem? In simple terms, when you have exceeded the capacity of conventional database systems, you’re dealing with big
data. Companies like Google, Yahoo, and Facebook were faced with big data challenges early on. The Hadoop project resulted from their efforts to manage their data volumes. It is designed to run on a large number of machines that don’t share any memory or disks. Hadoop handles hardware failures through its massively parallel architecture. Data is spread across the cluster of machines using HDFS, the Hadoop Distributed File System. Data is processed using MapReduce, a Java-based programming model for data processing on Hadoop.

WHAT CAN SAS DO IN HADOOP?

SAS can process all your data on Hadoop. SAS can process your data feeds and format your data in a meaningful way so that you can do analytics. You can use SAS to query your Hive and Impala data and to run the SAS DATA step against it. This paper takes you through the steps of getting started with Hadoop. If you already have experience with the basics of MapReduce and Pig, you can jump to the more SAS-centric methods of processing your data using the SAS language.

SAS FILENAME Statement for Hadoop

    options set=SAS_HADOOP_CONFIG_PATH="\\sashq\cdh45p1";
    options set=SAS_HADOOP_JAR_PATH="\\sashq\cdh45";
    FILENAME hdp1 hadoop 'test.txt';

    /* Write file to HDFS */
    data _null_;
      file hdp1;
      put ' Test Test Test';
    run;

    /* Read file from HDFS */
    data test;
      infile hdp1;
      input textline $15.;
    run;

Using Base SAS 9.4 with Hadoop:
Hadoop Procedure: How To Submit HDFS Commands?

    filename cfg 'C:\Hadoop_cfg\cdh57.xml';

    proc hadoop options=cfg username="sasxjb" verbose;
      /* Create the target directory in HDFS. */
      HDFS mkdir='/user/sasxjb/Books';
      /* Copy moby_dick.txt to HDFS. */
      HDFS COPYFROMLOCAL="C:\Hadoop_data\moby_dick.txt"
           OUT='/user/sasxjb/Books/moby_dick.txt';
      /* Copy war_and_peace.txt to HDFS. */
      HDFS COPYFROMLOCAL="C:\Hadoop_data\war_and_peace.txt"
           OUT='/user/sasxjb/Books/war_and_peace.txt';
    run;
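The same HDFS statement can also bring files back from the cluster and clean up afterwards. The sketch below reuses the configuration file and user name from the example above. The local output path C:\Hadoop_data\moby_dick_copy.txt is an assumed name, and the COPYTOLOCAL= and DELETE= options are worth checking against the PROC HADOOP documentation for your SAS release.

```sas
filename cfg 'C:\Hadoop_cfg\cdh57.xml';

proc hadoop options=cfg username="sasxjb" verbose;
  /* Copy a file from HDFS back to the local file system. */
  /* The local output path is a hypothetical example.     */
  hdfs copytolocal='/user/sasxjb/Books/moby_dick.txt'
       out='C:\Hadoop_data\moby_dick_copy.txt';
  /* Remove a file from HDFS once it is no longer needed. */
  hdfs delete='/user/sasxjb/Books/war_and_peace.txt';
run;
```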
How Do I Submit MapReduce Jobs?

    filename cfg 'C:\Hadoop_cfg\cdh57.xml';

    proc hadoop options=cfg username="sasxjb" verbose;
      mapreduce input='/user/sasxjb/Books/moby_dick.txt'
                output='/user/sasxjb/outBook'
                jar='C:\Hadoop_examples\hadoop-examples-1.2.0.1.3-96.jar'
                outputkey="org.apache.hadoop.io.Text"
                outputvalue="org.apache.hadoop.io.IntWritable"
                reduce="org.apache.hadoop.examples.WordCount$IntSumReducer"
                combine="org.apache.hadoop.examples.WordCount$IntSumReducer"
                map="org.apache.hadoop.examples.WordCount$TokenizerMapper";
    run;

How Do I Submit Pig Latin Programs?

    filename cfg 'C:\Hadoop_cfg\cdh57.xml';

    proc hadoop options=cfg username="sasxjb" verbose;
      /* pigcode must be a fileref that points to the Pig Latin program. */
      pig code=pigcode;
    run;
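The PIG statement above expects pigcode to be a fileref for a file that holds the Pig Latin program, and the example does not show where it is defined. One way to supply it is sketched below: a DATA step writes a small word-count script to a local file, and PROC HADOOP then submits it. The local script path, the Pig relation names and the output directory /user/sasxjb/outPig are assumptions for illustration.

```sas
filename cfg 'C:\Hadoop_cfg\cdh57.xml';

/* Hypothetical local file that will hold the Pig Latin program. */
filename pigcode 'C:\Hadoop_cfg\wordcount.pig';

/* Write a small word-count script to that file. */
data _null_;
  file pigcode;
  put "lines = LOAD '/user/sasxjb/Books/moby_dick.txt' AS (line:chararray);";
  put "words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;";
  put "grouped = GROUP words BY word;";
  put "counts = FOREACH grouped GENERATE group, COUNT(words);";
  put "STORE counts INTO '/user/sasxjb/outPig';";
run;

/* Submit the script to the cluster. */
proc hadoop options=cfg username="sasxjb" verbose;
  pig code=pigcode;
run;
```

Keeping the script in its own file means the same Pig Latin program can be edited and resubmitted without changing the PROC HADOOP step.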