1.44k likes | 1.46k Vues
Explore table lookup techniques with arrays, hash objects, and formats for efficient data matching. Learn how to use merges and joins for effective lookup values retrieval. Dive into one-dimensional arrays for streamlined table lookup tasks.
E N D
4 Chapter 4: Using Lookup Tables to Match Data: Arrays
4 Chapter 4: Using Lookup Tables to Match Data: Arrays
Objectives • Define table lookup. • List table lookup techniques.
Lookup Values • Data Values Table Lookups • Lookup values for a table lookup can be stored in the following: • array • hash object • format • data set • Lookup values for a table lookup can be stored in the following: • array • hash object • format • data set lookup • Lookup techniques include the following: • array subscript value • hash object key value • FORMAT statement, PUT function • merge, join • Lookup techniques include the following: • array subscript value • hash object key value • FORMAT statement, PUT function • merge, join
4.01 Multiple Choice Poll • Which of these is an example of a table lookup? • You have the data for January sales in one data set, February sales in a second data set, and March sales in a third. You need to create a report for the entire first quarter. • You want to send birthday cards to employees. The employees' names and addresses are in one data set and their birth dates are in another. • You need to calculate the amount each customer owes for his purchases. The price per item and the number of items purchased are stored in the same data set.
4.01 Multiple Choice Poll – Correct Answer • Which of these is an example of a table lookup? • You have the data for January sales in one data set, February sales in a second data set, and March sales in a third. You need to create a report for the entire first quarter. • You want to send birthday cards to employees. The employees' names and addresses are in one data set and their birth dates are in another. • You need to calculate the amount each customer owes for his purchases. The price per item and the number of items purchased are stored in the same data set.
Overview of Table Lookup Techniques • Arrays, hash objects, and formats provide an in-memory lookup table. • The merge and join use lookup values that are stored on disk.
1 2 3 4 Overview of Arrays • An array is similar to a row of buckets. ...
1 2 3 4 Overview of Arrays • An array is similar to a numbered row of buckets. • SAS puts a value in a bucket based on the bucket number. ...
1 2 3 4 Overview of Arrays • An array is similar to a numbered row of buckets. • SAS puts a value in a bucket based on the bucket number. • A value is retrieved from a bucket based on the bucket number.
Overview of a Hash Object • A hash object is similar to rows of buckets that are identified by the value of a key. Key Data Data ...
Overview of a Hash Object • A hash object is similar to rows of buckets that are identified by the value of a key. • SAS puts value(s) in the data bucket(s) based on the value(s) in the key bucket. Key Data Data ...
Overview of a Hash Object • A hash object is similar to rows of buckets that are identified by the value of a key. • SAS puts value(s) in the data bucket(s) based on the value(s) in the key bucket. Key Data Data • Value(s) are retrieved from the data bucket(s) based on the value(s) in the key bucket.
Data Value Label Overview of a Format • A format is similar to rows of buckets that are identified by the data value. ...
Data Value Label Overview of a Format • A format is similar to rows of buckets that are identified by the data value. • SAS puts data values and label values in the buckets when the format is used in a FORMAT statement, PUT function, or PUT statement. ...
Data Value Label Overview of a Format • A format is similar to rows of buckets that are identified by the data value. • SAS puts data values and label values in the buckets when the format is used in a FORMAT statement, PUT function, or PUT statement. • SAS uses a binary search on the data value bucket in order to return the value in the label bucket.
By Value By Value Data Data Data Data Overview of Merges and Joins • The DATA step MERGE and the SQL join operators are similar to multiple stacks of buckets that are referred to by the value of one or more common variables.
4.02 Multiple Answer Poll • Which techniques do you currently use when performing table lookups? • Arrays • Hash object • Formats • Merges • Joins
4 Chapter 4: Using Lookup Tables to Match Data: Arrays
Objectives • Define one-dimensional arrays. • Use a one-dimensional array for a table lookup task.
1 2 3 4 Overview of Arrays (Review) • An array is similar to a row of numbered buckets. • SAS puts a value in a bucket based on the bucket number. • A value is retrieved from a bucket based on the bucket number.
Defining Arrays (Review) An array is a temporary grouping of SAS variables that are arranged in a particular order and identified by an array name. The following tasks can be accomplished using an array: • performing repetitive calculations on a group of variables • creating many variables with the same attributes • restructuring data • performing a table lookup with one or more numeric factors An array exists only for the duration of the current DATA step.
Using One-Dimensional Arrays (Review) To use an array, declare the array by using an ARRAY statement. • ARRAYarray-name {number-of-elements} <$> <length> • <list-of-variables><(initial-values)>; General form for the one-dimensional ARRAY statement:
Using One-Dimensional Arrays (Review) • Examples of an ARRAY statement follow. array numarray{3} num1 – num3; array char{4} $ 6; array num{5} _temporary_ (5, 6, 7, 8, 9); array yr{2000:2002} yr2000 yr2001 yr2002; ...
Using One-Dimensional Arrays (Review) • Examples of an ARRAY statement follow. array numarray{3} num1 – num3; Array Name array char{4} $ 6; array num{5} _temporary_ (5, 6, 7, 8, 9); array yr{2000:2002} yr2000 yr2001 yr2002; ...
Using One-Dimensional Arrays (Review) • Examples of an ARRAY statement follow. array numarray{3} num1 – num3; number of elements array char{4} $ 6; array num{5} _temporary_ (5, 6, 7, 8, 9); array yr{2000:2002} yr2000 yr2001 yr2002; ...
Using One-Dimensional Arrays (Review) • Examples of an ARRAY statement follow. array numarray{3} num1 – num3; names three numeric variables array char{4} $ 6; creates four character variables, char1 – char4, each a length of 6 array num{5} _temporary_ (5, 6, 7, 8, 9); creates temporary numeric elements, and stores the numeric values 5, 6, 7, 8, 9 array yr{2000:2002} yr2000 yr2001 yr2002; names three numeric variables
4.03 Multiple Choice Poll • How many elements are referenced by the ARRAY statement: array numarray{*} num1 – num12;? • 0 • 1 • 12 • Unknown
4.03 Multiple Choice Poll – Correct Answer • How many elements are referenced by the ARRAY statement: array numarray{*} num1 – num12;? • 0 • 1 • 12 • Unknown
The DIM Function You can use the DIM function to return the number of elements in an array. For example, use the DIM function to provide the end value for a DO loop. • array numarray{*} num1 – num12; • <additionalstatements> • do i=1 to dim(numarray); • <additionalstatements> • end; Equivalent code: array numarray{12} num1 – num12; <additional statements> do i=1 to 12; <additional statements> end;
Business Scenario • The data set orion.employee_payrollcontains each employee’s hired date and current salary.
Business Scenario • The data set orion.salary_stats contains statistics for all Orion Star employees for the years 1974 through 2007. For example, the average salary of the employees hired in 1974 is currently $39,243.61.
Business Scenario • The two data sets must be combined to calculate the difference between the average salary and the actual current salary for each employee based on the year of hire. Partial Listing of compare Using One Dimensional Arrays Year_ Obs Employee_ID Hired Salary Average Salary_Dif 1 120101 2003 $163,040.00 $35,082.50 $127,957.50 2 120102 1989 $108,255.00 $88,588.75 $19,666.25 3 120103 1974 $87,975.00 $39,243.61 $48,731.39 4 120104 1981 $46,230.00 $36,436.67 $9,793.33 5 120105 1999 $27,110.00 $36,533.75 $-9,423.75 6 120106 1974 $26,960.00 $39,243.61 $-12,283.61 7 120107 1974 $30,475.00 $39,243.61 $-8,768.61 8 120108 2006 $27,660.00 $27,883.71 $-223.71
Setup for the Poll • The two data sets that need to be combined are as follows:
4.04 Poll • Can the two data sets be merged with the DATA step MERGE statement or joined with the SQL procedure? • Yes • No
4.04 Poll – Correct Answer • Can the two data sets be merged with the DATA step MERGE statement or joined with the SQL procedure? • Yes • No
4.05 Poll • What do the two data sets have in common? • They have the year in common. • They have nothing in common.
4.05 Poll – Correct Answer • What do the two data sets have in common? • They have the year in common. • They have nothing in common. In the data set, orion.salary_stats, the columns, except for the first, represent year. In the data set, orion.employee_payroll, year can be obtained from the Employee_Hire_Date variable.
Using a One-Dimensional Array data compare; keep Employee_ID Year_Hired Salary Average Salary_Dif; format Salary Average Salary_Dif dollar12.2; array yr{1974:2007} Yr1974-Yr2007; if _n_=1 then set orion.salary_stats (where=(Statistic='Avg_Salary')); set orion.employee_payroll (keep=Employee_ID Employee_Hire_Date Salary); Year_Hired=year(Employee_Hire_Date); Average=yr{Year_Hired}; Salary_Dif=Salary-Average; run; p304d01
yr{1974} yr{1975} yr{1976} yr{1978} yr{1977} D D D D D yr{2003} yr{2004} yr{2007} D D D D D D Execution data compare; keep Employee_ID Year_Hired Salary Average Salary_Dif; format Salary Average Salary_Dif dollar12.2; array yr{1974:2007} Yr1974-Yr2007; if _n_=1 then set orion.salary_stats (where=(Statistic='Avg_Salary')); set orion.employee_payroll (keep=Employee_ID Employee_Hire_Date Salary); Year_Hired=year(Employee_Hire_Date); Average=yr{Year_Hired}; Salary_Dif=Salary-Average; run; ...
yr{1974} yr{1975} yr{1976} yr{1978} yr{1977} D D D D D yr{2003} yr{2004} yr{2007} D D D D D D Execution data compare; keep Employee_ID Year_Hired Salary Average Salary_Dif; format Salary Average Salary_Dif dollar12.2; array yr{1974:2007} Yr1974-Yr2007; if _n_=1 then set orion.salary_stats (where=(Statistic='Avg_Salary')); set orion.employee_payroll (keep=Employee_ID Employee_Hire_Date Salary); Year_Hired=year(Employee_Hire_Date); Average=yr{Year_Hired}; Salary_Dif=Salary-Average; run; ...
yr{1974} yr{1975} yr{1976} yr{1978} yr{1977} D D D D D yr{2003} yr{2004} yr{2007} D D D D D D Execution data compare; keep Employee_ID Year_Hired Salary Average Salary_Dif; format Salary Average Salary_Dif dollar12.2; array yr{1974:2007} Yr1974-Yr2007; if _n_=1 then set orion.salary_stats (where=(Statistic='Avg_Salary')); set orion.employee_payroll (keep=Employee_ID Employee_Hire_Date Salary); Year_Hired=year(Employee_Hire_Date); Average=yr{Year_Hired}; Salary_Dif=Salary-Average; run; ...
yr{1974} yr{1975} yr{1976} yr{1978} yr{1977} D D D D D yr{2003} yr{2004} yr{2007} D D D D D D Execution data compare; keep Employee_ID Year_Hired Salary Average Salary_Dif; format Salary Average Salary_Dif dollar12.2; array yr{1974:2007} Yr1974-Yr2007; if _n_=1 then set orion.salary_stats (where=(Statistic='Avg_Salary')); set orion.employee_payroll (keep=Employee_ID Employee_Hire_Date Salary); Year_Hired=year(Employee_Hire_Date); Average=yr{Year_Hired}; Salary_Dif=Salary-Average; run; ...
yr{1974} yr{1975} yr{1976} yr{1978} yr{1977} D D D D D yr{2003} yr{2004} yr{2007} D D D D D D Execution data compare; keep Employee_ID Year_Hired Salary Average Salary_Dif; format Salary Average Salary_Dif dollar12.2; array yr{1974:2007} Yr1974-Yr2007; if _n_=1 then set orion.salary_stats (where=(Statistic='Avg_Salary')); set orion.employee_payroll (keep=Employee_ID Employee_Hire_Date Salary); Year_Hired=year(Employee_Hire_Date); Average=yr{Year_Hired}; Salary_Dif=Salary-Average; run; ...
yr{1974} yr{1975} yr{1976} yr{1978} yr{1977} D D D D D yr{2003} yr{2004} yr{2007} D D D D D D Execution data compare; keep Employee_ID Year_Hired Salary Average Salary_Dif; format Salary Average Salary_Dif dollar12.2; array yr{1974:2007} Yr1974-Yr2007; if _n_=1 then set orion.salary_stats (where=(Statistic='Avg_Salary')); set orion.employee_payroll (keep=Employee_ID Employee_Hire_Date Salary); Year_Hired=year(Employee_Hire_Date); Average=yr{Year_Hired}; Salary_Dif=Salary-Average; run; Average=yr{2003}; ...
yr{1974} yr{1975} yr{1976} yr{1978} yr{1977} D D D D D yr{2003} yr{2004} yr{2007} D D D D D D Execution data compare; keep Employee_ID Year_Hired Salary Average Salary_Dif; format Salary Average Salary_Dif dollar12.2; array yr{1974:2007} Yr1974-Yr2007; if _n_=1 then set orion.salary_stats (where=(Statistic='Avg_Salary')); set orion.employee_payroll (keep=Employee_ID Employee_Hire_Date Salary); Year_Hired=year(Employee_Hire_Date); Average=yr{Year_Hired}; Salary_Dif=Salary-Average; run; ...