1 / 24

Combining Data Sets with SQL

Combining Data Sets with SQL. Advanced SAS Programming. Joins. SQL joins refer, in general, to a horizontal combination of tables: i.e.: some form of record matching. This makes SQL joins similar to merging in the data step. A very simple join…. Simple Join. proc sql ;

yasir-hill
Télécharger la présentation

Combining Data Sets with SQL

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Combining Data Sets with SQL Advanced SAS Programming

  2. Joins • SQL joins refer, in general, to a horizontal combination of tables: i.e.: some form of record matching. • This makes SQL joins similar to merging in the data step. • A very simple join…

  3. Simple Join procsql; createtable join1 as selectgoals.*, performance.* from advsas.goals, advsas.performance ; quit; The * operator selects all columns Lists tables for selection

  4. Log • What is the nature of the warning that we see? • Merge expects (requires) variables that we join on to have the same name, SQL has no such requirement.

  5. Consider… procsql; createtable join2 as selectgoals.*, performance.month as month1, performance.sales from advsas.goals, advsas.performance ; quit; • Produces no warning, and 4 variables in the resulting data set. • Should also give some additional notion as to what process SQL undertook in joining these tables.

  6. Cartesian Product • The default joining of multiple data sets listed in the from statement is a full Cartesian product—every record in one table is paired with every other. • Most of the time we want a subset of that product…

  7. More Specific Joins • One way to subset the full product is with a where statement. procsql; createtable join3 as selectgoals.*, performance.sales from advsas.goals, advsas.performance where goals.month eq performance.month ; quit; Select no longer chooses month from performance Subset to those that match on month

  8. Final Product • Result is similar to a data step merge on month…

  9. Inner Join • This is also known as an inner join, which takes rows that match on the specified condition. procsql; createtable join4 as selectgoals.*, performance.sales from advsas.goals join advsas.performance on goals.month eq performance.month ; quit; Join is used as an operator (inner is default) On replaces where as the conditioning clause

  10. Inner Joins vs. Data Step Merge • Inner joins and data step merges are not identical. Consider… procsql; createtable join as selectcrewlist.*, sched.flightnum from advsas.crewlist, advsas.sched where crewlist.empid eq sched.empid ; quit; versus data merge; merge advsas.crewlist advsas.sched; by empid; run;

  11. Results • Inner join takes matches present across all tables, merge preserves all records. • We can get this behavior with merge by using the in= variable

  12. Outer Joins • Outer joins permit records lacking matches into the final table. procsql; createtable joinfull as selectcrewlist.*, sched.flightnum from advsas.crewlist full join advsas.sched on crewlist.empid eq sched.empid ; quit; • Creates the same result as the previous merge.

  13. Left and Right Join • The left join preserves all rows from the first table listed. • Hopefully the right join is self-explanatory… • Consider the following joins and their results.

  14. procsql; createtable joinleft as selectcrewlist.*, sched.flightnum from advsas.crewlist left join advsas.sched on crewlist.empid eq sched.empid ; quit; procsql; createtable joinright as selectcrewlist.*, sched.flightnum from advsas.crewlist right join advsas.sched on crewlist.empid eq sched.empid ; quit; Left & Right Joins Left table had row with no match Right table did not

  15. One-to-Many • SQL behavior is not affected by one-to-many matches. • Consider a merge of the accounts and transaction tables against the possible SQL joins.

  16. One to Many Merge procsortdata=advsas.accounts out=acct; by account_no; run; procsortdata=advsas.transactions out=trans; by account_no; run; data one_many_merge; merge trans acct; by account_no; run;

  17. Inner Join procsql; createtable one_many_inner as selecttransactions.*, fname, lname from advsas.transactions, advsas.accounts where transactions.account_no eq accounts.account_no ; quit;

  18. Full Join procsql; createtable one_many_full as selecttransactions.*, fname, lname from advsas.transactions full join advsas.accounts on transactions.account_no eq accounts.account_no ; quit;

  19. Left Join (transactions on left) procsql; createtable one_many_left as selecttransactions.*, fname, lname from advsas.transactions left join advsas.accounts on transactions.account_no eq accounts.account_no ; quit;

  20. Right Join (transactions on left) procsql; createtable one_many_right as selecttransactions.*, fname, lname from advsas.transactions right join advsas.accounts on transactions.account_no eq accounts.account_no ; quit;

  21. Many-to-Many • The data step does not handle many-to-many merges in what would generally be considered a “useful” manner. • Consider the following merge: data many_to_many; merge advsas.salesstaff advsas.clients; by district_no; run;

  22. Result of many-to-many data step merge (check notes in log). When multiple records have same by value in both data sets, matches one-to-one then one-to-many (see PowerPoint on merge for processing details)

  23. SQL • The SQL join: procsql; createtable many2many as selectsalesstaff.*, company from advsas.salesstaff, advsas.clients where salesstaff.district_no eq clients.district_no ; quit;

  24. Result Cartesian product is formed for all records with a matching key value Is the result different for any of the outer joins?

More Related