Distributed Query Conor Cunningham Principal Architect SQL Server Engineering Team
Who Am I? • I work on the SQL Server Core Engine • Specialize in Query Processing/Optimization • 14+ years at Microsoft • 3rd year speaking at SQLBits – 3 talks this year • I love hearing about how you use the product • I take that back to the Engineering team so we can work on the next versions of SQL Server/Azure
Talk Agenda • Problem Statement • (Quick) Summary of SQL Server’s Optimizer • DQ Optimization Approach • Under-the-hood Examples • Distributed Partitioned Views • Common Troubleshooting Techniques
Problem Statement • Data Living on Different Servers? • Data Living on non-SQL Server? • Need to Manage Many Servers? • Want to move data from one server to another without dealing with SSIS? • … • There are many reasons to use Distributed Query – it fills many holes
Optimizer Overview • I gave a SQLBits talk on this 2 years ago • You can watch that talk on sqlbits.com • Key Concepts in the Optimizer: • Operators shaped into trees • Trees and Sub-Trees have Properties • Rules transform sub-trees into new sub-trees • Equivalent sub-trees get stored in a management structure called the “Memo” • The sequence of rules and heuristics is applied to try to generate good query plans efficiently
DQ Optimization Goal • DQ tries to make remote tables appear to be local (so you don’t care that they are remote) Server 1 (Local) Server 2 (Remote) SELECT SUM(col1), col2 FROM <remotetbl> WHERE col3 > 10000 GROUP BY col2 SELECT SUM(col1), col2 FROM localtbl WHERE col3 > 10000 GROUP BY col2 Expectation: Push operations to remote server • That works for the basic cases • What about more complex cases?
Next Example – Should it Remote? • Let’s try a cross product: Server 1 (Local) Server 2 (Remote) SELECT * FROM <remotetbl> as t1, <remotetbl> as t2 Should it remote?
How About This One? • Join Small Local Table to Large Remote Table Server 1 (Local) • Pulling a big table over the network is expensive • It would be great if we could get that join condition to remote… SELECT * FROM smalllocal as L, <bigremote> as R ON L.col1=R.col1
DQ Optimization Differences • Data is remote, expensive to move (network) • Often the desired behavior is pretty basic – remote if you can do so • Sweet spots for several optimizations changes • We force several optimizations we use only for “expensive” local queries (example: pushing local group by to the remote source)
One Layer Deeper… • SQL Server’s QP acts like a SQL Server client • Based on OLEDB • It can talk to most OLEDB providers, not just SQL Server • So you can pull data from Oracle or DB2 or Excel or Text Files or even write your own provider • Each phase of query compilation and execution are overridden to use remote data instead
Query Binding • We load metadata from OLEDB schema rowsets instead of our own system tables • DBSCHEMA_TABLES, _COLUMNS, _INDEXES, … • Metadata is cached locally to avoid round trips • OLEDB Types converted to closest SQL type • Lossy conversions possible for non-SQL Server • We ask for the output schema for views and sprocs by compiling them on the remote side • If we do, we try to cache this connection for execution
Optimization • General Goal: Remote Large Subtrees • We do use statistics, indexes, and some constraint information from remote sources • We can work against SQL providers, Index providers, or simple table providers • We start with a “get all data from remote source” plan and try to find better plans • Startup and per-row costs for remote sources are expensive • We also tweak lots and lots of rules to run differently for DQ (no trivial plan, different join reordering, aggressive local-global agg pushdown) • Finally, we generate lots of subtrees that remote and pick the “cheapest” one per our cost model
Optimization Search GB(b,c) SUM(C.d) GB(b,c) SUM(C.d) GB(b,c) SUM(C.d) Join Join Join GB(c) SUM(C.d) Join Join B B RmtC Join RmtA B RmtA RmtC GB(b,c) SUM(C.d) RmtC RmtA GB(a,c) SUM(C.d) GB(a,c) SUM(C.d) Join Join Join Join B B B RmtA RmtC
Execution • Mostly similar to regular OLEDB clients • Open DB, SetCommandText, Execute, Read Rows • Some parts are more unique • Compile and Execute are 2 separate steps • We have to validate the plan is still valid • So we compare the schema compile vs. execute (and recompile if needed) • We find many provider bugs nobody else does here • Note: we can remote lock hints in remote queries
Reading DQ Execution Plans • ICommand::Execute/IOpenRowset opens each scan initially • We retrieve rows in batches (50-100) when possible • Each new NLJ scan of inner side calls IRowset::RestartPosition • We stop reading when we have satisfied the query requirements (only do complete scans when necessary) 1 2 3 4
(Distributed) Transactions • Transactions ensure correctness • Distributed Transactions require multiple databases to either commit or abort together • Microsoft ships a component called MSDTCthat: • Provides a common service for dist. Transactions • Works for non-database things (queues, etc.) • Brokers between transaction protocols of different vendors • DQ uses this component • Not all queries require transactions, and DQ optimizes performance by only starting a DTC when necessary • Configuring MSDTC is done on the Domain Controller by the Domain Administrator…
Double-Hop Authentication • Use Integrated Auth? Get Errors through DQ? • This scenario happens in different places • User->IIS->SQL Server • User->SQL-(DQ)->SQL • This is known as the “double hop problem” • Don’t be afraid! It is possible to flow credentials and use your domain identities – talk to your domain administrator to define your SPN and permissions! Links to read: http://msdn.microsoft.com/en-us/library/ms189580.aspx http://support.microsoft.com/kb/238477
Distributed Partitioned Views • DPVs were an early scale-out model in DQ • You split a table by ranges and put each on its own server (check constraints for the ranges) • A UNION ALL view tied them together • DQ then did various optimizations including: • Pruning of unneeded partitions • Startup predicates to do dynamic pruning • Downsides: • Compilation time was high • Commands not done in parallel to each server • This feature influenced our partitioned tables design
Troubleshooting • Biggest problem in DQ is “it didn’t remote” • Various reasons: • Some function isn’t supported by DQ • Exotic data types (XML, CLR types) • Correctness issues – most date issues only trust the local clock (otherwise results can differ when you remote) • Sometimes the costing model will be close on 2+ plan choices and a plan will “stop remoting” (switch plans) to one that we think is similar in cost but is not • Workarounds: In most cases, OPENQUERY() can be used to specify the exact text you wish to remote. Think of this as plan forcing for Distributed Query • Also note: • SQL Server – SQL Server remoting is much better than SQL-Other DMBS vendors (our algebra and theirs does not always align)
Conclusion • Thank you for your attention • Questions?