
Self-management features in SQL Anywhere server


Presentation Transcript


    1. Self-management features in SQL Anywhere server
       Glenn Paulley, Director, Engineering
       http://iablog.sybase.com/paulley

    2.

    3. Design goals of SQL Anywhere
       - Ease of administration: comprehensive yet comprehensible tools
       - Good out-of-the-box performance
       - Embeddability features → self-tuning; many environments have no DBAs
       - Cross-platform support: 32- and 64-bit Windows (XP, Server 2003, 2000, 9x), Windows CE/Pocket PC, 32- and 64-bit Linux, HP-UX, AIX, Solaris (SPARC and Intel), Mac OS X, Compaq Tru64
       - Interoperability

    4. Why is self-management important? In a word: complexity
       - Application development is becoming more complex: new development paradigms such as ORM toolkits, distributed computation with synchronization amongst database replicas, and so on
       - Databases are now ubiquitous in IT because they solve a variety of difficult problems
       - Yet most companies continue to own and manage a variety of different DBMS products, which increases administrative costs
       - Ubiquity brings scale, in several ways
       - To keep TCO constant, one must improve the productivity of each developer

    5. Embeddability

    6. Physical database design

    7. Physical database design
       - Logical and physical database design tradeoffs have a significant impact on performance
       - The fundamental problem is that the potential benefits of design changes must be amortized over the execution characteristics of a workload
       - Representative workloads are difficult to create; issues include data skew, correlation, and mitigating the artificial contention that occurs as a result of simulating the actual system
       - Most DBMS vendors now offer tools to assist in index or materialized view creation (more on this later); however, not all implications are covered by any of these tools

    8. Physical database design with SQL Anywhere
       - A SQL Anywhere database is composed of up to 15 dbspaces
         - Each dbspace is an OS file: one each for the temporary file and the transaction log, the others for user data
         - UNIX raw partitions are not supported
       - Image copy is done via a simple OS file copy; no image copy utility is necessary
         - Easy to deploy a database to a separate machine or CE device
       - Files are interoperable on all supported platforms without user intervention; on-the-fly data conversion is done when necessary

    9. Physical database design
       - By default, databases are created with case-insensitive searching enabled
       - Strings are not blank-padded; there is no difference in the on-disk storage of CHAR vs. VARCHAR attributes
       - SQL Anywhere supports different character sets and collations for international contexts, including support for the NCHAR data type using the multi-byte UTF-8 charset
       - Supported page sizes are 2K, 4K, 8K, 16K, and 32K; all dbspaces must use the same page size
       - No hard limits on row length or number of BLOB columns; BLOB values (up to 2GB each) are stored in a separate page arena

    10. Physical database design
       - Database files grow automatically as necessary to accommodate any increase in data volume
       - The server can execute a user call-out procedure when a disk-full panic occurs
       - The server offers a temporary file governor to ensure that connections cannot be starved when they require space for intermediate results
       - Indexes on primary and foreign keys are created automatically
       - The server automatically detects redundant (duplicate or subsumed) indexes, so that two or more logical indexes share the same physical structure
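One common notion of a redundant index is that its column list (with sort directions) is a prefix of another index's, so both logical indexes can be served by one physical structure. The sketch below illustrates that prefix test; the record format and function names are illustrative, not SQL Anywhere's actual implementation.

```python
def is_prefix_of(idx_a, idx_b):
    """True if idx_a's column list (with sort directions) is a prefix of
    idx_b's, so both logical indexes can share one physical structure."""
    return len(idx_a) <= len(idx_b) and idx_b[:len(idx_a)] == idx_a

def redundant_indexes(indexes):
    """Map each redundant logical index to an index that subsumes it.
    `indexes` maps index name -> ordered list of (column, direction).
    Exact duplicates are tie-broken by name so only one side is reported."""
    shared = {}
    for a, cols_a in indexes.items():
        for b, cols_b in indexes.items():
            if a == b or not is_prefix_of(cols_a, cols_b):
                continue
            if len(cols_b) > len(cols_a) or b < a:   # tie-break duplicates
                shared[a] = b
                break
    return shared
```

For example, an index on `(x)` is subsumed by an index on `(x, y)`, while an index on `(z)` stands alone.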

    11. Physical database design: Transaction log
       - SQL Anywhere uses a logical logging and recovery scheme at the row level (not physiological, or page-based)
       - The transaction log can be translated directly into INSERT, UPDATE, and DELETE SQL statements
         - The DBTRAN utility ships with the software
         - Assists in recovery from catastrophic failures
         - Provides the backbone for two-way synchronization with MobiLink and SQL Remote
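Because the log is logical and row-based, each record carries enough information to regenerate an equivalent SQL statement, which is what DBTRAN does. A minimal sketch of that translation, using a hypothetical record format of my own (a tuple of operation, table, and row values; UPDATE carries old and new values):

```python
def translate(record):
    """Translate one logical (row-level) log record into an equivalent SQL
    statement, in the spirit of the DBTRAN utility. Hypothetical record
    format: (op, table, payload), where payload is a {col: value} dict for
    INSERT/DELETE and an (old_values, new_values) pair for UPDATE."""
    op, table, payload = record

    def lit(v):
        return "'%s'" % v if isinstance(v, str) else str(v)

    if op == "INSERT":
        cols = ", ".join(payload)
        vals = ", ".join(lit(v) for v in payload.values())
        return f"INSERT INTO {table} ({cols}) VALUES ({vals})"
    if op == "DELETE":
        preds = " AND ".join(f"{c} = {lit(v)}" for c, v in payload.items())
        return f"DELETE FROM {table} WHERE {preds}"
    if op == "UPDATE":
        old, new = payload
        sets = ", ".join(f"{c} = {lit(v)}" for c, v in new.items())
        preds = " AND ".join(f"{c} = {lit(v)}" for c, v in old.items())
        return f"UPDATE {table} SET {sets} WHERE {preds}"
    raise ValueError(f"unknown log operation: {op}")
```

Replaying such statements against a restored backup is one way a row-level logical log supports recovery and synchronization.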

    12. Self-management features

    13. Memory management
       - SQL Anywhere uses a single heterogeneous buffer pool with few predefined limits
       - The buffer pool can automatically grow and shrink depending on database workload and OS requirements for other applications
       - The buffer pool comprises table data pages, index pages, checkpoint log pages, bitmap pages, heap pages (data structures for connections, stored procedures, triggers), and free (available) pages
       - All page frames are the same size

    14. Page replacement scheme
       - The buffer pool has no preset bounds for sort space, heap space, or table pages
       - The buffer pool's page replacement scheme must be able to manage various categories of pages and requests on a demand basis
       - Workloads change over time; static configurations must trade off service levels for administrative effort
       - One basic requirement: avoid polluting the cache with pages from sequential scans, and identify pages that are frequently used

    15. Page replacement scheme
       - Pages flow through the buffer pool over time
       - The window (the size of the cache) is stratified into eight segments based on time of reference; a page's score is incremented as it moves from segment to segment
       - Need to differentiate reference locality (i.e. multiple references to a single page based on row fetch) from other forms of reuse
       - Replacement is based on the individual page score: a modified Clock algorithm goes through each page in turn; if a page's score is less than a threshold value, the page is reused; if the page is retained, its score is decayed
       - LRU-k and 2Q algorithms are frequently referenced in the literature; however, we have found their overheads to be too high
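The score-based Clock variant described above can be sketched as follows. The threshold, maximum score, and increment rules here are assumptions for illustration, not SQL Anywhere's actual parameters:

```python
class ClockPool:
    """Sketch of a score-based Clock page-replacement scheme: each page
    carries a score bumped on reuse; on a miss, the clock hand reuses the
    first frame whose score is below a threshold, decaying the scores of
    the pages it skips over."""

    def __init__(self, size, threshold=2, max_score=8):
        self.frames = [None] * size          # each frame: (page_id, score)
        self.where = {}                      # page_id -> frame index
        self.hand = 0
        self.threshold = threshold
        self.max_score = max_score

    def access(self, page_id):
        """Return True on a cache hit; on a miss, evict via the clock."""
        if page_id in self.where:
            i = self.where[page_id]
            pid, score = self.frames[i]
            self.frames[i] = (pid, min(score + 1, self.max_score))
            return True
        # Miss: sweep the clock hand until a victim frame is found.
        while True:
            slot = self.frames[self.hand]
            if slot is None or slot[1] < self.threshold:
                if slot is not None:
                    del self.where[slot[0]]  # evict the low-score page
                self.frames[self.hand] = (page_id, 1)
                self.where[page_id] = self.hand
                self.hand = (self.hand + 1) % len(self.frames)
                return False
            # Retained page: decay its score and move on.
            self.frames[self.hand] = (slot[0], slot[1] - 1)
            self.hand = (self.hand + 1) % len(self.frames)
```

A single sequential scan touches each page once, so scanned pages never accumulate score and are reused first, while repeatedly referenced pages survive sweeps.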

    16. Task scheduling model
       - Basic idea: reduce server resource requirements by establishing a limited pool of worker threads to execute requests
         - Reduces address space, virtual and real memory requirements
         - Introduces the possibility of thread deadlock
       - Task implementation differs depending on the platform; fibers are used on Windows and Linux platforms
       - The scheduler assigns work across a pool of workers
         - The worker thread pool is typically small; the default is 20
         - The size of the pool establishes the multiprogramming level
         - Workers are assigned dynamically to connections on demand
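The shape of such a scheduler, with a fixed pool of workers serving a shared request queue rather than a thread per connection, can be sketched with standard threading primitives (the function and structure here are illustrative, not the server's fiber-based implementation):

```python
import queue
import threading

def make_worker_pool(n_workers=20):
    """Sketch of a fixed worker pool (default size 20, matching the
    server's default): requests queue up and are served by whichever
    worker becomes free, bounding thread and memory consumption."""
    requests = queue.Queue()

    def worker():
        while True:
            job = requests.get()
            if job is None:              # shutdown sentinel
                requests.task_done()
                return
            fn, results = job
            results.append(fn())         # run the request, record result
            requests.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    return requests, threads
```

When all workers are busy, additional requests simply wait in the queue, which is the behaviour the next slide describes.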

    17. Task scheduling model
       - Unassigned requests wait for an available thread
       - The server uses cooperative multitasking to switch between requests
       - The server supports intra-query parallelism; for queries, the optimizer determines the degree of parallelism at optimization time

    18. Task scheduling model
       - Waiting requests are candidates for swapping
       - The server tries to maintain a constant memory footprint by using a virtual memory technique
         - Heaps for user connections can be swapped to disk if server load demands it
         - Originally implemented in the first SQL Anywhere release in 1989
         - Pointers within the heap are swizzled to permit reloading of heaps on a demand basis
       - Diagnostic connections can use a special, reserved thread to execute their requests, enabling diagnosis of a hung engine

    19. Dynamic memory management
       - A SQL Anywhere server will grow and shrink the buffer pool as necessary to accommodate both database server load and the physical memory requirements of other applications
       - Enabled by default on all supported platforms
       - The user can set lower and upper bounds and a default initial size
       - The OS provides an API to determine memory size (requires virtual memory support); implementation varies slightly by platform

    20. Dynamic memory management
       - Basic idea: match the buffer pool size to the operating system's working set size, using a feedback control loop
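One step of such a feedback loop might look like the following. The gain, bounds handling, and free-memory clamp are assumptions chosen to illustrate the control-loop idea, not the server's actual controller:

```python
def adjust_cache(cache_pages, working_set_pages, free_os_pages,
                 min_pages, max_pages, gain=0.25):
    """One step of a hypothetical feedback control loop matching buffer
    pool size to the server's working set: move a fraction (`gain`) of
    the error each step, never taking more than the OS has free, and
    always staying within the user-configured bounds."""
    error = working_set_pages - cache_pages
    step = int(gain * error)
    if step > 0:
        step = min(step, free_os_pages)   # don't starve other applications
    new_size = cache_pages + step
    return max(min_pages, min(max_pages, new_size))
```

Because each step corrects only a fraction of the error, the pool grows and shrinks smoothly as the workload and competing applications change.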

    21. Memory allocation to server threads
       - The query optimizer assumes a certain amount of memory is available for each computation
       - Some execution operators (notably hash-based operators) have the ability to free memory without changing the computation, just making it less efficient; release of memory is done partition-at-a-time, so the effect is not abrupt
       - Other operators can be constructed with alternative plans that offer a cheaper execution technique if either the optimizer's choices are found to be suboptimal at run-time, or the operator requires a low-memory strategy at the behest of the memory governor

    22. Other self-management features
       - On startup, the buffer pool is automatically loaded with those database pages that were first requested on the last startup
       - SQL Anywhere supports events fired on a schedule, or when specific server events occur, such as: idle time greater than a threshold, backup, connect/disconnect, file size, disk space

    23. Self-tuning query optimization and adaptive query execution strategies

    24. Automatic statistics management
       - A feature of SQL Anywhere since 1992
       - Early implementations used a hash-based structure to manage column density and frequent-value statistics; see Lynch, 14th VLDB (1988), for a similar approach
       - Today:
         - Self-tuning column histograms, on both base and temporary tables; statistics are updated on-the-fly automatically
         - Join histograms built for intermediate result analysis
         - The server maintains index statistics in real time
         - Index sampling during optimization

    25. Column histograms
       - The self-tuning implementation incorporates both standard (range) buckets and frequent-value statistics
       - Updated in real time with the results of predicate evaluation and update DML statements; by default, statistics are computed during the execution of every DML request
       - A novel technique is used to capture statistics on strings for use in optimizing LIKE predicates
       - Histograms are computed automatically on LOAD TABLE or CREATE INDEX statements
       - Histograms can be created/dropped explicitly if necessary, but are retained by default across unload/reload
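The feedback idea behind a self-tuning histogram, nudging bucket estimates toward the selectivities actually observed during predicate evaluation, can be sketched as below. The bucket representation and the gain-based update rule are my own illustrative assumptions, not SQL Anywhere's actual scheme:

```python
def refine_histogram(buckets, low, high, observed_rows, total_rows, gain=0.5):
    """Sketch of self-tuning histogram maintenance: after evaluating a
    range predicate low <= x < high, nudge the estimated selectivity of
    the touched buckets toward the observed selectivity. `buckets` is a
    list of (lo, hi, fraction) with fractions summing to 1."""
    touched = [i for i, (lo, hi, _) in enumerate(buckets)
               if lo < high and hi > low]
    est = sum(buckets[i][2] for i in touched)
    if not touched or est == 0:
        return buckets
    actual = observed_rows / total_rows
    scale = 1 + gain * (actual - est) / est   # partial correction
    out = list(buckets)
    for i in touched:
        lo, hi, f = out[i]
        out[i] = (lo, hi, f * scale)
    # Renormalize so the fractions still sum to 1.
    s = sum(f for _, _, f in out)
    return [(lo, hi, f / s) for lo, hi, f in out]
```

Running this after every evaluated predicate keeps the histogram tracking the live data distribution without a separate statistics-collection pass.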

    26. Query optimizer
       - SQL Anywhere optimizes requests each time they are executed
       - The optimizer takes into account server context: working set, available cache, values of procedure and host variables
       - Assumption: optimization is cheap
       - SQL Anywhere uses a proprietary, cost-based join enumeration algorithm that primarily constructs left-deep trees
       - The optimization process includes both heuristic and cost-based complex rewritings
       - Advantages: plans are responsive to the server environment, buffer pool contents/size, and data skew; no need to administer packages

    27. Join enumeration
       - The optimization space is over left-deep trees, with exceptions for complex, right-deep nested LEFT OUTER JOINs
       - Space pruning is based on a governor that allocates quota to different portions of the search space
       - Join method, degree of parallelism, physical scan method (index selection), and expensive predicate evaluation are also part of the search space; each choice is re-evaluated at each step
       - Superb optimization times even for complex queries
       - No hard limits: tested with 500 quantifiers in a single block, and with a 100-way join on a CE device with a 3MB buffer pool
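The core of cost-based left-deep enumeration with pruning can be illustrated in a few lines. This is a generic textbook-style sketch under a toy cost model (cost = sum of intermediate result sizes), not SQL Anywhere's proprietary algorithm or its quota-based governor:

```python
def best_left_deep_plan(tables, join_sel, card):
    """Sketch of cost-based left-deep join enumeration: extend a prefix
    one table at a time, pruning any prefix already costlier than the
    best complete plan found so far. `card[t]` is table cardinality;
    `join_sel[(a, b)]` is the join selectivity between a and b (looked up
    symmetrically, defaulting to 1.0 for a cross product)."""
    best = {"plan": None, "cost": float("inf")}

    def sel(prefix, t):
        s = 1.0
        for p in prefix:
            s *= join_sel.get((p, t), join_sel.get((t, p), 1.0))
        return s

    def extend(prefix, rows, cost):
        if cost >= best["cost"]:
            return                       # prune: already worse than best
        if len(prefix) == len(tables):
            best["plan"], best["cost"] = prefix, cost
            return
        for t in tables:
            if t not in prefix:
                new_rows = rows * card[t] * sel(prefix, t)
                extend(prefix + [t], new_rows, cost + new_rows)

    for t in tables:
        extend([t], card[t], 0.0)
    return best["plan"], best["cost"]
```

Even this naive version avoids most of the factorial search space because expensive prefixes are abandoned as soon as they exceed the best complete plan.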

    28. Bypassing the query optimizer
       - Simple, single-table queries are optimized heuristically
       - Access plans for queries in stored procedures/triggers/events are cached
         - Plans undergo a training period during which plan variance is determined
         - If there is no variance (independent of variable values), the plan is cached and reused
         - Queries are periodically re-optimized on a logarithmic scale to ensure the plan does not become sub-optimal
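The caching policy described above, train, cache once the plan is stable, then re-check at logarithmically spaced executions, can be sketched as follows. The class, training length, and power-of-two checkpoint rule are illustrative assumptions, not SQL Anywhere's exact mechanism:

```python
class PlanCache:
    """Sketch of optimizer bypass for procedure statements: after a
    training period of identical plans, cache the plan and re-optimize
    only at logarithmically spaced executions (counts that are powers of
    two) to catch a plan that has gone stale."""

    def __init__(self, optimize, training=5):
        self.optimize = optimize     # callable: stmt -> plan
        self.training = training
        self.state = {}              # stmt -> {"count", "plans", "cached"}

    def plan_for(self, stmt):
        st = self.state.setdefault(
            stmt, {"count": 0, "plans": set(), "cached": None})
        st["count"] += 1
        n = st["count"]
        # Reuse the cached plan except at power-of-two checkpoints.
        if st["cached"] is not None and n & (n - 1) != 0:
            return st["cached"]
        plan = self.optimize(stmt)
        if st["cached"] is not None and plan != st["cached"]:
            # Plan went stale: drop the cache and retrain.
            st.update(count=1, plans={plan}, cached=None)
            return plan
        st["plans"].add(plan)
        # Training complete with no plan variance: start reusing.
        if (st["cached"] is None and st["count"] >= self.training
                and len(st["plans"]) == 1):
            st["cached"] = plan
        return plan
```

With a training period of 3, a stable statement is optimized on executions 1 through 3 and then only at executions 4, 8, 16, and so on, so the amortized optimization cost falls toward zero.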

    29. Adaptive query processing
       - In some cases the optimizer will generate alternative access plans that can be executed if actual intermediate result sizes are poorly estimated; the server switches to the alternative plan automatically at run time
       - Memory-intensive operators, such as hash join, have low-memory strategies that are used when buffer pool utilization is high

    30. Adaptive query processing
       - Some operations, such as database backup, contain sampling processes to determine the characteristics of the I/O device used for storage; the primary goal is to determine the number of disk heads available
       - Processes can utilize the right number of CPUs to maximize throughput
       - Algorithms are sensitive to the CPU requirements of the rest of the system, and automatically scale down CPU usage as necessary

    31. Intra-query parallelism
       - SQL Anywhere's approach is to parallelize a left-deep plan when doing so is advantageous
       - There is an upper bound on the number of EXCHANGE operators in a data-flow tree, which can be overridden by the user
       - Work is partitioned independently of the availability of worker threads at run-time
       - Plans are largely self-tuning with respect to degree of parallelism; this prevents starvation of any specific subplan when the number of available workers is less than optimal for some period

    32. Management tools

    33. Management tools
       - Framework:
         - DBISQL interactive SQL query tool
         - Sybase Central administration/development tool
       - Tools:
         - Graphical query plans
         - Stored procedure debugger
         - Stored procedure profiler
         - Index consultant
         - Request-level logging
         - Optimization logging

    34. Graphical query plans

    35. Stored procedure debugger
       - Enables debugging of a PSM or Java stored procedure
       - Features:
         - set/unset breakpoints and watch settings
         - determine the value of variables
         - traverse the call stack
         - execute queries with the values of variables at the time of the break

    36. Application Profiling scenarios

    37. Application Profiling architecture

    38. Stored procedure profiler
       - Offers the ability to analyze the performance of a stored procedure/trigger/event
       - Computes the cumulative invocations and elapsed time for each statement in a procedure

    39. Index consultant
       - Recommends indexes to improve query performance; the main idea is to improve application performance
       - Particularly useful when the DBA has limited experience
       - Permits what-if analysis on an existing query load and database instance
       - Allows the DBA to find the statements that are most sensitive to the presence of an index
       - Can be used to find indexes that are unnecessary, i.e. those that are not utilized by the optimizer
       - Can be used to estimate the disk storage overhead of additional indexes

    40. Future plans
       - It is increasingly difficult to keep optimization costs low in the face of ever more challenging workloads
         - Trade off time for space (cached access plans); the advantages of plan caching are workload-specific, and the caching expense pays off only if optimization costs can be amortized over several executions
       - Improve statistics and cost model management to permit self-repair, which will improve access plan quality
       - Numerous other opportunities for self-tuning using feedback control mechanisms: memory allocation, multiprogramming level, intra-query parallelism controls, additional adaptive query optimization techniques
       - Integrate server internals management with VM hypervisors to improve operation within guest VMs
