1 / 58

E110 Best Practices for Implementing with Adaptive Server IQ

E110 Best Practices for Implementing with Adaptive Server IQ. Mark Mumy Principal Systems Consultant Business Intelligence Division mark.mumy@sybase.com. About This Presentation. Hardware Configuration Creating the Database IQ-M Server Settings Connectivity Data Model Recommendations

elita
Télécharger la présentation

E110 Best Practices for Implementing with Adaptive Server IQ

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. E110Best Practices for Implementing with Adaptive Server IQ • Mark Mumy • Principal Systems Consultant • Business Intelligence Division • mark.mumy@sybase.com

  2. About This Presentation • Hardware Configuration • Creating the Database • IQ-M Server Settings • Connectivity • Data Model Recommendations • Database Programming • Data Manipulation • Maintenance Tasks • Memory Usage

  3. Target Audience • Database Administrators • All Levels • Query Developers • All Levels

  4. For This Presentation • Let’s Keep It Interactive • I will entertain questions on a subject • Would be happy to speak to you offline, if desired • No Question is too Basic • Chances are others may have the same question • Open Question Forum at the end • Time Permitting

  5. Hardware Configuration –Drive Arrays • Volume Manager • Not a necessary or recommended component for IQ-M • Additional overhead and software that has no value add to an IQ-M installation • Acceptable for an IQ-M single node operation where hardware does not have the ability to apply RAID to the disk devices • RAID Level • Recommend using RAID 5 as a blend between performance, protection, and cost • Recommend using raw devices (a must for multi-node IQ-M) • For details on drive array specifications and configurations see the Sun Reference Architecture Whitepapers and EMC Whitepapers on array configurations with IQ-M

  6. Hardware Configuration –Memory • How Much RAM for IQ-M • As much as possible! • Most systems should have at least 2 GB • Don’t forget additional RAM for each load being performed

  7. Hardware Configuration –Processors and Disk Controllers • Optimal CPU configuration • Queries: 1 CPU per active query – more if queries are complex and can be run in parallel • Loads: 1 CPU per HG index plus 1 CPU per 2-5 columns being loaded • Disk Controllers • 1 fiber controller per 5 write CPU’s • 1 fiber controller per 10 read CPU’s • In mixed mode operation use the 5:1 ratio • Recommend a minimum of 2 controllers

  8. Creating The Database –Creation Options • Creation Options • Block Size and Page Size • Minimum should be 128K Page Size (8K Page) • Use 256K Page for larger databases • Larger the page size the more RAM that will be required • CASE IGNORE vs. CASE RESPECT • Use CASE RESPECT whenever possible • Removes case comparison steps and improves performance • Java in the Database • Install if Java will be needed in the database • No impact if installed and not used

  9. Creating The Database –File Placement • Use relative links – makes it easier to relocate files • Use symbolic links for IQ MAIN and IQ TEMP • /IQM_devs/IQ_MAIN_00 • /IQM_devs/IQ_TEMP_00 • Place the transaction log and transaction log mirror on significantly large file system • Recommend 5 GB of filesystem space per IQ-M node for database related storage • Transaction logs, IQ MSG, scripts, catalog file, etc

  10. Creating The Database –Files Locations • Filename rules • Keep the names unique for each IQ-M node • Catalog: DATABASE_NAME.db • Transaction log: DATABASE_NAME.tlog • IQ MSG: DATABASE_NAME.iqmsg • IQ Main Store: DATABASE_NAME_iqmain_000 • 000 would be an incrementing number • IQ Temp Store: DATABASE_NAME_iqtmp_000 • 000 would be an incrementing number • IQ Configuration File: DATABASE_NAME.cfg

  11. IQ-M Server Settings –Query Performance • FORCE_NO_SCROLL_CURSORS • Should always be set to ON • Very few applications require this to be OFF • Can improve query performance

  12. IQ-M Server Settings –Query Plans • Query Plan Settings to Provide Optimal Query Information to DBA’s and Sybase Engineering • set temporary option query_plan='on'; • set temporary option query_plan_after_run='on'; • set temporary option query_timing='on'; • set temporary option query_detail='on'; • set temporary option dml_options10='on'; • set temporary option query_plan_as_html='on'; • set option QUERY_NAME = ‘Query Name‘ • Should not be set globally as the IQ MSG file will grow rapidly

  13. IQ-M Server Settings –Storage • Append_Load • Can be used to improve load performance • Will not reuse Row ID’s or the space occupied by those Row ID’s • Great for systems where large, contiguous chunks of data are deleted • May not good if random rows are deleted as it can lead to fragmentation and allocated, but unused space

  14. IQ-M Server Settings –Storage Continued • Disk_Striping • If ON, IQ-M will stripe writes to all available devices • If OFF, the first device must be full before the next is used • Disk_Striping_Packed • If ON, it forces better space usage and less fragmentation • Fragmentation is indicated when out of space messages are returned but the main dbspace used is less than 100% full • Allows Adaptive Server IQ to better utilize small pieces of unused space that remain after compression

  15. IQ-M Server Settings –Data Loads • LOAD_MEMORY_MB • Set to 0 (default) on systems with enough RAM • Set to something <= 500 on systems where RAM is tight or there are simultaneous loads taking place

  16. Client Access and Network Connectivity • Different internal environments and settings for ODBC, JDBC, and Open Client connections • JDBC and ODBC are recommended • JConnect is Sybase’s implementation of JDBC • Use the JDBC 2 driver (JConnect 5.2/5.5) com.sybase.jdbc2.jdbc.SybDriver • Open Client • Use with caution • Most applications written using Open Client expect an ASE server

  17. Client Access and Network Connectivity –Open Client vs. ODBC • See Chapter 32 of the ASA Users Manual as well as the ASA Reference Section “Transact-SQL and SQL/92 compatibility options”for complete list of differences • If writing stored procedures or embedded application code, make sure to explicitly make settings for compatibility as these options will get set to different values for Open Client vs. ODBC connections • ALLOW_NULLS_BY_DEFAULT • QUOTED_IDENTIFIER • STRING_RTRUNCATION • ANSI_BLANKS • ANSINULL • CHAINDED

  18. Client Access and Network Connectivity • AutoPreCommit Within ODBC • Set registry setting AutoPreCommit to Y • Forces applications to issue a COMMIT before each query • Go to the registry and update the corresponding Sybase Data Source Name (DSN) created, by adding a new value 'AutoPreCommit’ with a value of ‘Y' HKEY_LOCAL_MACHINE/SOFTWARE/ODBC/ODBC.INI/{DSN} • Packet Sizes • Larger packet sizes will help with large data retrieval • Use –p option in IQ-M configuration file to increase size • Use CommBufferSize parameter in ODBC connection string

  19. Client Access and Network Connectivity • Network Speed • Data retrieval will be as fast as the network • 100 MB of data will take 80 seconds on a 10 Mbit LAN • 100 MB of data will take 8 seconds on a 100 Mbit LAN • 100 MB of data will take 0.8 seconds on a Gigabit LAN • LAN speed may be the performance bottleneck for queries that return large amounts of data • The faster the network cards and LAN the better off concurrency will be: more available bandwidth per user • Network Interface Cards • Adding multiple network cards to the IQ-M node will help with network

  20. Client Access and Network Connectivity • ASE Component Integration Services (CIS) • Can have ASE CIS reference IQ-M tables (proxy tables) • Use ASE 12.5 CIS class of ASIQ • Prior to to ASE 12.5 ASAnywhere or ASEnterprise classes had to be used • Not a viable option if joining non-IQ-M tables with an IQ-M proxy table • Can map multiple ASE logins to a single IQ-M login • Data modifications should not be performed on proxy tables • Can map ASE proxy tables to stored procedure calls in IQ-M

  21. Client Access and Network Connectivity • IQ-M Component Integration Services (CIS) • Can only be used on the Solaris and WinNT versions of IQ-M • Can map to tables in Oracle (ODBC), ASE (ODBC & JDBC), ASA (ODBC & JDBC), DB2 (ODBC), MS SQL Server (ODBC), and any ODBC source • Not a viable option if joining IQ-M tables with non-IQ-M proxy tables • Data modifications may be performed on proxy tables

  22. Data Model Recommendations –IQ UNIQUE • Use whenever possible to help save storage space and possibly improve query performance • Not necessary for cardinality greater than 64K (65536) • Not necessary for datatypes of tinyint or char(1) • Does not need to be exact, but should be close to cardinality • If the value is <= 255 then IQ-M will place a 1-byte FP index on the column – 1 byte of storage per row • If the value is > 255 but <= 65536 then IQ-M will place a 2-byte FP index on the column – 2 bytes of storage per row • May slightly hinder data loads, but improve query speeds

  23. Data Model Recommendations –NULL Values • Always specify NULL or NOT NULL • Open Client and ODBC connections have different defaults • Allows the optimizer a better guess at join criteria • NULL data does not save space on the database page as it would in ASE • Will be compressed out when stored on disk

  24. Data Model Recommendations –Unsigned Datatypes • Use unsigned datatypes where possible • Use for surrogate keys and join columns • Unsigned data comparisons are quicker • The caveat to this is that Open Client may misinterpret the value if it is too large as it does not understand large unsigned data • Can convert to signed integer, numeric, or decimal if returning data to an Open Client application • This caveat applies to moving data between IQ-M servers with INSERT FROM LOCATION

  25. Data Model Recommendations –Long Varchar and Long Varbinary • Can be used to store large amounts of text or binary data • VARCHAR() or VARBINARY() datatypes • Maximum width is 32K (32768) bytes • The WORD index is the only index allowed on data 256 bytes or wider • Storage will be allocated in 256 byte chunks • A 257 byte string will require 512 bytes of storage • Much less than the 2K requirement in ASE TEXT and IMAGE columns

  26. Data Model Recommendations –Varchar vs. Char • Use CHAR() whenever possible • All data in IQ-M is stored fixed width • VARCHAR() types only add overhead • A VARCHAR(100) columns will require 101 bytes of storage • 100 bytes for data • 1 byte for the size of data • CHAR() data is blank padded, VARCHAR() is not

  27. Data Model Recommendations –Proper Datatype Sizing • Use the smallest datatypes possible for data • Be aware of all datatypes in IQ-M – there may be more than you know • If hour, minute and second information is not necessary, use DATE instead of DATETIME • If the data will fit within a TINYINT or SMALLINT datatype use that rather than INTEGER or BIGINT • Allows the engine to store data in smaller units (1-byte TINYINT or 2-byte SMALLINT versus 4-byte INTEGER or 8-byte BIGINT • Don’t over allocate storage when defining NUMERIC() or DECIMAL() as it can be costly for data that doesn’t need all that space

  28. Data Model Recommendations –When and Where to Use Indexes • Always use indexes on: • Join columns • Searchable columns • Aggregation columns • If uncertain, place an LF or HG index on the column • A column with an HNG, CMP, or WD index should have a corresponding LF or HG index (very rare circumstances will negate this) • Indexes are not needed on columns whose data is ONLY returned to the client (projected) • Use Primary Key or UNIQUE HG indexes where appropriate

  29. Data Model Recommendations –Multi-column Indexes • Currently, only UNIQUE HG and PRIMARY KEY indexes support multiple columns • HG inserts are the most expensive in IQ-M • Try to guarantee that inserts will happen at the end of the index • Place generally incrementing data at the beginning of the index list • For instance, a transaction date or batch number • Something that will try to guarantee a sequential key

  30. Data Model Recommendations –Join Column • Prefer joining integer datatypes (unsigned if possible) • Integer comparisons are quicker than character comparisons • Keep the datatypes as narrow as possible to improve join performance

  31. Data Model Recommendations –Primary Keys • Multi-column primary keys should have an additional LF or HG index placed on each individual column • Must be done manually • UNIQUE CONSTRAINT, UNIQUE HG, and Primary Key are identical structures: HG index with no G-Array • Use when possible • Helps optimizer make more informed query path decisions even if index is not used in joins or searches • You get an HG Index created automatically • This HG index has no G-Array (uses less space)

  32. Data Model Recommendations –Temporary Tables • 3 Types of Temporary Tables • “#” tables • create table #temp table ( col1 int ) • Local Temporary Tables • declare local temporary table temp table ( col1 int ) • Behave just like “#” tables • Global Temporary Tables • create global temporary table temp table ( col1 int ) • Tables structure is static across connections and reboots

  33. Data Model Recommendations –Temporary Tables • On Commit Preserve Rows • Use this option so that rows in temporary tables remain after the transaction has been committed • Temporary tables are available at the current level (parent) and all of its children • A parent cannot see a child's temporary table(s)

  34. Data Model Recommendations – Cursors • Avoid using cursors • Generally means row based processing • IQ-M was designed for set based processing • Sometimes they cannot be avoided • Open With Hold • Allows the cursor to remain open across transactions • If not used, the cursor is closed when a commit is issued

  35. Database Programming Language –Watcom SQL vs. T-SQL • IQ-M (ASA) is not 100% T-SQL Compatible, but very close • Recommend using Watcom SQL • All system procedures written with it • Many more code examples and more IQ-M people versed in it • Watcom SQL has some extensions that T-SQL does not: • Dynamic SQL • Better Loop control • Full cursor movement rather than just read next • Batches and procedures must be written in the same dialect • Cannot mix T-SQL with Watcom SQL

  36. DECLARE CURSOR GOTO IF PRINT RAISERROR SET WHILE (T-SQL) vs. LOOP Global variables Variable Names CALL FOR ASA requires variables to be declared immediately after a BEGIN Database Programming Language –Watcom SQL vs. T-SQL • Behavior differences include:

  37. Database Programming Language –Commit and Rollback • Use transaction control around logical units of work, even read only queries • Should commit before a read/write batch is started to ensure latest version of data is available • Should issue commit and rollback after batch completion to release all query resources • Rollback will free memory resources in use by previous operations

  38. Database Programming Language –Custom Functions • Custom functions can be written in either SQL or Java • Great way to encapsulate business logic for transforming data • Can have a significant performance impact on queries • Functions are executed in the catalog portion of the engine • All result rows may need to be moved to ASA • Can be time consuming for large result sets • Turn on query plans to see what impact the functions have on effective query plans

  39. Database Programming Language –Outer Joins • T-SQL Outer Join: *= or =* • ANSI Standard/Watcom SQL Outer Join: [left|right|full] outer join • Be careful of outer join syntax • The T-SQL syntax can be very ambiguous or non-deterministic for IQ-M to translate • All T-SQL outer joins must be converted to ANSI outer joins and then processed • Use the ANSI standard instead as they are not ambiguous and are always clear in their meaning • Visit http://my.sybase.com/detail?id=1017447 for more details

  40. Data Manipulation –Load Table • Parallel Load Table • Make sure to put column delimiter after last column • Must use ROW DELIMITED BY and DELIMITED BY options in load table command • Column and row delimiters must be a single character • 1 to 4 byte delimiters allowed for serial loads

  41. Data Manipulation –Load Table • If possible, perform load table from binary datafiles • Can be 3 to 10 times faster than ASCII loads • Can use FILLER() clause with a delimiter or byte count • Better performance achieved by casting the date or datetime formats rather than letting IQ-M guess • If possible, issue a single load table with multiple files rather than 1 load table per file to be loaded into a table

  42. Data Manipulation –Insert From Location • Great way to move data from any Open Client source • Syntax: insert into TABLE() location ‘SERVERNAME.DBNAME’ { select statement }; • IQ-M username and login must match on remote system • Interfaces entry must match the SERVERNAME • Can also be used to move data quickly from an ASA table to an IQ-M in the same server

  43. Data Manipulation –Single Row Operations • Avoid at all costs for large data manipulation operations • Different from single statement operations that modify many rows • Individual INSERT … VALUES() will be slower than bulk load operations • Expect no more than 5,000 to 20,000 operations per hour

  44. Data Manipulation –Named Pipes and Flat Files • Named pipes can be faster – no disk I/O • How to make a named pipe: mknod PIPE_NAME p • Can use BCP, GZIP, UNCOMPRESS, or applications to push IQ-M formatted data into named pipe • LOAD TABLE command can read from named pipe • Can also fast extract data to named pipes so that they can be read by another application or even compressed and stored

  45. Data Manipulation –Partial Width Inserts • Can induce fragmentation and overly large space consumption if not watched • Row ID’s from deleted data are not reused during a partial width insert operation • Space from delete data is not reused (because Row ID’s are not reused) • Partial width inserts are analogous to APPEND_LOAD=‘on’ in terms of Row ID and space behavior • Only becomes a problem if partial width inserts are a way of life for a table

  46. Data Loading –The Deep Fact Table • Many databases have a “deep” table • A growing table with tens of millions of rows • Rows typically ‘rolled out’ and replaced over time • HG indexes are very slow to load/delete • Loading time can steadily increases as rows are added • Solution (with a caveat) • Partition the table (Example – time: day, week, month) • Build a view that is a Union of all the tables

  47. Partitioned t1 t2 t3 Data Loading –Partitioned Fact Table Big Fact Table Accessed by View Create View bigtable as Select * from t1 Union All Select * from t2 Union All Select * from t3

  48. Data Loading – Why Partitioning? • Loads are faster and predictable • x million rows will load consistently • IQ may process the Union All in parallel • As long as cpu resources are available • To roll data out, truncate one table • Truncate table is much faster than Delete • No changes to DDL required • Load new data into the empty (truncated) table

  49. Data Extraction • For fastest data unloads use the TEMP_EXTRACT options • Can unload data in ASCII or BINARY format • Recommend BINARY format as it is faster to reload into IQ-M • Great way to archive portions of the database • Can unload to one or more files, serially • Avoid using ISQL or DBISQL redirection to a file • Much slower than fast unloads

  50. Maintenance Tasks –Database Consistency Checks • The routine to check the database for potential corruptions is SP_IQCHECKDB • There are no run-time options for this command, however the DBCC options control the behavior of the SP_IQCHECKDB command • Recommendations • Run Level 1 DBCC’s every 1 to 4 weeks (will process 1-2 GB per second) • Run Level 2 DBCC’s every 1 to 3 months (will process 50-100 MB per second)

More Related