
Best practices to ensure efficient data models, fast data activation, and performance of your SAP NetWeaver BW 7.3 data warehouse



Presentation Transcript


  1. Best practices to ensure efficient data models, fast data activation, and performance of your SAP NetWeaver BW 7.3 data warehouse. Dr. Bjarne Berg, COMERIT

  2. What We’ll Cover … • Introductions • EDW Data Design and Data Modeling • Data Loading and Fast Activations • Tips and Tricks for Faster Query Times • Real-Time and Near-Time reporting for the EDW • EDW Reduction and Cleanup • In Memory Options with HANA • Wrap-Up

  3. In This Session … • This is an advanced technical presentation intended for developers with significant hands-on experience with SAP BW • We will look at many EDW technical design options and the pros and cons of some of the new design features in HANA and BW 7.3 • During the presentation we will look at many examples from 5 real implementations and explore what can be learned from them

  4. What We’ll Cover … • Introductions • EDW Data Design and Data Modeling • Data Loading and Fast Activations • Tips and Tricks for Faster Query Times • Real-Time and Near-Time reporting for the EDW • EDW Reduction and Cleanup • In Memory Options with HANA • Wrap-Up

  5. Data Design: The Use of Layered Scalable Architecture (LSA) • SAP BW 7.3 SP-3 has a set of 10 templates to help build a layered data architecture for large-scale data warehousing • The LSA consists logically of: • Acquisition layer • Harmonization/quality layer • Propagation layer • Business transformation layer • Reporting layer • Virtualization layer

  6. EDW Design vs. Evolution • An organization has two fundamental choices: • Build a new, well-architected EDW • Evolve the old EDW or reporting system • Both solutions are feasible, but organizations that select an evolutionary approach should be self-aware and monitor undesirable add-ons and “workarounds”. Failure to break with the past can be detrimental to an EDW’s long-term success.

  7. Data Design - Real Example of LSA Implementation • This company implemented a full LSA architecture and also partitioned the InfoProviders for faster data loads and faster query performance. While this architecture has benefits, there are significant issues around data volumes and the total cost of ownership when changes are made to the data model(s).

  8. Data Design - Example of LSA Simplification in HANA • Since many of the benefits sought by the LSA architecture are inherent in HANA, significant simplifications can be made to the data design and data flows. This design has a dramatically lower cost of ownership (fewer objects to maintain, design, and change) than traditional LSAs.

  9. EDW - Complex Layered Architectures (Real Example) • This BPC on BW system was experiencing substantial load performance issues • Some of this was due to the underlying SAP BW configuration, while some was due to the technical configuration of the data store architecture and data flow inside SAP BW • Production issues included: (1) dependent jobs not running sequentially, i.e., the load from the summary cube to the staging cube is sometimes executed before the summary cube data is loaded and activated, resulting in zero records in the staging cube; (2) long latency, with 6 layers of PSA, DSOs, and InfoCubes before the consolidation processes can be executed • Data flow in the diagram, from sources to consolidation: source systems (ECC 6.0 Asia-Pacific, ECC 6.0 North-America, ECC 4.7 Latin-America, R/3 3.1i EU, ECC 4.7 ASIA) → Persistent Staging Area (PSA) → write-optimized DSOs (FIGL_D15S, FIGL_D13S, FIGL_D10S, FIGL_D08, FIGL_D11S) → conformed reportable DSOs (FIGL_D21, FIGL_D20, FIGL_D17, FIGL_D14, FIGL_D18) → GL Summary Cube (FIGL_C03) → BPC Staging Cube (BPC_C01) → Consolidation Cube (OC_CON) → consolidation processes (Clearing, Load, Foreign Exchange, Eliminations, Optimizations)

  10. Fixes to Complex EDW Architecture (Real Example) • The fix to this system included removing the conformed DSO layer, with BEx flags for data stores that are never reported on. • Also, the BPC staging cube served little practical purpose, since the data is already staged in the GL Summary cube and the logic can be maintained in the load from this cube directly to the consolidation cube. • Long-term benefits included reduced data latency, faster data activation, less data replication, smaller system backups, and simplified system maintenance. • Data flow in the diagram after the fix, from sources to consolidation: source systems (ECC 6.0 Asia-Pacific, ECC 6.0 North-America, ECC 4.7 Latin-America, R/3 3.1i EU, ECC 4.7 ASIA) → Persistent Staging Area (PSA) → write-optimized DSOs (FIGL_D15S, FIGL_D13S, FIGL_D10S, FIGL_D08, FIGL_D11S) → GL Summary Cube (FIGL_C03) → Consolidation Cube (OC_CON) → consolidation processes (Clearing, Load, Foreign Exchange, Eliminations, Optimizations)

  11. EDW Data Design - Use of MultiProvider Hints in BW • Problem: To reduce the data volume in each InfoCube, data is partitioned by time period. A query must now search all InfoProviders to find the data, which is very slow. • Solution: We can add “hints” to guide the query execution. In the RRKMULTIPROVHINT table, you can specify one or several characteristics for each MultiProvider, which are then used to partition the MultiProvider into BasicCubes. • If a query has restrictions on such a characteristic, the OLAP processor checks up front which of the part cubes can return data for the query. The data manager can then completely ignore the remaining cubes. • An entry in RRKMULTIPROVHINT only makes sense if only a few values of this characteristic (that is, only a few data slices) are affected in the majority of, or the most important, queries (SAP Note 911939; see also 954889 and 1156681).
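
The pruning idea behind MultiProvider hints can be sketched in a few lines of Python. This is only an illustration of the principle, not SAP's implementation: the `PartProvider` class, the cube names, and the `providers_to_read` function are hypothetical, and the "hint characteristic" is assumed to be calendar year.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartProvider:
    name: str               # hypothetical InfoCube name
    hint_values: frozenset  # values of the hint characteristic held by this cube


def providers_to_read(parts, restriction):
    """Return only the part providers that can hold data for the restriction.

    An empty restriction means the query does not filter on the hint
    characteristic, so every part provider has to be read.
    """
    if not restriction:
        return list(parts)
    return [p for p in parts if p.hint_values & restriction]


parts = [
    PartProvider("SALES_2010", frozenset({"2010"})),
    PartProvider("SALES_2011", frozenset({"2011"})),
    PartProvider("SALES_2012", frozenset({"2012"})),
]

# A query restricted to 2012 only needs one of the three cubes.
selected = providers_to_read(parts, frozenset({"2012"}))
print([p.name for p in selected])
```

The benefit shrinks as queries touch more data slices, which matches the slide's caveat that a hint only pays off when most queries hit a few slices.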

  12. EDW Data Design - Semantic Partitioned Objects (SPO) • When data stores and InfoCubes are allowed to grow over time, data load and query performance suffer • Normally, objects should be physically partitioned when the number of records exceeds 100–200 million • However, this threshold may differ depending on the size of your hardware and the type of database you use • In SAP NetWeaver BW 7.3 we get an option to create a Semantic Partitioned Object (SPO) through wizards • You can partition based on fields such as calendar year, region, country, etc.

  13. Data Design - Semantic Partitioned Objects (cont.) • When an SPO is created, a reference structure keeps track of the partitions. The structure is placed in the MultiProvider for querying. • SPO wizards create all Data Transfer Processes (DTPs), transformations, and filters for each data store, as well as a process chain.
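
To make the semantic partitioning idea concrete, here is a minimal Python sketch of routing records to partitions by a filter on calendar year. The partition names, year ranges, and the `route` function are assumptions made for illustration; in BW the wizard generates the equivalent DTP filters for you.

```python
from collections import defaultdict

# Hypothetical partition filters: partition name -> calendar years it accepts.
PARTITION_FILTERS = {
    "PART_01": range(2009, 2011),  # 2009-2010
    "PART_02": range(2011, 2013),  # 2011-2012
}


def route(records):
    """Assign each record to the first partition whose filter matches."""
    partitions = defaultdict(list)
    for rec in records:
        for name, years in PARTITION_FILTERS.items():
            if rec["calyear"] in years:
                partitions[name].append(rec)
                break
        else:
            # No filter matched: keep the record visible instead of dropping it.
            partitions["UNASSIGNED"].append(rec)
    return dict(partitions)


routed = route([
    {"calyear": 2009, "amount": 10},
    {"calyear": 2012, "amount": 5},
    {"calyear": 2015, "amount": 1},
])
print({name: len(rows) for name, rows in routed.items()})
```

Collecting unmatched records in a separate bucket is a design choice of this sketch; it makes gaps in the partition criteria easy to spot.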

  14. InfoCube Design - Size (Real Example) • In this example, many of the InfoCubes are very large and not partitioned • Several have over 100 million records and one is approaching 500 million • In this system, SPOs in BW 7.3 can be very helpful. For BW 7.0, many of these cubes can be physically partitioned, with hints on the MultiProviders • InfoCubes should be performance tuned once the number of records exceeds 100 million and partitioned before they approach 200 million records. This creates faster loads, better query performance, and easier management.

  15. InfoCube Design - Use of Line Item Dimensions • Line item dimensions are basically fields that are transaction oriented • Once flagged as a line item dimension, the field is actually stored in the fact table and requires no table joins • This may result in improved query speeds for cubes not in BWA or HANA • Explore the use of line item dimensions for fields that are frequently conditioned in queries. This model change can yield faster queries.

  16. InfoCube Design - High Cardinality Flags (Real Example) • The high-cardinality flag is intended for large InfoCubes with more than 10 million rows • There are currently 11 InfoCubes in which the number of dimension records is more than 30% of the number of fact table records • For indexing and performance reasons, SAP recommends flagging these as “high-cardinality” dimensions. However, this has only a minor impact on smaller cubes. • In this example, there were four medium and large InfoCubes that did not follow the basic design guidelines and consequently had slow performance • Many companies should redesign large InfoCubes with high cardinality to take advantage of the standard performance enhancements available.
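
The rule of thumb on this slide can be written as a small check. The sketch below uses the slide's own thresholds (more than 10 million fact rows, dimension rows above 30% of fact rows); the function name and parameters are hypothetical, and the row counts would have to come from your own system.

```python
def needs_high_cardinality_flag(dim_rows, fact_rows,
                                min_fact_rows=10_000_000, ratio=0.30):
    """Return True if a dimension is a candidate for the high-cardinality flag.

    Thresholds follow the slide: the cube must have more than 10 million
    fact rows, and the dimension must hold more than 30% as many rows
    as the fact table.
    """
    if fact_rows <= min_fact_rows:
        return False  # flag has little effect on small cubes
    return dim_rows / fact_rows > ratio


print(needs_high_cardinality_flag(40_000_000, 100_000_000))  # large cube, 40% ratio
print(needs_high_cardinality_flag(500_000, 1_000_000))       # small cube
```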

  17. DSO Design and Locks on Large Oracle Tables (Real Example) • In this example, many of the very large DSOs are not partitioned, and several objects have over 250 million records • Additionally, 101 DSOs were flagged as reportable. This resulted in surrogate IDs (SIDs) being created during activation. • Combined, these resulted in frequent locks on the Oracle database and failed parallel activation jobs • Partition DSOs. Locks on very large DSOs during parallel loads are well known, and SAP has issued several notes on the topic: 634458 'ODS object: Activation fails - DEADLOCK' and 84348 'Oracle deadlocks, ORA-00060.'

  18. What We’ll Cover … • Introductions • EDW Data Design and Data Modeling • Data Loading and Fast Activations • Tips and Tricks for Faster Query Times • Real-Time and Near-Time reporting for the EDW • EDW Reduction and Cleanup • In Memory Options with HANA • Wrap-Up

  19. An Example of a BW System Review (Real Example) • Database: Oracle version 11.2g • BW system: BW version 7.3 • Operating systems: HP-UNIX; Linux for BWA; AIX 6.4 for three app servers • In this section, we take a look at a real example of a BW implementation and explore what we can learn from it.

  20. System Background (Real Example) • There were over 600 InfoProviders in the system • The system had been in production for 6 years and was upgraded in 2012 • Most InfoCubes followed standard development guidelines, but some had abnormalities such as InfoCubes feeding DSOs • Structured design review sessions should be undertaken as part of every project to ensure that such designs do not continue.

  21. Faster Data Load and Design Options - Activation • During activation, SAP NetWeaver BW 7.0 has to look up the NRIV table to see if the object already exists • This can be a slow process • In SAP NetWeaver BW 7.0 we may buffer the number ranges to compare the data load with records in memory • This speeds up data activation • However, in SAP NetWeaver BW 7.3, data activation is changed from single lookups to a package fetch of the active table, resulting in faster activation and fewer locks on the lookup tables • This activation method results in 15-30% faster data activation

  22. Example: Data Activation and Options (Real Example) • During activation, there was limited use of buffering of number ranges for dimensions and InfoObjects, even when the number of entries was large • This resulted in many reads on the NRIV table that slowed down data activation and process chains (see SAP Notes 857998, 141497, 179224, and 504875) • Start buffering the number ranges of dimensions and InfoObjects, or use the 7.3 data activation (“package fetch”) instead
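
The difference between single lookups and a package fetch can be illustrated with a toy model. The Python sketch below is not how BW activation is implemented; the `ActiveTable` class is an in-memory stand-in that simply counts read calls, and all names are hypothetical.

```python
class ActiveTable:
    """In-memory stand-in for a DSO active table that counts read calls."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.read_calls = 0

    def read_one(self, key):
        self.read_calls += 1
        return self._rows.get(key)

    def read_many(self, keys):
        self.read_calls += 1
        return {k: self._rows[k] for k in keys if k in self._rows}


def classify_single(table, package):
    """Single lookups: one read per incoming record."""
    new, changed = [], []
    for key, _value in package:
        (changed if table.read_one(key) is not None else new).append(key)
    return new, changed


def classify_package(table, package):
    """Package fetch: one read for the whole data package."""
    existing = table.read_many([key for key, _value in package])
    new = [key for key, _value in package if key not in existing]
    changed = [key for key, _value in package if key in existing]
    return new, changed


package = [("A", 1), ("B", 2), ("C", 3)]
t_single, t_package = ActiveTable({"A": 0}), ActiveTable({"A": 0})
r_single = classify_single(t_single, package)
r_package = classify_package(t_package, package)
print(t_single.read_calls, t_package.read_calls)
```

Both strategies classify the records identically; only the number of round trips to the table differs, which is where the activation speed-up comes from in this simplified picture.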

  23. More Data Load Ideas • In BW 7.3, the option “Read from DataStore” is also available in data transformations for a faster data lookup • Additionally, the use of navigational attributes as sources in master data transformations reduces overhead for lookups • Combined, this may lead to an additional 10-20% improvement • The 7.3 initial load runtime options “Insert only” and “Unique data records only” prevent all lookups during activation and can dramatically improve data loads when used correctly

  24. More 7.3 Performance and Cockpit Capabilities • BW 7.3 monitors and cockpit capabilities also include: • Monitoring of database usage and object sizes (i.e., InfoCubes, DSOs) • Query usage statistics are more visible (similar to RSRT, RSRV, RSTT) • We can see more of the use of SAP NetWeaver BW Accelerator and sizes • A monitor for the actual use of the OLAP/MDX cache and hit ratios • You can now selectively delete internal statistics in RSDDSTATWHM by date through the updated RSDDSTAT_DATA_DELETE ABAP program • There is also an MDX Editor for coding and syntax assistance • Solution Manager has been updated to take advantage of these new monitors.

  25. BW 7.3 Performance and Monitoring • Additionally, BW 7.3 monitors include: • Daemon update information (i.e., RDA capacity status, usage) • A performance monitoring workbench for performance trends • Process chain monitoring (transaction: RSPCM) with error and active chain monitoring, user-specific displays, and performance threshold monitoring (i.e., for SLAs) • In SAP NetWeaver BW 7.3, Near-Line Storage (NLS) has been enhanced to include archiving support for write-optimized DSOs

  26. ETL Options for EDWs • In SAP NetWeaver BW 7.3 you can create generic delta extraction for the Universal Data Connect (UD Connect) and Database Connect (DB Connect) options, as well as for flat files • Additionally, you can use the new DataSource adapter “Web Service Pull” to load data from external Web services • You can even create generic Web services delta loads and load the new data straight into the staging area of SAP NetWeaver BW 7.3 • While Web services do not fully support hierarchies yet, hierarchies are integrated into the standard process flow, such as transformations and DTPs, and can be loaded from flat files using a new DataSource
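
A generic delta boils down to remembering a pointer (for example, the latest change timestamp already extracted) and pulling only rows beyond it. The Python sketch below illustrates that principle only; the `extract_delta` function, the `changed_at` field, and the sample rows are assumptions and do not represent the BW DataSource API.

```python
def extract_delta(rows, last_pointer, delta_field="changed_at"):
    """Return rows changed after last_pointer, plus the new pointer value.

    ISO 8601 timestamps are used as strings, so plain string comparison
    orders them correctly.
    """
    delta = [row for row in rows if row[delta_field] > last_pointer]
    new_pointer = max((row[delta_field] for row in delta), default=last_pointer)
    return delta, new_pointer


source_rows = [
    {"id": 1, "changed_at": "2012-03-01T08:00:00"},
    {"id": 2, "changed_at": "2012-03-02T09:30:00"},
    {"id": 3, "changed_at": "2012-03-03T07:15:00"},
]

# First run picks up everything after the stored pointer.
delta, pointer = extract_delta(source_rows, "2012-03-01T23:59:59")
# A second run with the updated pointer finds nothing new.
delta_again, pointer_again = extract_delta(source_rows, pointer)
print([row["id"] for row in delta], pointer, delta_again)
```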

  27. The 7.3 DataFlow Generation Wizard • SAP NetWeaver BW 7.3 has a new, step-by-step wizard that allows you to generate data flows from flat files or existing data sources • A great benefit is that the wizards work against any InfoProvider; i.e., you can use the wizards to create loads from DSOs to DSOs or InfoCubes • This wizard reduces the number of manual steps needed to load data. It also simplifies the development process and makes ETL work much easier.

  28. Example of Poor EDW Data Load Design (Real Example) • On average, the daily data extraction, transformation, and load process takes 44.8 hours if run sequentially • A substantial amount of the time is spent on data transformation (51%), and lookups are often done on large DSOs without secondary indexes • Of the 371 million records extracted from the source, only 33.7% are written to disk • This is due to the lack of ability to do delta processing for some files and also a substantial amount of transform and lookup logic in some of the ABAP rules • For this system, developers should revisit the extractor design so that lookups are done on the source system instead of inside BW
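
The percentages on this slide translate into absolute numbers as follows. This is a simple worked calculation in Python using only the figures quoted above; the variable names are illustrative.

```python
records_extracted = 371_000_000

# 33.7% of the extracted records are written to disk (integer arithmetic
# avoids floating-point rounding noise).
records_written = records_extracted * 337 // 1000
records_discarded = records_extracted - records_written

total_hours = 44.8
transform_hours = total_hours * 0.51  # 51% of the runtime is transformation

print(f"written:   {records_written:,}")
print(f"discarded: {records_discarded:,}")
print(f"transformation time: {transform_hours:.1f} hours")
```

In other words, roughly 125 million records are kept, about 246 million are extracted and then discarded, and close to 23 of the 44.8 hours go to transformation.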

  29. What We’ll Cover … • Introductions • EDW Data Design and Data Modeling • Data Loading and Fast Activations • Tips and Tricks for Faster Query Times • Real-Time and Near-Time reporting for the EDW • EDW Reduction and Cleanup • In Memory Options with HANA • Wrap-Up

  30. Database Performance (non-HANA systems) (Real Example) • Database statistics are used by the database optimizer to route queries. Outdated statistics lead to performance degradation. • Outdated indexes can lead to very poor search performance in all queries where conditioning is used (i.e., mandatory prompts) • The sampling rates in this example were too low; statistics should only be run after major data loads and could be scheduled weekly • For many systems, database statistics are outdated and may cause the database to perform significantly worse than it otherwise would. Sampling rates often need to be changed, and process chains may need to be rescheduled.

  31. Database Design for EDW: B-Tree Indexes on Large Objects (Real Example) • InfoCubes that are not flagged as high-cardinality use a bitmap index instead of a classical B-tree index for the joins • This type of index does not get “unbalanced” since it uses pointers instead of “buckets” • When updating these large InfoCubes, dropping and recreating indexes in the process chain can be very time consuming and can actually take longer than the inserts • It can also result in locks when the objects are very large (100 million+ records) and when attempting to do this in parallel (see ORA-00060) • Rebuilding bitmap indexes during load processing for large objects should not be the default answer for all designs. Any process chains that do so may need to be revisited.

  32. Aggregates Are Not Needed with BW and HANA (or BWA) (Real Example) • At this company, there are 11 aggregates in the system • Four are related to the cube ZIR_C01 and are seldom used (two have never been used by any query) • The 7 other aggregates are used only by the statistical cubes • Every day, 1.9 million records are inserted into the aggregates, which takes 35.6 minutes of processing time • Delete unused aggregates. By reducing the data volume in the underlying statistical cubes (cleanup), the remaining aggregates will shrink in size and processing time.

  33. The OLAP Memory Cache Size Utilization (Real Example) • The OLAP cache is by default 100 MB for local and 200 MB for global use • The system at this company was consuming no more than 80 MB on average • This means that most queries were re-reading the same cached data (a good hit ratio of over 90%)

  34. OLAP Cache - Turned Off (Real Example) • At one client, the OLAP cache was turned off for 131 out of 690 queries (excluding 4 planning queries in BW-IP) • The cache was also turned off for 24 out of 256 InfoCubes • The OLAP cache for many of their queries could also have been stored as “Binary Large Objects (BLOB),” which could speed up caching and very large reads • Since queries at most companies use calculated key figures (CKF), sums, and sorts extensively, the cache mode for most queries should be turned on

  35. Broadcast to Pre-Fill the Cache (Real Example) • This company’s Java stack did not communicate properly with SAP NetWeaver BW, and multiple logons were required • As a result, broadcasting could not be used until the connectivity was set up correctly • Set up Java connectivity ASAP and use the broadcasting feature to pre-fill the MDX cache (OLAP Universes) for BI functions that are intensive in analytical processing, such as CKF, sorts, exceptions, and conditions
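
Why pre-filling helps can be shown with a generic cache model. The Python sketch below is a simplified illustration, not the BW OLAP cache: the `QueryCache` class, the `run_query` stand-in, and the query names are all hypothetical.

```python
class QueryCache:
    """Toy result cache keyed by (query, selection), tracking hits and misses."""

    def __init__(self, run_query):
        self._run_query = run_query
        self._store = {}
        self.hits = 0
        self.misses = 0

    def prefill(self, workload):
        """Execute expected queries ahead of time (the 'broadcast' step)."""
        for query, selection in workload:
            self._store[(query, selection)] = self._run_query(query, selection)

    def get(self, query, selection):
        key = (query, selection)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._run_query(query, selection)
        return self._store[key]

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def run_query(query, selection):
    # Stand-in for an expensive OLAP calculation (CKF, sorts, conditions).
    return f"result of {query} for {selection}"


cache = QueryCache(run_query)
cache.prefill([("SALES_BY_REGION", "2012")])
cache.get("SALES_BY_REGION", "2012")  # served from the pre-filled cache
cache.get("SALES_BY_REGION", "2011")  # not pre-filled, computed on demand
print(cache.hit_ratio())
```

The first user of a pre-filled query gets a cache hit instead of paying for the calculation, which is the effect the broadcasting recommendation is after.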

  36. What We’ll Cover … • Introductions • EDW Data Design and Data Modeling • Data Loading and Fast Activations • Tips and Tricks for Faster Query Times • Real-Time and Near-Time reporting for the EDW • EDW Reduction and Cleanup • In Memory Options with HANA • Wrap-Up

  37. HybridProvider and Real-Time Data • The “HybridProvider” (HP) is new in SAP NetWeaver BW 7.3. The core idea is to link the historical data inside BW with real-time data. • There are two ways of implementing an HP: • HP based on a DSO • HP based on a Virtual InfoCube

  38. Option 1: The DSO-Based HybridProvider for EDWs • Real-time data is in the DSO and historical data is in the SAP NetWeaver BW Accelerator-based InfoCube • The DSO uses real-time data acquisition (RDA) to load data • SAP NetWeaver BW automatically creates a process chain for the HybridProvider’s data flow • The process chain is executed for every closed request • This solution provides really fast queries, but delta logic has to be custom designed

  39. The DSO-Based HybridProvider for EDW (cont.) • This solution provides really fast queries, but delta logic has to be custom designed and may be complex. However, the solution allows for high-frequency updates and very rapid query response. • This is a good option if you have a low volume of new records and a high number of queries or operational dashboards.

  40. Option 2: The Virtual Cube-Based HybridProvider for EDW • Data is read in real time from SAP ECC, while historical data is read from SAP NetWeaver BW Accelerator • The split between the two depends on how often SAP NetWeaver BW Accelerator is loaded • Non-complex data logic can be applied • DTP is permitted if you do not filter the data set • Warning: Virtual cubes with many users may place high stress on the ERP system

  41. What We’ll Cover … • Introductions • EDW Data Design and Data Modeling • Data Loading and Fast Activations • Tips and Tricks for Faster Query Times • Real-Time and Near-Time reporting for the EDW • EDW Reduction and Cleanup • In Memory Options with HANA • Wrap-Up

  42. Database Growth and Cleanup - Example (Real Example) • The database has grown between 12 and 344 GB each month for the last year • In three months of the year, data, logs, and PSA were cleaned. Data volume declined between 63 and 275 GB in those months. • The database has grown by 732 GB (26%) in the last year, and the growth is uneven. Schedule “housekeeping” jobs. Better management of cleanup would result in more predictable patterns (e.g., we found PSA data with 10 months of load history).

  43. BW Data Reduction - Statistical Cubes (Real Example) • In our example, there were many statistical cubes with significant volume and no real benefits. During the BW upgrade, most of these were not cleared and are now causing poor system performance. For example: • 0TCT_C02 has 408 million rows; others also had millions of rows • Stats are collected for over 1,900 objects, queries, InfoProviders, templates, and workbooks (could be reduced significantly) • There are 8 aggregates with over 12 million rows on the stats cubes • Creating aggregates on stats cubes inserts 1.9 million rows and takes 35.6 minutes for the refresh each night • High-cardinality flags are set for a small cube with only one million rows (0TCT_C21) • Use RSDDSTAT and select “Delete Data” for old stats, and also schedule periodic jobs using standard process chains.

  44. EDW on SAP HANA: 12 Cleaning Pre-Steps • Clean the Persistent Staging Area (PSA) for data already loaded to DSOs • Delete the aggregates (summary tables). They will not be needed again. • Compress the E and F tables in all InfoCubes. This will make InfoCubes much smaller. • Remove data from the statistical cubes (their technical names start with 0TCT_xxx). These contain performance information for the BW system running on the relational database. You can do this using the transaction RSDDSTAT or the program RSDDSTAT_DATA_DELETE. • Look at log files, bookmarks, and unused BEx queries and templates (transaction RSZDELETE). • Remove as much as possible of the DTP temporary storage, DTP error logs, and temporary database objects. Help and programs to do this are found in SAP Notes 1139396 and 1106393.

  45. EDW on SAP HANA: 12 Cleaning Pre-Steps (cont.) • For write-optimized DSOs that push data to reportable DSOs (LSA approach), remove the data in the write-optimized DSOs. It is already available in higher-level objects. • Migrate old data to Near-Line Storage (NLS) on a small server. This will still provide access to the data for the few users who infrequently need to see it. You will also be able to query it when BW is on HANA, but it does not need to be in memory. • Remove data in unused DSOs, InfoCubes, and files used for staging in the BW system. This includes a possible reorganization of master data texts and attributes using the process type in RSPC.

  46. EDW on SAP HANA: 12 Cleaning Pre-Steps (cont.) • You may also want to clean up background information stored in the table RSBATCHDATA. This table can get very big if not managed. You should also consider archiving any IDocs and cleaning the tRFC queues. All of this will reduce the size of the HANA system and help you fit the system tables on the master node. • In SAP Note 706478, SAP provides some ideas on how to keep the basis tables from growing too fast in the future, and if you are on Service Pack 23 on BW 7.0 or higher, you can also delete unwanted master data directly (see SAP Note 1370848). • Finally, you can use the program RSDDCVER_DIM_UNUSED to delete any unused dimension entries in your InfoCubes to reduce the overall system size.

  47. PSA Cleanup (Real Example) • For this company, there were over 1.6 billion rows in the PSA • An estimated 29 GB could be freed up if data older than one month is removed • A formal retention policy that is communicated and enforced should be implemented • Create standard practices for PSA cleanup and schedule regular jobs that take care of this in the future.
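
A retention policy of the kind recommended here can be expressed as a simple selection rule. The Python sketch below is a hedged illustration only: the request records, field names, and the `psa_requests_to_delete` function are hypothetical, and the 30-day default mirrors the "older than one month" figure on the slide. In a real system the deletion itself would be scheduled through process chains.

```python
from datetime import date, timedelta


def psa_requests_to_delete(requests, today, keep_days=30):
    """Select PSA requests older than the retention period.

    Only requests already posted to their target are eligible, so data
    that has not been loaded onward is never removed.
    """
    cutoff = today - timedelta(days=keep_days)
    return [
        r["request_id"]
        for r in requests
        if r["loaded_on"] < cutoff and r["posted_to_target"]
    ]


requests = [
    {"request_id": "REQ_1", "loaded_on": date(2012, 1, 5), "posted_to_target": True},
    {"request_id": "REQ_2", "loaded_on": date(2012, 10, 20), "posted_to_target": True},
    {"request_id": "REQ_3", "loaded_on": date(2012, 2, 1), "posted_to_target": False},
]

old_requests = psa_requests_to_delete(requests, today=date(2012, 11, 1))
print(old_requests)
```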

  48. What We’ll Cover … • Introductions • EDW Data Design and Data Modeling • Data Loading and Fast Activations • Tips and Tricks for Faster Query Times • Real-Time and Near-Time reporting for the EDW • EDW Reduction and Cleanup • In Memory Options with HANA • Wrap-Up

  49. Looking Inside SAP HANA - In-Memory Computing Engine • Inside the computing engine of SAP HANA, we have many different components that manage the access and storage of the data. This includes MDX and SQL access, as well as the Load Controller (LC) and the Replication Server. • Components shown in the diagram: Session Manager; MDX; SQL Parser; SQL Script; Calculation Engine; Authorization Manager; Transaction Manager; Metadata Manager; Relational Engine (Row Store, Column Store); Load Controller; Replication Server; BusinessObjects Data Services; Persistence Layer (Page Mgmt., Logger); Disk Storage (Data Volumes, Log Volumes)

  50. Moving EDW to SAP HANA – Automated tool 1 SAP has a checklist tool for SAP NetWeaver BW powered by HANA (thanks Marc Bernard). In this tool, SAP provided automatic check programs for both the 3.5 version and the 7.x version of BW. These are found in SAP Note: 1729988. In version 2.x of this tool, hundreds of checks are done automatically in the BW system. This includes platform checks on database and application and system information. There are even basis checks for support packs, ABAP/JAVA stacks, Unicode, BW releases, and add-ons to your system.
