Common Analysis Services Design Mistakes and How to Avoid Them

Common Analysis Services Design Mistakes and How to Avoid Them Chris Webb www.crossjoin.co.uk

Who Am I? • Chris Webb chris@crossjoin.co.uk • Independent Analysis Services and MDX consultant and trainer • SQL Server MVP • Blogger: http://cwebbbi.spaces.live.com

Agenda • Why good cube design is a Good Thing • Using built-in best practices in BIDS • ETL in your DSV • User-unfriendly names • Unnecessary attributes • Parent/child pain • One cube or many? • Over-reliance on MDX • Unused and/or unprocessed aggregations

Why Good Design is Important! • As if you needed reasons…? • Good design = good performance = faster initial development = easy further development = simple maintenance • This is not an exhaustive list, but a selection of design problems and mistakes I’ve seen on consultancy engagements

Best Practices in BIDS • Don’t ignore the blue squiggly lines in BIDS! • They sometimes make useful recommendations about what you’re doing • Actively dismissing them, with comments, is a useful addition to documentation • As always, official ‘best practices’ aren’t always best practices in all situations

Common Design Mistakes • Three questions need to be asked: • What’s the problem? • What bad things will happen as a result? • What can I do to fix it (especially after I’ve gone into production)? • This is not a name-and-shame session!

Problem: ETL in your DSV • It’s very likely, when you are working in SSAS, that you need changes to the underlying relational structures and data • Eg you need a new column in a table • You then have two options: • Go back to the relational database and/or ETL and make the change • Hack something together in the DSV using named queries and named calculations • The DSV is the easy option, but…

Consequences: ETL in your DSV • It could slow down processing performance • No way to influence the SQL that SSAS generates • Expensive calculations/joins are better done once then persisted in the warehouse; you may need to process more than once • It makes maintenance much harder • DSV UI is not great for writing SQL • Your DBA or warehouse developer certainly won’t be looking at it

Fix: ETL in your DSV • Bite the bullet and either: • Do the necessary work in the underlying tables or ETL packages • Create a layer of views instead of using named queries and calculations • Use the Replace Table With option to point the table in the DSV at your new view/table • No impact on the rest of the cube!

Problem: Unfriendly Names • Cubes, dimensionsand hierarchies need to have user-friendly names • However names are often user-unfriendly • Unchanged from what the wizard suggests, or • Use some kind of database naming convention • Designing a cube is like designing a UI • Who wants a dimension called something like “Dim Product”….?

Consequences: Unfriendly Names • Unfriendly names put users off using the cube • These are the names that users will see in their reports, so they must be ‘report ready’ • Users need to understand what they’re selecting • Also encourage users to export data out of cube to ‘fix’ the names • And so you end up with stale data, multiple versions of the truth etcetcetc

Fix: Unfriendly Names • You can rename objects easily, but: • This can break calculations on the cube • It can also break existing queries and reports, which will need rewriting/rebuilding • IDs will not change, which makes working with XMLA confusing • You should agree the naming of objects with end users before you build them!

Problem: Unnecessary Attributes • Wizards often generate attributes on dimensions that users don’t want or need • Classic example is an attribute built from a surrogate key column • Who wants to show a surrogate key in a report?

Consequences: Unnecessary Attributes • The more attributes you have: • The more cluttered and less useable your UI • The slower your dimension processing • The harder it is to come up with an effective aggregation design

Fix: Unnecessary Attributes • Delete any attributes that your users will never use • Merge attributes based on key and name columns into a single attribute • Set AttributeHierarchyEnabled to false for ‘property’ attributes like email addresses • Remember that deleting attributes that are used in reports or calculations can cause more problems

Problem: Parent Child Hierarchies • Parent Child hierarchies are the only way to model hierarchies where you don’t know the number of levels in advance • They are also very flexible, leading some people to use them more often than they should

Consequences: Parent Child • Parent Child hierarchies can lead to slow query performance • No aggregations can be built at levels inside the hierarchy • Slow anyway • They can also be a nightmare for • Scoping advanced MDX calculations • Dimension security

Fix: Parent Child • If you know, or can assume, the maximum depth of your hierarchy, there’s an alternative • Normal user hierarchies can be made ‘Ragged’ with the HideMemberIf property • Hides members if their parent has no name, or the same name as them • Still has performance issues, but less than parent/child • You can use the BIDS Helper “parent/child naturaliser” to convert the underlying relational table to a level-based structure

Problem: One Cube or Many? • When you have multiple fact tables do you create: • One cube with multiple measure groups? • Multiple cubes with one measure group? • Each has its own pros and cons that need to be understood

Consequences: One Cube • Monster cubes containing everything can be intimidating and confusing for users • Also tricky to develop, maintain and test • Often changing one thing breaks another • Making changes may take the whole cube offline • Securing individual measure groups is a pain • If there are few common dimensions between measure groups and many calculations, query performance can suffer

Consequences: Multiple Cubes • If you need to analyse data from many cubes in one query, options are very limited • A single cube is the only way to go if you do need to do this • Even if you don’t think you need to do it now, you probably will do in the future!

Fix: One Cube to Multiple • If you have Enterprise Edition, Perspectives can help overcome usability issues • Linked measure groups/dimensions can also be used to split out more cubes for security purposes • If you have one cube, you probably don’t want to split it up though

Fix: Multiple Cubes to One • Start again from scratch! • LookUpCube() is really bad for performance • Linked measure groups and dimensions have their own problems: • Duplicate MDX code • Structural changes require linked dimensions to be deleted and recreated

Problem: Over-reliance on MDX • As with the DSV, it can be tempting to use MDX calculations instead of making structural changes to cubes and dimensions • A simple example is to create a ‘grouping’ calculated member instead of creating a new attribute • Other examples include pivoting measures into a dimension, or doing m2m in MDX

Consequences: Over-reliance on MDX • MDX should always be your last resort: • Pure MDX calculations are always going to be the slowest option for query performance • They are also the least-easily maintainable part of a cube • The more complex calculations you have, the more difficult it is to make other calculations work

Fix: Over-reliance on MDX • Redesigning your cube is a radical option but can pay big dividends in terms of performance • Risks breaking existing reports and queries but your users may be ok with this to get more speed

Problem: Unused Aggregations • Aggregations are the most important SSAS feature for performance • Most people know they need to build some and run the Aggregation Design Wizard… • …but don’t know whether they’re being used or not

Consequences: Unused Aggregations • Slow queries! • If you haven’t built the right aggregations, then your queries won’t get any performance benefit • You’ll waste time processing these aggregations, and waste disk space storing them

Fix: Unused Aggregations • Design some aggregations! • Rerun the Aggregation Design Wizard and set the Aggregation Usage property appropriately • Perform Usage-Based Optimisation • Design aggregations manually for queries that are still slow and could benefit from aggregations

Problem: Unprocessed Aggregations • Even if you’ve designed aggregations that are useful for your queries, you need to ensure they’re processed • Running a Process Update on a dimension will drop all Flexible aggregations

Consequences: Unprocessed Aggregations • Slow queries! (Again)

Fix: Unprocessed Aggregations • Run a Process Default or a Process Index on your cube after you have run a Process Update on any dimensions • Note that this will result in: • Longer processing times overall • More disk space used • But it will at least mean that your queries run faster

Thanks!

P/X001 The Developer Side of the Microsoft Business Intelligence stack Sascha Lorenz P/L001 Understanding SARGability (to make your queries run faster) Rob Farley P/L002 Notes from the field: High Performance storage for SQL Server Justin Langford P/L005 Service Broker: Message in a bottle Klaus Aschenbrenner P/T007 Save the Pies for Lunch - Data Visualisation Techniques with SSRS 2008 Tim Kent Coming up… • #SQLBITS

Common Analysis Services Design Mistakes and How to Avoid Them