Data Discovery: Understanding data relationships. Philip Howard, Research Director, Bloor Research
Agenda • What are data relationships and why are they important? • Different approaches to discovering data relationships • Features you might look for in a data discovery tool
What is a data relationship? • A relationship between database tables, either within or across databases • A relationship within or across non-relational data sources • A relationship between a relational and non-relational source • Note that relationships may be complex and/or involve more than 2 elements
Why are data relationships important? 1. Data migration 2. Data archival 3. Master data management 4. Data governance 5. Data modelling 6. Business intelligence 7, 8, 9, …: Data integration, legacy migration, data warehousing, …
Why are data relationships difficult? • No definition exists across multiple sources • Within a source many relationships are not explicit • Ownership of relationships is diverse • Many relationships are defined within application software and not in the data source
Data relationships in place • Different issues arise when you consider relationships within systems versus across systems
Data relationships within systems • Typical functions: identification of primary-foreign key pairs; dependency analysis; detection of redundant columns • Usually provided through data profiling, which also provides error statistics
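One common way to identify primary-foreign key pairs is to test for inclusion: a child column is a candidate foreign key if all of its values appear in a unique parent column. The following is a minimal sketch of that idea, not a description of how any particular profiling tool works. The function names and the sample `customers` and `orders` tables are hypothetical.

```python
from itertools import permutations

def column_values(rows, column):
    """Return the set of non-null values found in one column."""
    return {row[column] for row in rows if row.get(column) is not None}

def is_unique(rows, column):
    """A column can act as a key only if it has no nulls and no duplicates."""
    values = [row.get(column) for row in rows]
    return None not in values and len(values) == len(set(values))

def candidate_key_pairs(tables):
    """Yield (child_table, child_col, parent_table, parent_col) candidates.

    A pair is a candidate when the parent column is unique and every
    non-null child value also appears in the parent column.
    """
    for (child, child_rows), (parent, parent_rows) in permutations(tables.items(), 2):
        if not child_rows or not parent_rows:
            continue
        for parent_col in parent_rows[0]:
            if not is_unique(parent_rows, parent_col):
                continue
            parent_values = column_values(parent_rows, parent_col)
            for child_col in child_rows[0]:
                child_values = column_values(child_rows, child_col)
                if child_values and child_values <= parent_values:
                    yield (child, child_col, parent, parent_col)

# Hypothetical sample data: each table is a list of row dictionaries.
tables = {
    "customers": [
        {"cust_id": 1, "name": "Ann"},
        {"cust_id": 2, "name": "Bob"},
        {"cust_id": 3, "name": "Cy"},
    ],
    "orders": [
        {"order_id": 10, "cust_id": 1},
        {"order_id": 11, "cust_id": 1},
        {"order_id": 12, "cust_id": 3},
    ],
}
pairs = list(candidate_key_pairs(tables))
print(pairs)
```

Inclusion alone produces false positives, especially between small integer columns, so the output is best treated as a list of candidates for a person to review.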
Data relationships across systems • Requirement for relationship discovery • No requirement for error statistics • Requirement to report rule violations where these represent a violation of a cross-source relationship
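A violation of a cross-source relationship can be illustrated as a record in one system that refers to a key missing from another. The sketch below assumes two hypothetical sources, a CRM customer list and a billing invoice list, whose column names `custID` and `cust#` are invented for illustration.

```python
def cross_source_violations(child_rows, child_col, parent_rows, parent_col):
    """Return child rows whose reference has no match in the parent source.

    Null references are skipped: they are a completeness issue, not a
    violation of the cross-source relationship itself.
    """
    parent_keys = {row[parent_col] for row in parent_rows}
    return [
        row for row in child_rows
        if row[child_col] is not None and row[child_col] not in parent_keys
    ]

# Hypothetical data held in two separate systems.
crm_customers = [{"custID": "C1"}, {"custID": "C2"}]
billing_invoices = [
    {"invoice": "I-1", "cust#": "C1"},
    {"invoice": "I-2", "cust#": "C9"},   # refers to a customer the CRM lacks
    {"invoice": "I-3", "cust#": None},
]

violations = cross_source_violations(
    billing_invoices, "cust#", crm_customers, "custID"
)
print(violations)
```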
Specific requirements • For MDM – overlap & precedence analysis, transformation & business rules and exceptions, outlier analysis, matching keys • For data migration & archival – business entities
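Overlap analysis for MDM can be sketched as a comparison of the key sets held by two sources. The example below is an assumption about what such a report might contain (shared keys, keys unique to each source, and a Jaccard ratio); it does not cover precedence analysis, and the source contents are hypothetical.

```python
def overlap_report(source_a, source_b):
    """Summarise how far two sources' key sets overlap."""
    a, b = set(source_a), set(source_b)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Jaccard ratio: shared keys as a fraction of all distinct keys.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }

# Hypothetical customer keys from two systems.
report = overlap_report(["C1", "C2", "C3"], ["C2", "C3", "C4", "C5"])
print(report)
```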
General functions • Automation of MDM and profiling functions • Visualisation of relationships • Semantics: the semantic type of the data (e.g. zip code); context-free discovery (e.g. recognising that cust# is equivalent to custID) • Data classification: recognising the relationship between a pre-defined, business-user-maintained domain of values and the actual content of a field, in order to identify what the field contains and to flag unexpected values • Business glossary
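Two of the functions above lend themselves to a short illustration. The sketch below first normalises column names so that cust# and custID compare as equal, then classifies a field against business-maintained domains of values and lists unexpected values. The synonym table, the domains, and the 0.75 threshold are all assumptions chosen for the example; real tools would typically use richer rules.

```python
import re

# Hypothetical synonym table mapping common suffixes to one canonical token.
SUFFIX_SYNONYMS = {"#": "id", "no": "id", "num": "id", "number": "id"}

def normalise_name(name):
    """Split a column name into tokens and map synonyms to a canonical form."""
    tokens = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|#|\d+", name)
    canonical = [SUFFIX_SYNONYMS.get(t.lower(), t.lower()) for t in tokens]
    return "_".join(canonical)

def classify_field(values, domains, threshold=0.75):
    """Return (domain_name, unexpected_values) for the best-matching domain.

    A domain matches when at least `threshold` of the non-null values
    belong to it; otherwise (None, []) is returned.
    """
    present = [v for v in values if v is not None]
    best, best_score = None, 0.0
    for name, domain in domains.items():
        score = sum(v in domain for v in present) / len(present) if present else 0.0
        if score > best_score:
            best, best_score = name, score
    if best is None or best_score < threshold:
        return None, []
    return best, sorted({v for v in present if v not in domains[best]})

# Hypothetical business-user-maintained domains of values.
domains = {
    "us_state": {"CA", "NY", "TX", "WA"},
    "currency": {"USD", "EUR", "GBP"},
}

same_column = normalise_name("cust#") == normalise_name("custID")
label, unexpected = classify_field(["CA", "NY", "TX", "CA", "ZZ"], domains)
print(same_column, label, unexpected)
```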
Conclusion • Understanding data relationships across data sources is important in many data management disciplines • There are relatively few tools that are good at discovering such relationships – moreover, data discovery is a broad discipline and no one tool is good at all aspects of relationship discovery.