Data Management, Quality and Governance
By JulienKervizic
It is crucial, as part of data planning, to define what to collect and where to collect it from, identifying the different sources of information.

• Data layer: A data layer is a software component that provides simplified access to stored data. The term is often used to refer to a front-end component that integrates with GTM (Google Tag Manager) and external tag components.

Part of defining the data is setting up the attribute and event definitions we want to collect, and mapping these to what the different tags and systems expect.

• Logging and data structures: Logging frameworks complement standard event logging with additional data in a standard format, and they allow these events to be integrated into existing data pipelines.

There is also a need to define how the data structure should store the events. There are trade-offs between generic and specialized data structures.

Generic data structures make it easy to leverage one dataset alongside others. Think, for example, of a contacts table that encompasses every interaction with a customer, be it SMS, email, direct mail, or another source of information. Having the data in one generic structure makes it very easy to consume these sets of information together. A generic data structure, however, does not make it as easy to consume more specialized information. In the contacts table example, an email-specific metric such as the open rate would likely sit within a nested object rather than in a dedicated column.
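To make the generic-versus-specialized trade-off concrete, here is a minimal Python sketch. It assumes a hypothetical `contacts` list standing in for the generic table (channel-specific fields nested under a `details` key) and a hypothetical `email_sends` list standing in for a specialized email table; the field names are illustrative and not taken from any particular system.

```python
from collections import Counter

# Generic structure (hypothetical): one row per customer interaction, whatever
# the channel. Channel-specific attributes live in a nested "details" dict.
contacts = [
    {"customer_id": 1, "channel": "email", "details": {"opened": True}},
    {"customer_id": 1, "channel": "sms", "details": {"delivered": True}},
    {"customer_id": 2, "channel": "email", "details": {"opened": False}},
    {"customer_id": 3, "channel": "direct_mail", "details": {}},
]

# Specialized structure (hypothetical): email sends only, "opened" is a flat column.
email_sends = [
    {"customer_id": 1, "opened": True},
    {"customer_id": 2, "opened": False},
]


def contacts_per_channel(rows):
    """Easy with the generic table: every channel sits in the same structure."""
    return Counter(row["channel"] for row in rows)


def email_open_rate_generic(rows):
    """Harder with the generic table: filter on channel, then dig into the nested object."""
    emails = [row for row in rows if row["channel"] == "email"]
    if not emails:
        return 0.0
    opened = sum(1 for row in emails if row["details"].get("opened", False))
    return opened / len(emails)


def email_open_rate_specialized(rows):
    """Straightforward with the specialized table: "opened" is a top-level column."""
    if not rows:
        return 0.0
    return sum(row["opened"] for row in rows) / len(rows)


print(contacts_per_channel(contacts))
print(email_open_rate_generic(contacts))
print(email_open_rate_specialized(email_sends))
```

Both open-rate functions return 0.5 on this sample data. The difference lies in how much the consumer needs to know about the nested layout and the channel filter to get there.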
There are quite a few checks that can be performed to see whether the data received matches expectations. Data structure checks, value checks, lifecycle checks, and referential integrity checks are all validation processes that help ensure the data is up to spec.

• Data structure checks: Check that the incoming data conforms to the expected data structure, for example the number of columns present and their data types.
• Value checks: Check that the values in the dataset match what would be expected from incoming data. Think, for example, of product prices: a product price is not normally expected to have a negative value.
• Lifecycle checks: Can show whether data might be missing from the dataset. Think about a purchase on an e-commerce site. For a purchase to happen, several actions need to be performed before it, such as browsing the website, clicking an add-to-cart or purchase button, and going through the checkout. A lifecycle check would verify that these preceding actions are indeed present in the data.
• Referential integrity checks: Validate that the references to other data objects, also called foreign keys, exist. Going back to the e-commerce purchase example, a referential integrity check could look at the products present in the orders and verify that they also exist in the product master data.
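The four kinds of checks can be sketched in plain Python as follows. This is a minimal illustration, assuming hypothetical in-memory `orders`, `events`, and `products` records, a hypothetical `ORDER_SCHEMA`, and a hypothetical set of funnel steps; in practice such checks usually run as warehouse queries or through a validation framework. Each function returns the offending records or problems, so an empty list means the check passed.

```python
# Hypothetical expected schema for incoming order rows: column name -> type.
ORDER_SCHEMA = {"order_id": str, "session_id": str, "product_id": str, "price": float}

# Hypothetical funnel steps that must precede a purchase within the same session.
REQUIRED_BEFORE_PURCHASE = {"page_view", "add_to_cart", "checkout"}


def structure_check(rows, schema):
    """Data structure check: the row has exactly the schema's columns, with the expected types."""
    problems = []
    for i, row in enumerate(rows):
        if set(row) != set(schema):
            problems.append((i, "unexpected columns", sorted(set(row) ^ set(schema))))
            continue
        for column, expected_type in schema.items():
            if not isinstance(row[column], expected_type):
                problems.append((i, "wrong type", column))
    return problems


def value_check(rows):
    """Value check: a product price is not expected to be negative."""
    return [row for row in rows if row["price"] < 0]


def lifecycle_check(orders, events):
    """Lifecycle check: each purchase's session should contain the preceding funnel steps."""
    steps_by_session = {}
    for event in events:
        steps_by_session.setdefault(event["session_id"], set()).add(event["name"])
    return [
        order for order in orders
        if not REQUIRED_BEFORE_PURCHASE <= steps_by_session.get(order["session_id"], set())
    ]


def referential_integrity_check(orders, products):
    """Referential integrity check: every product_id (foreign key) exists in the product master data."""
    known_ids = {product["product_id"] for product in products}
    return [order for order in orders if order["product_id"] not in known_ids]


# Hypothetical sample data: order "o2" is meant to fail the last three checks.
orders = [
    {"order_id": "o1", "session_id": "s1", "product_id": "p1", "price": 19.99},
    {"order_id": "o2", "session_id": "s2", "product_id": "p9", "price": -5.0},
]
events = [
    {"session_id": "s1", "name": "page_view"},
    {"session_id": "s1", "name": "add_to_cart"},
    {"session_id": "s1", "name": "checkout"},
    {"session_id": "s2", "name": "page_view"},
]
products = [{"product_id": "p1"}, {"product_id": "p2"}]

print(structure_check(orders, ORDER_SCHEMA))
print(value_check(orders))
print(lifecycle_check(orders, events))
print(referential_integrity_check(orders, products))
```

Returning the failing records rather than a simple pass/fail flag makes it easier to route problems back to whoever owns the source system.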
Data quality is a journey: it doesn't come in one day, and the focus should be on improving data quality over time rather than on getting it right on day one. Having a data governance model and implementing data quality tests both help on this journey.

A more thorough approach looks at the planning, validation, cleansing, surfacing, and documentation of the various data objects.

Data Planning

Having a clear plan of what information to collect, and how to collect it, is the first step toward good data quality management.
Surfacing data to multiple stakeholders, through dashboards or datasets, helps improve data quality management by putting multiple pairs of eyes on the data. Data that is surfaced and continuously monitored for business performance tends to be of the highest quality, as there is a business incentive to have anomalies identified and corrected.

Analyzing deviations in metrics can help identify data that is facing issues. Take, for instance, the wrongful assignment of an event to a category: imagine video views being assigned to a device, either web, iOS, or Android.

If video views suddenly start being assigned to a different category, such as web, chances are that the assignment is wrong, most likely because of a logging bug. Surfacing this data in a dashboard makes this behavior very easy to identify and puts some organizational pressure on fixing the root causes of these anomalies.
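One simple way to automate part of this analysis (an assumption on our part, not a method the article prescribes) is to compare each category's share of the total between a baseline period and the current period, and flag categories whose share moves by more than a threshold. The sketch below uses made-up video view counts per device and a hypothetical threshold of 10 percentage points.

```python
def category_shares(counts):
    """Convert raw counts per category into shares of the total."""
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in counts}
    return {category: value / total for category, value in counts.items()}


def flag_share_shifts(baseline, current, threshold=0.10):
    """Return categories whose share moved by more than `threshold` (absolute) between periods."""
    base_shares = category_shares(baseline)
    current_shares = category_shares(current)
    flagged = {}
    # Union of keys so that categories appearing or disappearing are also compared.
    for category in set(base_shares) | set(current_shares):
        shift = current_shares.get(category, 0.0) - base_shares.get(category, 0.0)
        if abs(shift) > threshold:
            flagged[category] = round(shift, 3)
    return flagged


# Hypothetical video view counts per device for two periods.
baseline = {"web": 3000, "ios": 4000, "android": 3000}
current = {"web": 7500, "ios": 1500, "android": 1000}

print(flag_share_shifts(baseline, current))
```

On this sample, all three devices are flagged, with web gaining share at the expense of iOS and Android. A flag like this only points at where to look: whether the shift comes from a logging bug or a genuine change in user behavior still has to be investigated.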