How many data quality dimensions do you have?
How do you arrive at those dimensions?
How do they serve your company’s needs?
As a former data management consultant, I find that data quality strategy falls into both the tradecraft of data governance and the disruption of complex industry dynamics. The former, tradecraft, is a technological process that challenges leaders to prioritize data governance on par with any other prized commodity. The latter, which comes down to managing corporate culture, represents the cornerstone of consulting best practices.
It’s the corporate culture that likely spawned the common data quality dimensions. Considering Collibra’s breakdown of the dimensions, terms like Accuracy and Timeliness are a great way to help “manage up” in the data quality world and to break down data quality within a data program. One might consider these terms in the context of the consulting “MECE” framework (Mutually Exclusive, Collectively Exhaustive) for any DQ program.
Though these dimensions help us understand the DQ program, do they actually help the business understand its own data? Do they simplify the “core competencies” of DQ, or do they create debate over taxonomies? Do these dimensions matter to the business domains? Consider the graphic below on how these dimensions might translate to domain examples. Which would the business domain owner care about most: the DQ dimension, the applicable DQ detection feature, the example of its failure, or the impacted domain?
Take data completeness, for example. Completeness is commonly defined as ‘ensuring all the data is present.’ If our CEO made a business decision based on incomplete data and called that data inaccurate, would she be wrong in that labeling? If not, then it’s fair to say that accuracy (another common DQ dimension) and completeness are heavily intertwined concepts, NOT mutually exclusive. Further complicating the taxonomy, new DQ issues often show that our list is NOT collectively exhaustive. Constant root cause analysis of DQ issues produces new themes and thus keeps expanding the DQ dimension list (today’s trend is 7-8 themes vs. the historical 5-6).
Data quality dimensions and business relevance
If one considers the above graphic, perhaps our CEO would prefer to say we were missing last names and that this caused a 6-figure erroneous spend in our advertising campaign. Whether we call it completeness or accuracy, the CEO cares more about the DQ incident and the impacted business than about what we call it.
I would assert that the DQ dimension framework, while a good start, may lead to conflicts as organizations differ on their taxonomies, creating confusion in an already disruptive tradecraft. What purpose do these dimensions serve other than to encapsulate thought within the Data Office? How can we make DQ part of the organization, simplifying its concepts and showing real impact outside the Data Office?
Standard data quality dimensions can struggle to consider the business impact, which is crucial for any organization. Gartner notes this issue, suggesting that the dimensions need an overhaul from the perspective of data consumers. Collibra also recognizes the need within the Data Office for DQ dimensions in the traditional sense.
Don’t tie dimensions to the topic of data quality
Further, I’ve noticed a new trend in DQ dimensions. Don’t tie dimensions to the topic of data quality. Instead, tie the anomalies to their source systems and/or business domains, and bring those owners into the discussion.
Consider this example: a CDO is tasked with motivating your organization to focus on resolving incomplete data. Would you rather hear “We have 100 completeness rules that had 200 breaks,” or would you rather hear “We have identified 200 data entry errors sourced from an internal CRM. The DQ team has tagged over 100 rules to sales data and will propose source-based DQ measures for that line of business”?
The second statement drives governance. It implies the need for the rules and ties the breaks to an underlying root cause. It even helps drive a fix at the source by creating accountability for the issue, since the CRO may decide to deprecate the legacy CRM.
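To make the contrast concrete, here is a minimal Python sketch of the two ways of reporting the same rule breaks. Everything in it is an assumption for illustration: the `RuleBreak` record, its fields, the function names, and the sample data are hypothetical and do not come from any particular DQ tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleBreak:
    """One failed DQ rule check (hypothetical record layout)."""
    rule_id: str
    dimension: str      # traditional DQ dimension label, e.g. "completeness"
    source_system: str  # system where the bad record originated
    domain: str         # business domain that owns the data
    cause: str          # root cause assigned during triage


def summarize_by_dimension(breaks):
    """Dimension-centric view: counts of breaks per DQ dimension."""
    return Counter(b.dimension for b in breaks)


def summarize_by_source(breaks):
    """Source-centric view: counts per (source system, domain, root cause)."""
    return Counter((b.source_system, b.domain, b.cause) for b in breaks)


def headline(breaks):
    """Phrase the largest group of breaks in source and domain terms."""
    (source, domain, cause), count = summarize_by_source(breaks).most_common(1)[0]
    return f"{count} {cause} breaks sourced from {source}, affecting {domain} data"


# Illustrative data only: four breaks that all carry the same dimension label.
sample_breaks = [
    RuleBreak("R1", "completeness", "internal CRM", "sales", "data entry error"),
    RuleBreak("R1", "completeness", "internal CRM", "sales", "data entry error"),
    RuleBreak("R2", "completeness", "internal CRM", "sales", "data entry error"),
    RuleBreak("R3", "completeness", "billing system", "finance", "late feed"),
]

print(summarize_by_dimension(sample_breaks))
print(headline(sample_breaks))
```

The dimension view collapses all four breaks into a single "completeness" bucket, while the source view names a system, a domain, and a cause that someone can own. The sketch assumes each break has already been tagged with its source and root cause, which in practice is the hard part of the work.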
If we consider our industry norm, data tradecraft is in an advisory role to the industries we disrupt. The DQ dimensions are a great start, and I hope you expand your focus beyond the traditional 6 DQ dimensions. I suggest you look towards dimensions that speak to your industry domains.