There are always analogies to describe the value of data. British data scientist Clive Humby famously called it ‘the new oil’ powering modern business. That metaphor lasted for more than a decade, but was, a bit pedantically, ridiculed for being a false equivalence. For example, oil is hoarded by the few whereas data must be ubiquitous and shared. Further comparisons with atomic energy and even outer space have been presented, indicating the limitless possibilities and dangers within.
At the risk of presenting yet another analogy that misses the point, we believe data is most like water: essential, fundamental, life-affirming. To be sure, it can be muddy and treacherous. Water can contain toxins, sweep you down rapids, misdirect and disorient you. But when it’s clear and controlled, it’s critical. It courses through every aspect of nature, human and systemic, and helps organisms not only survive but thrive. That’s how data works. It pulses through all operations within all companies, helping them build a market presence, support innovation and achieve success. And it’s how we get to Data Intelligence.
For a company to achieve Data Intelligence there is no “easy button.” It takes coordination from numerous teams, buy-in from leadership and a clear path to get from point A to point Z. We’ve mapped out a 12-step process to make the journey as clear as possible. In this five-part series, we’ll explore the journey to Data Intelligence. We’ll develop a real-world, easily comprehensible scenario to see how data fuels key decisions and actionable initiatives, pause at each milestone to collate the intelligence, then keep moving forward.
First, let’s lay out the scenario. Imagine a good time for a good company — sales are up, new products have been well received and prospects seem bright. But at the end of the last quarter, there’s a dark spot emerging: customer churn. New customers are streaming in, and that’s great, but existing customers are heading out. Nobody expected this, and if there isn’t quite panic in the boardroom yet, it’s only a matter of time. Investors are asking questions, and management doesn’t have the answers… yet.
And so we meet “Cliff.” He’s been told to identify the problem: Why are customers leaving? What do they have in common? Of course, the answers can be found in data: There’s a pattern that reveals the reasons for the churn, and analysis that can guide business initiatives to prevent future losses.
This is a vital task, but Cliff is a business analyst. To do his job, he needs access to the correct data, and he needs it at the speed of thought. But as in many companies, this is a dicey proposition. The data is managed and guarded by multiple teams in the organization. Here’s a sampling:
- Information Technology
- Compliance
- Legal
- Tech security
- Data admins/data scientists
- Finance
This is tribal knowledge at its most fundamental, and to break through this morass, Cliff needs to identify what data is available and where the best, most trustworthy version of it can be found. It takes security to grant access, compliance to make sure the access doesn’t violate industry mandates, technologists to extract different sources and formats, data scientists to help build lineage and transparency, etc.
This is why we at Collibra advocate so strongly for data democracy, the true backbone of digital transformation. Data must belong to every knowledge worker or data citizen, flowing through the system in such a way as to let business professionals connect, communicate and collaborate with individual styles and priorities. It can be random or automated and expanded or refocused, but it must be trusted and relevant. As a business analyst, Cliff knows what to do when he has good, trustworthy data. But where should he start the search? How does he get permission? Who can help extract and transform the data with technologies like SQL, Python or R? When, and by whom, will approval be granted? The concerns go on.
In some ways, he’s like the rest of us: an information consumer, an online shopper looking for the right data. He wants to browse the shelves, compare different products, load a few into his cart, then go home to build a plan. He does exactly that for other activities in his life; why can’t it be just as simple with this exercise?
With the stage set, let’s follow Cliff as he embarks on his 12-step journey to Data Intelligence…
Step 1: Build a Business Glossary
We need a common language to understand each other. This is not about nationalities, programming tools or even different data sources. It refers to the fact that at almost any company of a decent size, there are multiple ways to say the same thing, and different constituencies within the enterprise use the same term to mean different things. For example, what is a Customer? How do you count the number of unique Customers? Will you get the same answer regardless of whom you ask? Without a Business Glossary, the answer is most likely ‘no.’
This disparity sparks confusion: People like Cliff can’t find the data they need, or understand particular classifications, or reconcile differences between different datasets. The lack of shared understanding erodes trust, hampers organizational performance and challenges the credibility of particular business decisions.
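To make the Customer-counting question concrete, here is a minimal sketch in Python. The records, field names and the two departmental definitions are hypothetical, invented purely for illustration; they simply show how two reasonable definitions of “unique Customer” can produce different answers from the same data.

```python
# Hypothetical order records; the field names are illustrative only.
orders = [
    {"account_id": "A1", "email": "pat@example.com"},
    {"account_id": "A2", "email": "pat@example.com"},  # same person, second account
    {"account_id": "A3", "email": "lee@example.com"},
    {"account_id": "A4", "email": "sam@example.com"},
]


def count_unique(records, key):
    """Count the distinct values of `key` across the records."""
    return len({record[key] for record in records})


# One team might define a Customer as an account...
customers_by_account = count_unique(orders, "account_id")
# ...while another defines a Customer as a distinct email address.
customers_by_email = count_unique(orders, "email")

print(customers_by_account, customers_by_email)  # 4 3
```

Neither count is wrong; they answer different questions. An approved glossary definition tells everyone which question is being asked.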
A Business Glossary becomes your company’s semantic translator. Forcing everyone in a large organization to adopt a new language is not an option, but helping everyone learn to communicate with everyone else through a semantic translator will lead to clarity, efficiency and greater understanding. Business users can find what they want instinctively and intuitively, without having to master tables, fields, column names and metadata; in effect, without having to become data scientists themselves. It improves transparency by offering a comprehensive view of all business terms along with their related data, metadata and data lineage.
Whatever his perspective and level of understanding, Cliff can use natural language to start his journey. He can begin with a term like “churn” and view not only the approved definition, but also how each of the company’s departments and business units defines it and what type of data is best used to understand it. Using Collibra’s Data Intelligence services, Cliff will be on his way to discovering where the best data to support his analysis can be found.
Step 2: Establish Data Domain Models
Every company, public or private, for-profit or not-for-profit, shares the need to identify the things most essential to its mission. These “things” are the focal points of the company and are typically best described as nouns like Customer, Employee, Product and Location. We call these Domains, and they serve as the logical representation of each of the key nouns that drive your business and establish context for any analysis you wish to consider.
Let’s take the example of the Customer domain. In any organization of size, especially organizations that present multiple products or services to their customers, you are likely to find two or more Systems or Applications that capture and store information about customers. And chances are that while they store a lot of the same information, these different systems will not store it the same way or use the same name for the same type of information. For example, suppose Department A uses Salesforce for sales force automation (SFA) and Department B uses NetSuite CRM. Both SFA solutions capture information about customers, but their underlying databases do not organize or reference the information (commonly called attributes or fields) identically. Salesforce might reference Date of Birth as ‘DOB’ and store it in the same table as the customer’s name, whereas NetSuite CRM might reference it as ‘Birth_Date’ and store it in a table that does not contain the customer’s name.
A logical representation of a Customer helps your organization rationalize the differences among the many systems and applications deployed in your environment into a common, shared description and structure. Just as the Business Glossary provides a semantic translator, the Domain Models give you a consistent and common representation of what is most important to your company. And you can associate each logical attribute or field in your Domain Model with a Business Glossary term, helping Cliff immediately turn a natural language term like churn into a starting point on the Data Intelligence graph.
There are numerous things you can associate with your logical domains that will help automate and ensure consistency in how your organization governs, ensures compliance and drives productivity throughout your data-driven business. As an example, when you model the Customer Domain, you can identify each of the logical attributes that are personally identifiable information (PII), such as full name, SSN and email address. This will help shape and contour the way data is used, accessed and monitored throughout your organization.
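The idea of a logical Customer attribute that resolves to different physical fields can be sketched as a simple lookup. This is only an illustration, not how any particular product stores its models: the table names, system keys and the `to_logical` helper are assumptions, and only the column names ‘DOB’ and ‘Birth_Date’ come from the example above.

```python
# Logical Customer attribute -> physical (table, column) per source system.
# Table names and system keys are hypothetical; only 'DOB' and 'Birth_Date'
# come from the article's example.
CUSTOMER_DOMAIN = {
    "date_of_birth": {
        "salesforce": ("customer", "DOB"),
        "netsuite_crm": ("customer_profile", "Birth_Date"),
    },
}


def to_logical(system, table, row):
    """Translate one physical row into logical Customer attributes."""
    logical = {}
    for attribute, sources in CUSTOMER_DOMAIN.items():
        source = sources.get(system)
        if source is None:
            continue  # this system does not hold the attribute
        source_table, source_column = source
        if source_table == table and source_column in row:
            logical[attribute] = row[source_column]
    return logical


sf = to_logical("salesforce", "customer", {"Name": "Pat", "DOB": "1980-05-17"})
ns = to_logical("netsuite_crm", "customer_profile", {"Birth_Date": "1980-05-17"})
print(sf == ns)  # True: both resolve to {'date_of_birth': '1980-05-17'}
```

Once both systems resolve to the same logical attribute, an analyst can ask for “date of birth” without knowing which table or column name each application happens to use.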
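As a rough sketch of how such a classification might shape data use, the snippet below flags logical attributes as PII and masks them before a record is shared. The `LogicalAttribute` class, the `mask_pii` helper and the `signup_date` attribute are hypothetical names chosen for illustration; a real governance platform would apply policies in its own way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalAttribute:
    """A logical attribute of a domain, with a PII flag (illustrative)."""
    name: str
    is_pii: bool = False


# Attributes of a hypothetical Customer domain.
CUSTOMER_ATTRIBUTES = [
    LogicalAttribute("full_name", is_pii=True),
    LogicalAttribute("ssn", is_pii=True),
    LogicalAttribute("email_address", is_pii=True),
    LogicalAttribute("signup_date"),  # assumed non-PII attribute
]


def mask_pii(record, attributes):
    """Return a copy of `record` with every PII-flagged value masked."""
    pii_names = {attr.name for attr in attributes if attr.is_pii}
    return {
        key: ("***" if key in pii_names else value)
        for key, value in record.items()
    }


masked = mask_pii(
    {"full_name": "Pat Doe", "signup_date": "2021-03-01"},
    CUSTOMER_ATTRIBUTES,
)
print(masked)  # {'full_name': '***', 'signup_date': '2021-03-01'}
```

Because the flag lives on the logical attribute rather than on any one physical column, the same rule can follow the data wherever it is stored.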
Data Domains are at the center of the Data Intelligence universe and offer a hugely powerful and transitive relationship with any other concept, data, report, algorithm, API or otherwise, managed within your Data Intelligence Graph.
Step 3: Define Policy Management & Reference Management
What is Data Governance? There is no shortage of opinions, articles and firmly held beliefs on this definition. Many are self-serving, and most are neither completely wrong nor completely right, but what they all share is the general tenet that Data Governance is the practice of establishing and enforcing policies centered on data. We consider these the guardrails for proper data use and management that help ensure consistent adherence across the company.
Roles & Responsibilities, Data Ownership, Data Usage Agreements, Retention and Destruction Policies, and much more serve as the framework of enforcement and adherence to the rules, both company-defined and regulatory-defined, such that your company remains compliant, efficient and trustworthy as it relates to all data practices.
While not directly related to policy management, another foundational concept that fuels the Data Intelligence journey is the creation and management of a sound Reference Data Management solution. Reference Data is data that defines the permissible values to be used by other data fields. For example, when you enter an address into an online form, you are likely restricted to a list of Countries rather than entering free-form text. This list of Countries is an example of Reference Data. And as we discussed with Domain Modeling, where different Systems and Applications may have different ways of naming and organizing data fields, so may different Systems and Applications use different codes or values to define their Reference Data. Mapping these different codes and values to a common or shared set of codes and values makes it possible to translate and interpret data across your diverse data ecosystem.
The first three steps in the Data Intelligence Journey are not sexy and may feel a bit cumbersome. That said, they are absolutely foundational, and when done right, they lay the groundwork for a long-lived, strategic Data Intelligence program. Glossing over or rushing through these foundational steps will produce a short-lived, tactical project that is the antithesis of anything with the word “intelligence” emblazoned on it.
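A code mapping of this kind is often called a crosswalk, and a small one can be sketched as follows. The two systems and the `to_shared_code` helper are hypothetical; the country codes themselves are standard ISO 3166-1 values, with alpha-3 assumed here as the shared set.

```python
# Crosswalk from each system's local country code to a shared code.
# "system_a" and "system_b" are hypothetical; the codes are ISO 3166-1.
COUNTRY_CROSSWALK = {
    "system_a": {"US": "USA", "GB": "GBR", "DE": "DEU"},     # alpha-2 codes
    "system_b": {"840": "USA", "826": "GBR", "276": "DEU"},  # numeric codes
}


def to_shared_code(system, local_code):
    """Translate a system-specific country code to the shared alpha-3 code."""
    try:
        return COUNTRY_CROSSWALK[system][local_code]
    except KeyError:
        raise ValueError(
            f"No mapping for code {local_code!r} in system {system!r}"
        ) from None


# The same country, recorded differently in two systems, maps to one value.
print(to_shared_code("system_a", "US"))   # USA
print(to_shared_code("system_b", "840"))  # USA
```

Raising an error for unmapped codes, rather than passing them through, is a deliberate choice in this sketch: an unknown value is surfaced for a data steward to resolve instead of silently polluting downstream analysis.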
We have a few steps to go, so please stay tuned.