What is third-party data duplication?
Most companies already use their operational and transactional data to generate valuable insights. Increasingly, they also tap into external data for more sophisticated analysis and deeper insights. To gain a competitive advantage through personalized customer experience and improved efficiency, they often look for third-party data to augment their internal data sets. External data is a big market today. For example, marketers spent over $20 billion worldwide on third-party data in 2019 to acquire new customers and enrich the information about existing customers.
The problem arises when different lines of business purchase data for different needs, and the data stays in departmental silos. Finance may buy economic forecasts, while production may look for supplier data and market demand forecasts. Sometimes different departments need specific data sets for the same entity, but from unique perspectives. Marketing may want customer sentiment data to personalize customer experience, whereas risk management may buy customer credit rating data for risk detection and mitigation. If companies do not have a central data repository, these expensive external data sets remain with individual departments.
The data acquisition process is often an afterthought, as explained in "The Ins and Outs of Data Acquisition: Beliefs and Best Practices," and may not be formally set up or efficiently managed. The lack of a well-defined data sharing policy adds to the trouble, because teams cannot quickly locate and access relevant data.
Without a standardized data acquisition process, and unaware of the data sets previously purchased, departments keep acquiring data as required. The result? Despite significant spending on third-party data, companies frequently end up buying the same data sets, or data sets with large overlaps. IDC research indicates that a typical enterprise has at least 10 copies of any structured data source at a given time. In a large B2B or B2C enterprise, hundreds of thousands of dollars are wasted on redundant third-party data purchases every year. A further 10-20% is wasted on unnecessary memory, infrastructure, and integration to store and process this redundant data. What’s more, the inconsistent purchase-as-needed approach slows data acquisition and onboarding, which in turn delays time to market.
The challenges of using third-party data
Organizations report a wide variety of business and technical challenges in deriving insights from external data. Among the business challenges are the size and complexity of the data-provider market, which can make it hard to identify the right data sources and partners, as explained in the Harvard Business Review article "How Third-Party Information Can Enhance Data Analytics." The technical challenges include:
- No central repository of acquired data: Duplicate data in different organizational silos
- Lack of standardized data acquisition process: Third-party data is acquired without any strategic planning, oversight, or best practices
- Missing metadata: No visibility into the context, quality, and business value of purchased data
- Inconsistent definitions across data sets: No standard use of terms or universal definitions across data vendors
- Lack of well-defined data sharing policies: Non-compliant use of data and no lineage records
Companies are becoming aware of these gaps as the need for external data acquisition increases. The gaps can be closed with an integrated data governance and data catalog solution.
How to streamline data acquisition and eliminate duplicate data spending
Addressing duplicate data spending is an opportunity to take a more comprehensive approach to streamlining and managing acquired data. With a data catalog that has built-in governance, you can create a central repository of all the third-party data assets you own and plan to buy in the future. Once the repository is in place with common definitions, you can standardize and centralize the data request funnel.
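To make the idea of a centralized request funnel concrete, here is a minimal sketch in Python. It is not Collibra's API: the `ExternalDataset` and `DataRequestFunnel` classes, their fields, and the vendor name "Acme Data" are all hypothetical, and the sketch assumes that a vendor plus a standardized subject term is enough to identify a data set. Every purchase request is checked against the shared registry first, and new requests are recorded as planned so that later requests can see them.

```python
from dataclasses import dataclass, field, replace


def _key(vendor, subject):
    """Normalize vendor and subject so trivial spelling differences still match."""
    return (vendor.strip().lower(), subject.strip().lower())


@dataclass(frozen=True)
class ExternalDataset:
    """One third-party data asset (hypothetical schema, for illustration only)."""
    name: str
    vendor: str
    subject: str          # standardized business term, e.g. "customer credit rating"
    department: str       # department that owns or requested the asset
    status: str = "owned"  # "owned" or "planned"


@dataclass
class DataRequestFunnel:
    """In-memory stand-in for a central repository of external data assets."""
    assets: list = field(default_factory=list)

    def register(self, dataset):
        self.assets.append(dataset)

    def find_matches(self, vendor, subject):
        """Return registered assets with the same vendor and subject."""
        wanted = _key(vendor, subject)
        return [a for a in self.assets if _key(a.vendor, a.subject) == wanted]

    def request(self, dataset):
        """Check the registry before buying; record new requests as planned."""
        matches = self.find_matches(dataset.vendor, dataset.subject)
        if matches:
            return "reuse", matches
        planned = replace(dataset, status="planned")
        self.register(planned)
        return "purchase", [planned]


funnel = DataRequestFunnel()
funnel.register(
    ExternalDataset("Credit ratings feed", "Acme Data", "customer credit rating", "Risk"))

# Marketing asks for the same vendor and subject: routed to the existing asset.
decision, assets = funnel.request(
    ExternalDataset("Customer credit scores", "acme data", "Customer Credit Rating", "Marketing"))
print(decision, [a.department for a in assets])

# A different subject is recorded as planned, so later requests can see it.
decision, assets = funnel.request(
    ExternalDataset("Customer sentiment", "Acme Data", "customer sentiment", "Marketing"))
print(decision, assets[0].status)
```

Matching on a standardized subject term rather than on the data set name is deliberate: it only works once common definitions exist, which is why the repository and its definitions come before the funnel.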
Forrester Consulting reports that 48% of companies achieved a better understanding of their data to drive insights and actions after implementing data catalogs. With Collibra Data Catalog, you can achieve a direct cost saving of 20% on third-party data purchases, amounting to $500K to $3M per year. Resolving data redundancies can help improve the productivity of your data governance specialists and data analysts by 25%, which can contribute an additional $500K to $5M per year.
With a governed data catalog, you can:
- Standardize the data acquisition process to reduce duplicate spending on the same information
- Set up enterprise-wide standard definitions for external data sets
- Capture the metadata and use it with lineage graphs to uncover redundancies
- Automatically classify external data sets by their type, sensitivity, and value to the organization
- Certify external data sets with the help of subject matter experts to improve trust and confidence in your data
- Improve compliance of external data usage to reduce risk and potential cost
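As one illustration of how captured metadata can help uncover redundancies, the sketch below compares the column names of purchased data sets and flags pairs with a high overlap. It is a simple heuristic under stated assumptions, not a lineage graph and not a feature of any particular catalog product: the data set names, the column lists, the `find_overlaps` helper, and the 0.5 threshold are all hypothetical. Overlap is measured with Jaccard similarity, the number of shared columns divided by the number of distinct columns across both data sets.

```python
from itertools import combinations


def normalize(column):
    """Lowercase a column name and unify separators, so 'Customer ID' matches 'customer_id'."""
    return column.strip().lower().replace("-", "_").replace(" ", "_")


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items / all distinct items (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_overlaps(metadata, threshold=0.5):
    """Return (name_a, name_b, score) for data set pairs whose columns overlap at or above threshold."""
    columns = {name: {normalize(c) for c in cols} for name, cols in metadata.items()}
    pairs = []
    for a, b in combinations(sorted(columns), 2):
        score = jaccard(columns[a], columns[b])
        if score >= threshold:
            pairs.append((a, b, score))
    # Highest overlap first, so reviewers see the likeliest duplicates at the top.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Hypothetical column metadata captured for three purchased data sets.
catalog_metadata = {
    "marketing_customer_profiles": ["Customer ID", "email", "postal_code", "sentiment_score"],
    "risk_customer_credit": ["customer_id", "Email", "postal-code", "credit_rating"],
    "finance_economic_forecast": ["region", "quarter", "gdp_growth"],
}

for name_a, name_b, score in find_overlaps(catalog_metadata):
    print(f"{name_a} <-> {name_b}: {score:.0%} column overlap")
```

A flagged pair is a candidate for review rather than proof of waste. Two departments may legitimately hold different views of the same entity, so the final call on whether a purchase is redundant belongs with the data owners and subject matter experts.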