Collibra Data Quality & Observability: Now Cloud-enabled


The new cloud offering of Collibra Data Quality & Observability brings scalability, agility, and security to your data quality operations across multiple clouds.

With this SaaS offering, you can:

  • Reduce IT overhead by deploying data quality in the cloud of your choice, without sensitive data leaving your environment.
  • Enable teams to scale compute power rapidly.
  • Get faster time to value with automatic feature upgrades.
  • Increase scalability and agility with a future-ready architecture that enables native integration with Collibra Data Intelligence Cloud and external cloud applications.

This blog discusses the benefits of the Collibra Data Quality & Observability (DQ&O) cloud offering and how you can set it up quickly.

Why choose the cloud offering for data quality and observability

One reason to move data quality from on-premises to the cloud is reduced IT overhead: you work with cloud resources and add compute power as required. The cloud also makes it faster to integrate other cloud applications. Because many more products will soon be offered in the cloud, taking the cloud route makes your architecture future-ready.

Deployment in the cloud is fast, secure, and customer-friendly. You also receive upgrades automatically, so you always use the latest version.

In the cloud offering of Collibra DQ&O, you can connect to over 40 databases and file systems. You can run data quality and observability checks on data where it resides, using either pushdown or pull-up processing. See our complete list of connectors.

Because processing runs inside your environment, sensitive data does not leave it. Access to application logs also improves the customer experience by shortening time to resolution.

Architecture diagram for the cloud offering of Collibra DQ&O

Prerequisite: The customer must have Collibra Edge installed on a Linux server.

  1. The DQ Web App is hosted in Collibra Cloud, and Collibra provisions this web instance. The Web App connects to DQ Connector data sources to test connections and to let DQ Explorer preview data.
  2. A DQ Metastore is installed on Edge within the customer environment. This Metastore holds client preview data and connector credentials, and the DQ Web App queries it for preview data.
  3. The DQ Agent is installed on the Edge server. For small, lightweight Spark jobs, Collibra provides Apache Spark running in K3s on the Edge server. For heavyweight Spark processing, you can bring your own Amazon EKS cluster or your own YARN and Spark cluster.
  4. DQ Jobs run locally on the Collibra Edge server. To scale vertically, increase the size of the virtual machine, for example to 32 cores or more RAM. Hadoop compute is supported if you bring your own Dataproc or EMR cluster.
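To make the split between Collibra Cloud and the customer environment explicit, here is a minimal sketch that models the four components above as plain data and checks the property the post emphasizes: components that store customer data run on the Edge server. The class names, the `holds_customer_data` flag, and the values assigned to it are hypothetical simplifications for illustration, not part of any Collibra API; in particular, treating the DQ Agent as not storing data is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    COLLIBRA_CLOUD = "Collibra Cloud"
    CUSTOMER_EDGE = "Customer environment (Edge server)"


@dataclass(frozen=True)
class Component:
    name: str
    location: Location
    # Hypothetical flag: True if the component stores preview data,
    # connector credentials, or data under processing.
    holds_customer_data: bool


# Hypothetical model of the components described in the numbered list.
TOPOLOGY = [
    Component("DQ Web App", Location.COLLIBRA_CLOUD, holds_customer_data=False),
    Component("DQ Metastore on Edge", Location.CUSTOMER_EDGE, holds_customer_data=True),
    Component("DQ Agent", Location.CUSTOMER_EDGE, holds_customer_data=False),  # assumption
    Component("DQ Jobs (Spark)", Location.CUSTOMER_EDGE, holds_customer_data=True),
]


def components_at(location: Location) -> list[str]:
    """Names of the components deployed at the given location."""
    return [c.name for c in TOPOLOGY if c.location == location]


def sensitive_data_stays_local(topology: list[Component]) -> bool:
    """True if every component that stores customer data runs on the Edge server."""
    return all(
        c.location == Location.CUSTOMER_EDGE
        for c in topology
        if c.holds_customer_data
    )


if __name__ == "__main__":
    for loc in Location:
        print(f"{loc.value}: {', '.join(components_at(loc))}")
    print("Sensitive data stays local:", sensitive_data_stays_local(TOPOLOGY))
```

Modeling the deployment as data makes the key invariant checkable in one place: moving a data-holding component to the cloud side of the model would make `sensitive_data_stays_local` return False.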

Guidelines for migration and configuration

The following points cover different scenarios and how customers can move to the cloud offering.

  1. To migrate an on-premises DQ deployment (with local users) to the cloud, the existing DQ Metastore must be restored to the hosted DQ Metastore. All DQ datasets, configurations, jobs, and users need to be migrated. This is currently a task handled by DQ Professional Services.
  2. Red Hat Enterprise Linux (RHEL) and CentOS version 8 are supported. The cloud offering is aligned with the Collibra Data Intelligence Cloud Edge server, which supports RHEL and CentOS version 8. Earlier versions are not supported.
  3. Multi-tenancy is supported in the cloud offering: the DQ Tenant Administrator can create multiple tenants.
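Before installing, you may want to confirm that the Edge server runs a supported operating system. The sketch below parses the standard `/etc/os-release` file and accepts only RHEL or CentOS with major version 8, since that is the only version the post names as supported (versions above 8 are not addressed, so the check treats them conservatively). The function names and the exact-match rule are assumptions for illustration, not a Collibra-provided tool.

```python
from pathlib import Path

SUPPORTED_IDS = {"rhel", "centos"}
SUPPORTED_MAJOR = 8  # the only major version named as supported


def parse_os_release(text: str) -> dict[str, str]:
    """Parse /etc/os-release style KEY=value lines; values may be quoted."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip().strip('"').strip("'")
    return fields


def is_supported_edge_os(text: str) -> bool:
    """True if the OS is RHEL or CentOS with major version exactly 8."""
    fields = parse_os_release(text)
    os_id = fields.get("ID", "").lower()
    major = fields.get("VERSION_ID", "").split(".")[0]
    return os_id in SUPPORTED_IDS and major.isdigit() and int(major) == SUPPORTED_MAJOR


if __name__ == "__main__":
    path = Path("/etc/os-release")
    if path.exists():
        supported = is_supported_edge_os(path.read_text())
        print("Supported Edge OS" if supported else "Unsupported Edge OS")
    else:
        print("/etc/os-release not found")
```

On a RHEL 8.6 host, `/etc/os-release` contains `ID="rhel"` and `VERSION_ID="8.6"`, so the check passes; a CentOS 7 host fails because its major version is below 8.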

What’s next

Collibra is committed to providing a path for all customers to achieve full integration between the Collibra Data Intelligence Cloud and Collibra DQ&O.

The cloud offering of Collibra DQ&O leads the way on this path, easing administration and maintenance and improving the user experience over the long term. The installation details describe the prerequisites and the process.

We are proud to make DQ&O Cloud generally available (GA) starting in Q4 2022. Join us for our deep-dive webinar on Nov 16 to learn how you can take advantage of this cloud innovation in your environment to drive more value.
