Both Collibra and Snowflake play a critical role in supporting successful digital transformation strategies across a wide range of industry verticals. Each helps its customers make more effective data-driven business decisions, but they do so in ways that are interdependent. Together, Collibra and Snowflake bring joint value to their customers through their complementary cloud platforms. This is why we are excited to hold Snowflake’s highest level of partnership as a Snowflake Elite Partner.
The value of Snowflake and Collibra
Clients use Snowflake when they are looking for an agile, scalable platform to aggregate, store and analyze data collected from across their data silos. In building out that kind of capability – a cloud-native enterprise data platform – organizations need to ensure data is properly governed so that it can be easily found, understood, trusted, accessed securely, or shared. This is where Collibra excels – in curating and managing the metadata needed to make that happen.
Mutual customers use Collibra to automatically harvest technical metadata (describing the data stored in Snowflake) and supplement that with insights from data owners, stewards, policy experts and consumers. Collibra supports auto classification of data, enabling users to identify sensitive and personally identifiable data assets in their Snowflake environment. Collibra also automatically tracks end-to-end data lineage from data-consuming applications all the way through to source systems. For example, Collibra can track data from business intelligence (BI) reports and dashboards (created using tools like Tableau, Microsoft Power BI or Google Looker) to specific Snowflake data tables and columns, source systems used to populate those tables, and any ETL or analytical processes along the way.
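To make the lineage idea concrete, here is a minimal sketch of tracing a dashboard back to its source systems. It is an illustration only and does not represent how Collibra stores or computes lineage: the asset names, the UPSTREAM mapping and the trace_upstream function are all hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the upstream assets it reads from.
UPSTREAM = {
    "tableau:sales_dashboard": ["snowflake:ANALYTICS.SALES.ORDERS_SUMMARY"],
    "snowflake:ANALYTICS.SALES.ORDERS_SUMMARY": ["etl:orders_daily_load"],
    "etl:orders_daily_load": ["source:crm.orders", "source:erp.invoices"],
}


def trace_upstream(asset, upstream=UPSTREAM):
    """Return every asset that feeds `asset`, in breadth-first order."""
    visited = set()
    ordered = []
    queue = deque(upstream.get(asset, []))
    while queue:
        node = queue.popleft()
        if node in visited:
            continue  # an asset can be reachable through more than one path
        visited.add(node)
        ordered.append(node)
        queue.extend(upstream.get(node, []))
    return ordered


if __name__ == "__main__":
    for step in trace_upstream("tableau:sales_dashboard"):
        print(step)
```

Walking the graph in the opposite direction (from a source system to the reports that depend on it) would answer the related impact-analysis question of what is affected when a source changes.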
This kind of intelligence helps organizations leverage data as a strategic asset – guaranteeing it is properly governed to assure quality, promote consistency, and mitigate operational risks from potential data misuse or misappropriation.
The challenges of scaling data and analytics
The modern enterprise uses data to gain a richer and deeper understanding of its business operations. But invariably, this requires aggregating data from multiple applications, systems and sources. Snowflake has proved highly successful in enabling that – offering an agile, scalable platform to store and analyze enterprise data.
However, by aggregating data from across the enterprise, organizations will face challenges relating to:
Data discovery. More data sources and greater varieties and velocity of data inevitably mean more difficulty finding the right data.
Data quality and consistency. Concerns for data quality and consistency arise due to the sheer volume of data, as well as the multitude of data sources within an enterprise.
Compliance. Sharing data using traditional ETL pipelines and integrations, even within the bounds of an organization, is not straightforward. Rules and policies around data privacy, security and sovereignty will impact who is entitled to access certain data sets. Lack of knowledge in this area can pose serious liabilities.
Collibra and Snowflake: enabling the data-driven enterprise
The partnership between Collibra and Snowflake enables data-driven enterprises to unlock the value of scalable cloud technologies for storing and analyzing more data, while assuring trust in data and analytics through proper governance.
Data Catalog
Implementing a data catalog is vital for addressing data discovery. Catalogs should not only help users locate data relevant to their analysis, but also aid their understanding by providing context. This contextual information helps users find the right data more easily and can also speed up provisioning of the data for analysis.
Data Governance
Effective data governance positively impacts data quality and consistency. Data owners maintain accountability for their data domains, ensuring quality assurance processes are in place and highlighting the most trustworthy sources through certification. Proper governance also addresses compliance, ensuring users are aware of all policies restricting use and access of specific data sets.
Data Lineage
Data lineage can help to address data quality, consistency and compliance. Tracking the way data flows through an organization helps ensure adherence to policies relating to sensitive information.
Data Quality
In tandem with promoting data quality through better governance, data quality tools can also have a positive impact by profiling data to identify issues for remediation, including inaccuracies, redundancies and incomplete data sets. Such tools can also support compliance requirements by flagging sensitive data.
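As a rough illustration of what profiling means in practice, the sketch below counts incomplete values and exact duplicate rows in a small data set. It is a simplified assumption of how such a check might look, not a description of any Collibra feature; the profile function, the column names and the sample rows are hypothetical.

```python
def profile(rows, required):
    """Count missing required values and exact duplicate rows.

    `rows` is a list of dicts (one per record); `required` lists the
    columns that must be populated.
    """
    missing = {col: 0 for col in required}
    seen = set()
    duplicates = 0
    for row in rows:
        for col in required:
            if row.get(col) in (None, ""):
                missing[col] += 1  # incomplete record
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1  # redundant copy of an earlier row
        else:
            seen.add(key)
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


if __name__ == "__main__":
    sample = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": ""},
        {"id": 1, "email": "a@example.com"},
        {"id": 3, "email": None},
    ]
    print(profile(sample, required=["id", "email"]))
```

A report like this gives data owners a starting point for remediation: the missing counts point to incomplete data sets, and the duplicate count points to redundancy.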
Advancing the lake: Using Collibra for the Cox Automotive data marketplace, Collibra Data Citizens ’21
“We use Collibra as our one stop shop for data consumers and Collibra became part of our data marketplace suite along with AWS and Snowflake. Ensuring that we were governing data safely, but also enabling people throughout the organization to find and use the data they need to maximize value to our customers.”
“We need to also make sure people can get their hands on data. And so we’ve embedded the entitlement request process into Collibra for all of the data in our data platform. And so, as you can see, as users check out data sets and the data basket, we have tied our usage policies into that, where we can actually have you read and acknowledge acceptance of how to use the data before you actually get your hands on it. This process enables us to govern data access both in S3 and in Snowflake and enables kind of that full circle process for an end user. So you’re not swivel chairing between screens and applications.”
-Susan Twadell, AVP, Product Management, Cox Automotive
Native, cross-cloud capabilities
Snowflake is architected to be cloud-native, which means that its platform takes advantage of the elastic nature of cloud services. Elasticity means more than just scalability; it is really about the speed with which organizations can scale up and back down again. By separating compute from storage, Snowflake makes it easy to add resources when they are needed, enabling both vertical scaling for query performance and horizontal scaling for concurrency.
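The two kinds of scaling can be sketched as the SQL statements an administrator might issue. The Python helpers below only build the statement text and do not connect to Snowflake. The warehouse name REPORTING_WH and the helper names are hypothetical, the size list is a subset, and multi-cluster settings are assumed to be available on the account's Snowflake edition.

```python
import re

# Conservative check for unquoted identifiers, to avoid building unsafe SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")

# Subset of warehouse sizes, for illustration only.
SIZES = ("XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE")


def _check_name(name):
    if not _IDENTIFIER.match(name):
        raise ValueError(f"not a simple identifier: {name!r}")
    return name


def scale_up_sql(warehouse, size):
    """Vertical scaling: resize the warehouse for faster individual queries."""
    size = size.upper()
    if size not in SIZES:
        raise ValueError(f"unknown size: {size}")
    return f"ALTER WAREHOUSE {_check_name(warehouse)} SET WAREHOUSE_SIZE = {size}"


def scale_out_sql(warehouse, min_clusters, max_clusters):
    """Horizontal scaling: allow more clusters to serve concurrent users."""
    if not 1 <= min_clusters <= max_clusters:
        raise ValueError("expected 1 <= min_clusters <= max_clusters")
    return (
        f"ALTER WAREHOUSE {_check_name(warehouse)} SET "
        f"MIN_CLUSTER_COUNT = {min_clusters} MAX_CLUSTER_COUNT = {max_clusters}"
    )


if __name__ == "__main__":
    print(scale_up_sql("REPORTING_WH", "large"))
    print(scale_out_sql("REPORTING_WH", 1, 4))
```

Because compute is separate from storage, neither statement moves or copies data; only the resources assigned to the warehouse change.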
In addition to being cloud-native, Snowflake also offers cross-cloud capabilities. This makes the platform particularly relevant for enterprise clients that have opted for a multi-cloud strategy, as the same Snowflake technology stack can be used to collect, store and analyze data on AWS, Azure or GCP.
Given that Collibra supports its clients in governing data across any infrastructure – including hybrid, multi-cloud architectures – this reinforces the complementary nature of the partnership.
Freddie Mac’s data ecosystem transformation pivoted on Collibra, Collibra Data Citizens ’21
“We selected Collibra to meet our data governance and data stewardship requirements, as well as, to improve the collaboration within and across different Freddie Mac teams.
The metadata, data classifications, and technical metadata from Collibra power the data lake access management engine and the pipelines that hydrate our data lake. As the data lake gets hydrated with the approved data sets, the data in the lake is already curated and is ready for access by data consumers. In addition, the metadata from S3 buckets, parquet files, and Snowflake is available in Collibra for data consumers.
Data users are able to get an integrated business view of the technical metadata, data quality, data movement controls, and business metadata in a single central universal platform.
Having an integrated business view of this rich content is very powerful – it empowers our business users so that they have the confidence they need to use the data they need.”
-Vikram Chopra, Senior Manager, Single-Family, Data & Decisions, Freddie Mac
Secure data sharing
Another of Snowflake’s unique selling points is secure data sharing. There are many use cases where data owners want to share access to data – either with another operating unit or a business partner – but do not want to create copies and ETL pipelines. Snowflake has a built-in set of capabilities that remove the need to copy the data in order to share it within or outside your organization. The same underlying technology is used on the Snowflake Data Marketplace to help businesses not only discover and access third-party data, but also to monetize their own data assets.
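As a sketch of what sharing without copying can look like, the helper below assembles the sequence of SQL statements a data provider might run to expose one table through a share. It only builds the statement text. All object names (sales_share, sales_db, public, orders) and the consumer account identifier are hypothetical, and a real setup may need further grants or settings.

```python
def share_table_sql(share, database, schema, table, consumer_account):
    """Return the statements that expose one table to another account.

    No data is copied: the consumer queries the provider's table
    through the privileges granted to the share.
    """
    qualified_schema = f"{database}.{schema}"
    qualified_table = f"{qualified_schema}.{table}"
    return [
        f"CREATE SHARE {share}",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share}",
        f"GRANT USAGE ON SCHEMA {qualified_schema} TO SHARE {share}",
        f"GRANT SELECT ON TABLE {qualified_table} TO SHARE {share}",
        f"ALTER SHARE {share} ADD ACCOUNTS = {consumer_account}",
    ]


if __name__ == "__main__":
    statements = share_table_sql(
        share="sales_share",
        database="sales_db",
        schema="public",
        table="orders",
        consumer_account="partner_org.partner_account",  # hypothetical consumer
    )
    for statement in statements:
        print(statement + ";")
```

Keeping the grants explicit and table-level makes it easier to review exactly which data sets leave the organization's boundary.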
Sharing data is a great way to derive value with partners and customers. But it can also be complex from a compliance perspective. Data privacy regulations have evolved quite rapidly across the globe over the last few years, and we expect that trend to continue. With different rules impacting different industries and jurisdictions, it is important to keep a fine-grained understanding of which data sets can be shared and under which circumstances. Collibra supports data governance requirements relating to policy management and data privacy, helping to complement Snowflake’s data sharing capabilities.
Conclusion
Collibra customers such as Cox Automotive, Freddie Mac, and Sub-Zero use Collibra and Snowflake to empower business users to make data-driven decisions. Snowflake provides an easy-to-use cloud-based platform that enables users to aggregate, store and analyze data collected from across the enterprise. Collibra provides data quality assurance and serves as the governance tool on top of Snowflake that guarantees trust and accuracy in the data. Together, Collibra and Snowflake help organizations leverage data as a strategic asset, promote consistency, and ensure data compliance.
To learn more about the partnership, visit collibra.com/snowflake