Collibra and Databricks continue to bridge the gap between business and technical stakeholders to optimize data modernization strategies and accelerate access to insights. In the Q4 2024 release of Collibra Data Intelligence Platform, we’re proud to announce the general availability (GA) of Collibra Protect for Databricks. Data access policies from Collibra Protect can now be applied to Databricks Unity Catalog to mask columns and filter rows for enhanced data protection. Collibra and Databricks customers have been eagerly waiting for this integration, and we’re excited for you to try it for yourself.
What is Collibra Protect?
Collibra Protect simplifies data access governance by letting you define access policies for all of your data sources from a single location. Our no-code policy builder means you don’t have to rely on data engineers or other technical users to ensure data is protected. You can also use automatic data classification to quickly understand the content and sensitivity of data, which lets you scale data protection to massive volumes of data.
How does Collibra Protect work with Databricks Unity Catalog?
Collibra Protect can now take advantage of Databricks column-based and row-based masking functions (you can learn more about Databricks column-based masking functions here). In a nutshell, you can now mask (hide) data from users who don’t have the correct permissions to see it. Instead of creating new reports or views of data, you can create easy-to-implement policies that scale across your entire organization to keep data safe.
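To make the idea of a column-based masking function more concrete, here is a minimal sketch in Python that builds the two SQL statements a Unity Catalog column mask generally consists of: a function that decides what a given user sees, and an ALTER TABLE statement that attaches it to a column. This is an illustration only, not the SQL that Collibra Protect generates. The helper name `column_mask_sql`, the table, the column, the function name and the group name are all hypothetical, and the sketch assumes a STRING column.

```python
def column_mask_sql(table: str, column: str, function: str, allowed_group: str) -> list:
    """Build the SQL for a simple Unity Catalog column mask on a STRING column.

    Illustrative only: all names passed in are hypothetical placeholders,
    and the statements are returned as text rather than executed.
    """
    # The mask function returns the real value to members of the allowed
    # group and a fixed placeholder to everyone else.
    create_fn = (
        f"CREATE OR REPLACE FUNCTION {function}({column} STRING) "
        f"RETURN CASE WHEN is_account_group_member('{allowed_group}') "
        f"THEN {column} ELSE '*****' END"
    )
    # Attaching the function to the column makes Databricks apply it
    # whenever the column is queried.
    attach = f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {function}"
    return [create_fn, attach]


if __name__ == "__main__":
    for statement in column_mask_sql(
        table="main.sales.customers",        # hypothetical table
        column="email",                      # hypothetical column
        function="main.sales.email_mask",    # hypothetical function name
        allowed_group="pii_readers",         # hypothetical account group
    ):
        print(statement + ";")
```

Because the mask lives on the column itself, every query against the table sees the same protection, which is why no separate masked view or report is needed.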
The great news is that you can create policies in whichever platform you’re more comfortable working in. Column-based policies created in Collibra Protect are applied in Databricks, and policies created in Databricks are applied directly to the columns in Databricks.
The same holds for row-level data: if you create a row filter in Collibra Protect, the policy is enforced in Databricks, and if you create it in Databricks, the policy is applied directly to the tables in Databricks. When the standards are synchronized and active, masking policies are created in Databricks. The masking functions are named collibra_masking_policy_<asset ID>, so you can easily view and understand policies regardless of where they were created.
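As a rough illustration of the row filter and the naming convention described above, the Python sketch below derives a function name of the form collibra_masking_policy_<asset ID> and builds the general shape of a Unity Catalog row filter. It makes several assumptions: the asset ID is appended as-is (any normalization Collibra Protect may apply is not shown), the helper names are hypothetical, and the table, column, group and value are placeholders. It is not the SQL that Collibra Protect itself produces.

```python
def masking_function_name(asset_id: str) -> str:
    """Name a masking function after its Collibra asset ID.

    Assumption: the ID is appended unchanged to the documented prefix.
    """
    return f"collibra_masking_policy_{asset_id}"


def row_filter_sql(table: str, column: str, function: str,
                   admin_group: str, allowed_value: str) -> list:
    """Build the SQL for a simple Unity Catalog row filter on a STRING column.

    Illustrative only: statements are returned as text, not executed.
    """
    # Members of the admin group see every row; everyone else sees only
    # rows where the column equals the allowed value.
    create_fn = (
        f"CREATE OR REPLACE FUNCTION {function}({column} STRING) "
        f"RETURN IF(is_account_group_member('{admin_group}'), true, "
        f"{column} = '{allowed_value}')"
    )
    # A row filter is attached to the table, with the columns it reads
    # listed in the ON clause.
    attach = f"ALTER TABLE {table} SET ROW FILTER {function} ON ({column})"
    return [create_fn, attach]


if __name__ == "__main__":
    print(masking_function_name("12345"))  # hypothetical asset ID
    for statement in row_filter_sql(
        table="main.sales.orders",            # hypothetical table
        column="region",                      # hypothetical column
        function="main.sales.region_filter",  # hypothetical function name
        admin_group="sales_admins",           # hypothetical account group
        allowed_value="US",                   # hypothetical filter value
    ):
        print(statement + ";")
```

A predictable name prefix means a simple search on collibra_masking_policy_ is enough to tell which functions in a schema came from synchronized standards.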
Collibra Protect can mask a number of data types in Databricks, including BIGINT, BINARY, BOOLEAN, DATE, NUMBER, VARCHAR and more. Want to see some specific examples? You can read more here in our documentation.
Data protection is an important topic for Collibra, Databricks and you. We’re thrilled that together with Databricks, we’re making data protection faster, easier and more secure. To learn more about this feature, and the rest of the Q4 2024 Collibra Data Intelligence Platform release, head on over to our release notes.