What is a data catalog?

The primary role of a data catalog is to create an inventory of all the data within an enterprise, most commonly for data governance and in some cases for distributed queries and access management.

Gartner expands upon this definition as follows:

A data catalog creates and maintains an inventory of data assets through the discovery, description and organization of distributed datasets. The data catalog provides context to enable data stewards, data/business analysts, data engineers, data scientists and other line of business (LOB) data consumers to find and understand relevant datasets for the purpose of extracting business value. Modern machine-learning-augmented data catalogs automate various tedious tasks involved in data cataloging, including metadata discovery, ingestion, translation, enrichment and the creation of semantic relationships between metadata. These next-generation data catalogs can therefore propel enterprise metadata management projects by allowing business users to participate in understanding, enriching and using metadata to inform and further their data and analytics initiatives.1

What is a data exchange?

A data exchange (sometimes called a data marketplace) is a digital platform that manages data as a product to make it easy to find, use, manage and monetize. It connects data suppliers and consumers through a seamless experience.

Data exchanges can be public, in which case the data is typically exchanged as part of a commercial transaction. Increasingly, they are also deployed within an enterprise to provide business stakeholders with seamless access to curated data products. In this scenario, the interactions can be either transactional (find, access, use) or collaborative (bringing stakeholders together to customize data products).

Eckerson Group Report: Rise of Data Exchanges

What do they have in common?

Data catalogs and data exchanges exist for similar reasons. With the enormous volume of data in various repositories and aggregations inside an enterprise, it’s difficult to understand what data is available, valuable and useful for any particular business use case. 

The data catalog addresses this challenge by creating a searchable inventory of all that data. The data exchange enables enterprises to curate the best and/or most in-demand data for easy access and consumption. 

Where do they differ?

If you think of a traditional retail business, a data catalog would be the inventory list of everything in the warehouse. It tells you what you have, where to find each item and some details about it (metadata). However, you do not invite your customers to shop in your warehouse or through your inventory list, because it’s hard for them to find and understand what they’re looking for there. Items are packaged differently and some may be broken or expired. Instead, you curate the best items to display in your storefront, where everything the customer needs to know to make a decision is readily available.

A data exchange, on the other hand, is the storefront: it is focused on quickly realizing target outcomes from data products. Instead of listing all the data you have, it converts curated data assets into data products, making it easy to:

  • Find data. A data exchange curates data that is in demand and makes it available for consumption through an intuitive digital interface.
  • Understand data. Data in a data exchange is packaged as products with names, descriptions, visual or tabular previews, and clear terms of use.
  • Use data. When packaged as a product, data can be consumed by more potential users, especially non-technical users. 
  • Share data. Data products can be made available securely to any group of internal or external stakeholders with a few clicks.
  • Collaborate on data. Analyze, filter or blend data with colleagues to meet your business requirements, using the secure, collaborative workspaces and pre-populated tools within a data exchange.
  • Manage data. Data products can be easily updated, and those changes are propagated to all subscribers of that product automatically.
  • Measure the value of data. Understand how and by whom data is used and collect direct feedback to continuously improve data products.
  • Scale outcomes from data. Break the cycle of one-off processes by sharing the outcome of the work done on data with others as new custom data products.
  • Make data processes repeatable. If you need to apply the same process to a dataset repeatedly, you can build the script once and it will execute each time your dataset changes.
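
To make the "manage" and "repeatable" points above more concrete, here is a minimal Python sketch. It does not show any particular product's API: the `DataProduct` class, its fields and the sample rows are all hypothetical, invented purely for illustration. The idea is simply that subscribers and saved scripts are registered once, and every later update to the product reaches them automatically.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataProduct:
    """Hypothetical data product: curated rows plus what a consumer needs to know."""
    name: str
    description: str
    terms_of_use: str
    rows: List[dict] = field(default_factory=list)
    version: int = 1
    # Callbacks run on every update: subscriber notifications and saved scripts alike.
    on_change: List[Callable[["DataProduct"], None]] = field(
        default_factory=list, repr=False
    )

    def subscribe(self, callback: Callable[["DataProduct"], None]) -> None:
        """Register a callback once; it fires on every later update."""
        self.on_change.append(callback)

    def update(self, rows: List[dict]) -> None:
        """Replace the data, bump the version and push the change to everyone registered."""
        self.rows = list(rows)
        self.version += 1
        for callback in self.on_change:
            callback(self)


# Illustrative data only.
product = DataProduct(
    name="Branch sales",
    description="Daily sales by branch",
    terms_of_use="Internal use only",
)

received: Dict[str, int] = {}   # latest version each subscriber has seen
totals: List[float] = []        # output of the saved "script"

# A subscriber that simply records the version it was sent.
product.subscribe(lambda p: received.update({"analyst_team": p.version}))
# A repeatable process: built once, executed each time the dataset changes.
product.subscribe(lambda p: totals.append(sum(r["amount"] for r in p.rows)))

product.update([{"branch": "A", "amount": 100.0}, {"branch": "B", "amount": 50.0}])
product.update([{"branch": "A", "amount": 120.0}])
```

After the two updates, the subscriber has seen the latest version and the saved script has run twice, once per change, without anyone re-running it by hand.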

Some of the use cases facilitated by data exchanges that are not possible with a catalog include internal and external data sharing and collaboration, data ‘clean rooms’, and optimized procurement and management of third-party data. Data exchanges also enable data transactions, ranging from trial access with fixed terms of use to commercial transactions for companies seeking to monetize their data.

Can a data catalog and data exchange co-exist?

Yes! Not only can data catalogs and data exchanges co-exist, but in an enterprise environment, they absolutely should. Data catalogs are useful to manage and govern the entire data estate and support compliance and governance requirements. Data exchanges curate the subset of that data that can be most useful in driving business insights and outcomes from data. A data catalog would include the data products that sit within a data exchange. 
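
The relationship described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real catalog or exchange API: `CatalogEntry`, `search_catalog`, `exchange_listing` and the sample assets are all hypothetical. The catalog holds every asset, including the ones published as data products, while the exchange lists only the curated subset.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record: what the asset is and where it lives."""
    asset_id: str
    location: str
    description: str
    quality_checked: bool
    product_name: Optional[str] = None  # set only if published as a data product


def search_catalog(catalog: List[CatalogEntry], term: str) -> List[CatalogEntry]:
    """The catalog is a searchable inventory of everything, curated or not."""
    term = term.lower()
    return [entry for entry in catalog if term in entry.description.lower()]


def exchange_listing(catalog: List[CatalogEntry]) -> Dict[str, CatalogEntry]:
    """The exchange shows only the assets that were checked and published as products."""
    return {
        entry.product_name: entry
        for entry in catalog
        if entry.product_name is not None and entry.quality_checked
    }


# Illustrative inventory only.
CATALOG = [
    CatalogEntry("customers_raw", "warehouse.crm.customers_raw",
                 "Raw customer records", quality_checked=False),
    CatalogEntry("customers_clean", "warehouse.crm.customers_clean",
                 "Deduplicated customer records", quality_checked=True,
                 product_name="Customer 360"),
    CatalogEntry("branch_sales", "warehouse.sales.branch_daily",
                 "Daily sales by branch", quality_checked=True),
]

catalog_hits = search_catalog(CATALOG, "customer")   # both raw and clean assets
storefront = exchange_listing(CATALOG)               # only the published product
```

Here a catalog search for "customer" returns both the raw and the cleaned asset, while the exchange lists only the "Customer 360" product, which is itself still an entry in the catalog.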

One of the world’s biggest banks has told us that while they’re happy with their data catalog (from a well-known provider), it doesn’t go far enough to make the data in the catalog usable and useful within their business. As a result, they are launching an enterprise data exchange initiative to help them realize more value from data, and to do so faster.

If you’d like to better understand how a data exchange can add value to your business, explore the Harbr platform.

1 Gartner, Augmented Data Catalogs: Now an Enterprise Must-Have for Data and Analytics Leaders by Ehtisham Zaidi and Guido De Simoni, 12 September 2019