What’s the difference between a data catalog and a data marketplace?
Repeat after me: A data catalog is not a data marketplace. A data marketplace is not a data catalog.
The distinction may be obvious when comparing a data catalog with a public data marketplace (a place that enables the buying and selling of data). But the difference is less clear when comparing a data catalog with an enterprise data marketplace.
Before I explain how their features and users differ, let’s first look at how they differ in purpose.
The purpose of a data catalog is to understand and organize all the data within an organization. A catalog is primarily used by data governance teams to establish an inventory and provide governance over the data estate. This is particularly important for complying with standards, regulations, and laws. It also helps manage the risks associated with sensitive or valuable data elements.
The purpose of a data marketplace is to facilitate the access, use, and distribution of ‘ready-to-use’ data (including data products) within and between organizations. A data marketplace is primarily used to reduce time to access and time to value for a wide range of data users. This is particularly important when building applications, delivering insights, and creating data-driven business processes.
The typical features found in a data catalog include:
The typical features found in a data marketplace include:
If your organization has already invested in an enterprise data catalog, isn’t that supposed to support one-stop shopping for data?
According to industry expert Wayne Eckerson, the answer is “yes and no”. As he explains in this detailed guide, “a data catalog gets users halfway to data, while a data marketplace closes the loop.”
A data catalog serves a range of user personas:
Data Stewards: Use the catalog to monitor data quality, ensure compliance with governance policies, and manage metadata. Their aim is to maintain high data quality and adhere to standards.
Data Scientists, Data Analysts, Business Analysts: Use the catalog to discover relevant datasets and understand their context and meaning. Their aim is to find the best data to achieve their objectives.
Data Engineers: Use the catalog to manage data sources and pipelines, ensuring efficient ingestion, processing and storage. Their aim is to maintain a reliable flow of data across the organization.
Chief Data Officers (CDOs): Use the catalog to gain insights into usage, monitor governance, and align data with business objectives. Their aim is to oversee the organization’s data strategy and communicate the value of data assets to stakeholders.
Compliance Officers: Use the catalog to access audit logs, monitor data access and usage, and ensure data handling meets regulatory standards. Their aim is to ensure the organization adheres to legal and regulatory requirements and avoids legal risks.
A data marketplace has three key user personas:
Chief Data Officers (CDOs), Heads of Data Marketplace: Use the marketplace to manage data access and usage across their entire data ecosystem. Their aim is to reduce time to access and time to value by increasing governance, self-service and automation.
Data Product Managers, Data Owners: Use the marketplace to connect to data storage, define data assets and configure and manage data products. Their aim is to increase the use of their data assets and drive more business value, while maintaining or improving risk management.
Data Scientists, Data Analysts, Business Analysts: Use the marketplace to discover, access, transform, analyze and distribute data and data products. Their aim is to access and use the best data to achieve their objectives as quickly as possible.
The differences between data catalogs and data marketplaces start with their profoundly different purposes, and extend to their features and the user personas they serve.
The catalog is focused on profiling all of the data within an organization to support governance and reduce risk. The marketplace is focused on enabling governed access, use, and distribution of data products to data users, regardless of technical skill.
Given the differences, most organizations will need both a catalog and a marketplace, and these two systems should be tightly integrated.
At Harbr, the data marketplaces we’ve deployed for our customers integrate with their data catalogs, pushing and pulling metadata to keep both systems doing what they do best. This allows organizations to take advantage of the governance that’s been established by the catalog, while accelerating time to access and time to value in a fit-for-purpose platform.
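To make the idea of pushing and pulling metadata more concrete, here is a minimal sketch in Python. It is purely illustrative and is not Harbr's implementation or any vendor's API: the class names (CatalogEntry, MarketplaceProduct), the fields, and both sync functions are hypothetical, and the "catalog" is just an in-memory dictionary. The assumption is that governance metadata (classification, owner) flows from the catalog to the marketplace, while usage signals flow back the other way.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Hypothetical catalog record: governance metadata for one dataset."""
    dataset_id: str
    classification: str        # e.g. "public" or "confidential"
    owner: str
    access_requests: int = 0   # usage signal reported back by the marketplace


@dataclass
class MarketplaceProduct:
    """Hypothetical marketplace listing: a ready-to-use data product."""
    dataset_id: str
    title: str
    classification: str = "unknown"
    owner: str = "unknown"
    access_requests: int = 0


def pull_governance_metadata(catalog, product):
    """Copy governance fields from the catalog onto the marketplace listing."""
    entry = catalog.get(product.dataset_id)
    if entry is None:
        return False           # not catalogued: leave the listing unchanged
    product.classification = entry.classification
    product.owner = entry.owner
    return True


def push_usage_metadata(catalog, product):
    """Report marketplace usage back onto the matching catalog entry."""
    entry = catalog.get(product.dataset_id)
    if entry is None:
        return False
    entry.access_requests = product.access_requests
    return True


# Illustrative data only (assumed names and values).
catalog = {"sales_2023": CatalogEntry("sales_2023", "confidential", "finance-team")}
product = MarketplaceProduct("sales_2023", "Sales data product", access_requests=7)

pull_governance_metadata(catalog, product)   # catalog -> marketplace
push_usage_metadata(catalog, product)        # marketplace -> catalog
```

In this sketch each system stays the source of truth for what it does best: the catalog owns classification and ownership, and the marketplace owns usage. A real integration would also need to handle scheduling, conflicts, and authentication, which are left out here.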
If you’re ready to explore how a data marketplace can complement your data catalog and accelerate data access and value in your organization, get in touch. Harbr data marketplaces are in place at some of the biggest data-driven organizations in the world, and can be deployed in a matter of just weeks.