Data catalog vs. data marketplace: What’s the difference?

Anthony Cosgrove
Co-founder, Harbr

What’s the difference between a data catalog and a data marketplace?

Repeat after me: A data catalog is not a data marketplace. A data marketplace is not a data catalog.

The distinction may be obvious when comparing a data catalog and a public data marketplace (a place that enables buying and selling of data). But the difference is less clear when comparing a data catalog and an enterprise data marketplace.

Before I explain how they differ in features and in the users they serve, let’s first look at how they differ in purpose.

Purpose of a data catalog vs. data marketplace

The purpose of a data catalog is to understand and organize all the data within an organization. A catalog is primarily used by data governance teams to establish an inventory and provide governance over the data estate. This is particularly important for complying with standards, regulations, and laws. It also helps manage the risk of sensitive or valuable data elements.

The purpose of a data marketplace is to facilitate the access, use, and distribution of ‘ready-to-use’ data (including data products) within and between organizations. A data marketplace is primarily used to reduce time to access and time to value for a wide range of data users. This is particularly important when building applications, delivering insights, and creating data-driven business processes. 

Features of a data catalog

The typical features found in a data catalog include:

  1. Metadata management: This should include a ‘data asset inventory’ listing all data assets and a ‘metadata repository’ that stores information about each data asset.
  2. Data discovery: The ability for users to search and browse data assets using keywords, filters, tags, data type, source, and owner.
  3. Data governance, which can include:
     • Data lineage tools to represent data flows from source to target and how data is transformed
     • Data owner/stewardship information with the responsibilities of individuals in relation to specific data assets
     • Tools to ensure data usage is managed in line with standards, regulations and laws
  4. Data quality: The automatic generation of detailed metadata to provide insight and assurance on data quality, including the accuracy, completeness, consistency, and timeliness of data.
  5. Collaboration and documentation: Users can add comments, notes, and documentation to data assets in order to foster collaboration and knowledge sharing. Social features and user-generated content (UGC), like ratings and reviews, capture user feedback and can improve the overall experience.
  6. Usage analytics: Usage metrics provide insights into how often data assets are accessed, by whom, and for what purpose. Data access controls allow organizations to manage who can view or edit data assets.
  7. AI: Some catalogs have developed more advanced features using AI and machine learning to improve data asset recommendations, metadata generation and analysis, and the populating of business glossaries.
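
To make the first two features concrete, here is a minimal Python sketch of a data asset inventory with keyword and filter search. It is an illustration only, not how any particular catalog product works: the DataAsset and Catalog classes, their fields, and the sample assets are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class DataAsset:
    """One entry in a hypothetical metadata repository."""
    name: str
    owner: str
    source: str
    tags: Set[str] = field(default_factory=set)
    description: str = ""


class Catalog:
    """A toy data asset inventory with keyword and filter search."""

    def __init__(self) -> None:
        self._assets: Dict[str, DataAsset] = {}

    def register(self, asset: DataAsset) -> None:
        """Add (or overwrite) an asset in the inventory, keyed by name."""
        self._assets[asset.name] = asset

    def search(self, keyword: str = "", *, tag: Optional[str] = None,
               owner: Optional[str] = None) -> List[str]:
        """Return sorted names of assets matching the keyword and filters."""
        keyword = keyword.lower()
        hits = []
        for asset in self._assets.values():
            text = f"{asset.name} {asset.description}".lower()
            if keyword and keyword not in text:
                continue
            if tag is not None and tag not in asset.tags:
                continue
            if owner is not None and asset.owner != owner:
                continue
            hits.append(asset.name)
        return sorted(hits)


# Sample assets (invented for illustration).
catalog = Catalog()
catalog.register(DataAsset("customer_orders", "sales-team", "warehouse",
                           {"pii", "orders"}, "Orders placed by customers"))
catalog.register(DataAsset("web_clicks", "marketing", "event-stream",
                           {"clickstream"}, "Raw website click events"))

print(catalog.search("orders"))
print(catalog.search(tag="clickstream"))
```

A real catalog would add lineage, quality metrics, and access controls on top of this, but the core idea is the same: a searchable inventory of metadata about data, rather than the data itself.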

Features of a data marketplace

The typical features found in a data marketplace include:

  1. Data product management: Users should be able to create and manage data as a product, by connecting to data sources, defining assets, and combining any number and type of asset into a product by adding packaging and permissioning.
  2. Data product catalog/storefront: Similar capabilities to a data catalog, but for data products rather than just data assets. Data products will have dedicated summaries with rich packaging explaining and demonstrating use cases to enable assessment and comparison.
  3. Permissions: Users can gain subscription-based access to data products through pre-provisioned, self-service, or access request workflows. Subscription plans are based on policies that control data access including duration, permitted use, data contracts, pricing, single/multi-user, etc.
  4. Transformation and customization: Ability to transform and customize data with both technical and low/no-code options, such as executing code/notebooks or filtering rows and columns. Access to and use of customized data products are subject to permission lineage, so the data owner maintains visibility and control.
  5. Analytics: Ability to analyze data with technical and low/no-code options to support a wide range of users and use cases. This may include sandboxes, clean rooms, data science workbenches, query engines, natural language queries, data apps, and interactive visualizations. Tooling, infrastructure, and data are orchestrated by the marketplace for a self-service experience.
  6. Distribution: Ability to distribute data via a range of mechanisms including APIs, cloud pipelines, SFTP, and desktop download. Distribution can be one-off or automated on an ongoing basis, either scheduled or event-driven (for example, when the data changes).
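
The permissions feature is where a marketplace differs most visibly from a catalog, so a small sketch may help. The Python below models a subscription plan as a policy (duration, permitted use, number of users) and checks an access request against it. This is a simplified illustration under my own assumptions, not Harbr's implementation: SubscriptionPlan, Subscription, the allows method, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import FrozenSet, Set


@dataclass(frozen=True)
class SubscriptionPlan:
    """Hypothetical policy attached to a data product."""
    product: str
    duration_days: int
    permitted_uses: FrozenSet[str]
    max_users: int


@dataclass
class Subscription:
    """A plan taken out by specific users from a given start date."""
    plan: SubscriptionPlan
    start: date
    users: Set[str]

    def allows(self, user: str, use: str, on: date) -> bool:
        """True only if the user, the use, and the date all satisfy the plan."""
        expires = self.start + timedelta(days=self.plan.duration_days)
        return (user in self.users
                and len(self.users) <= self.plan.max_users
                and use in self.plan.permitted_uses
                and self.start <= on < expires)


# Sample plan and subscription (invented for illustration).
plan = SubscriptionPlan("retail_sales", duration_days=30,
                        permitted_uses=frozenset({"internal-analytics"}),
                        max_users=2)
sub = Subscription(plan, start=date(2024, 1, 1), users={"ana"})

print(sub.allows("ana", "internal-analytics", date(2024, 1, 15)))
print(sub.allows("ana", "resale", date(2024, 1, 15)))
```

The point of the sketch is that access is granted per data product and per policy, and can be checked automatically at the moment of use, which is what makes self-service access possible without giving up control.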

Why do I need a data marketplace when I already have a data catalog?

If your organization has already invested in an enterprise data catalog, isn’t that supposed to support one-stop shopping for data?

According to industry expert Wayne Eckerson, the answer is “yes and no”. As he explains in this detailed guide, “a data catalog gets users halfway to data, while a data marketplace closes the loop.”

Personas for data catalogs

A data catalog serves a range of user personas:

Data Stewards: Use the catalog to monitor data quality, ensure compliance with governance policies, and manage metadata. Their aim is to maintain high data quality and adhere to standards.

Data Scientists, Data Analysts, Business Analysts: Use the catalog to discover relevant datasets and understand their context and meaning. Their aim is to find the best data to achieve their objectives.

Data Engineers: Use the catalog to manage data sources and pipelines, ensuring efficient ingestion, processing and storage. Their aim is to maintain a reliable flow of data across the organization.

Chief Data Officers (CDOs): Use the catalog to gain insights into usage, monitor governance, and align data with business objectives. Their aim is to oversee the organization’s data strategy and communicate the value of data assets to stakeholders.

Compliance Officers: Use the catalog to access audit logs, monitor data access and usage, and ensure data handling meets regulatory standards. Their aim is to ensure the organization adheres to legal and regulatory requirements and avoids legal risks.

Personas for data marketplaces

A data marketplace has three key user personas:

Chief Data Officers (CDOs), Heads of Data Marketplace: Use the marketplace to manage data access and usage across their entire data ecosystem. Their aim is to reduce time to access and time to value, by increasing governance, self-service and automation.

Data Product Managers, Data Owners: Use the marketplace to connect to data storage, define data assets and configure and manage data products. Their aim is to increase the use of their data assets and drive more business value, while maintaining or improving risk management.

Data Scientists, Data Analysts, Business Analysts: Use the marketplace to discover, access, transform, analyze and distribute data and data products. Their aim is to access and use the best data to achieve their objectives as quickly as possible.

Do I need a data catalog or a data marketplace?

The differences between data catalogs and data marketplaces start with their profoundly different purposes, and extend to the features they have and the user personas they serve.

The catalog is focused on profiling all of the data within an organization to support governance and reduce risk. The marketplace is focused on enabling governed access, use, and distribution of data products to data users, regardless of technical skill.

Given the differences, most organizations will need both a catalog and a marketplace, and these two systems should be tightly integrated.

At Harbr, the data marketplaces we’ve deployed for our customers integrate with their data catalogs, pushing and pulling metadata to keep both systems doing what they do best. This allows organizations to take advantage of the governance that’s been established by the catalog, while accelerating time to access and time to value in a fit-for-purpose platform.

Deploy your own data marketplace

If you’re ready to explore how a data marketplace can complement your data catalog and accelerate data access and value in your organization, get in touch. Harbr data marketplaces are in place at some of the biggest data-driven organizations in the world, and can be deployed in a matter of just weeks.