A data exchange is the transfer of data between two parties, typically a data owner and a data consumer. An exchange can involve any type of data and can include data access or taking a copy of the data.
A Data Exchange Platform is software that supports the exchange of data between a supplier/producer and a consumer. You can either operate or participate in a data exchange platform. If you participate in a data exchange you are bound by the rules set by the operator, usually including revenue-sharing arrangements, and you have no control over other participants. If you operate a data exchange you can create a branded experience and set the rules for all participants. You can also access platform-level reporting to understand, manage and optimize the platform.
A data product is a container for one or more digital assets or services to support controlled and scalable transactions between data product owners and data consumers, and the ongoing consumption of the assets/services. Data products can contain any combination of assets in any digital format, can be static or regularly updated, and can be of any size or volume; services are typically delivered via APIs. The aim of a data product is to reduce the time to value and cost of ownership for the data consumer while providing control, auditability, and feedback loops to the data product owner.
Data monetization is generating measurable economic benefits from data assets and services that have typically been ‘productized’ to ensure scalability and reliability.
A Data Cleanroom is a virtual environment where data consumers can access data but cannot remove it. A Data Cleanroom is typically used to provide data access without the risks associated with losing custody of proprietary, confidential or legally restricted data.
A Collaborative Data Cleanroom is a virtual environment where multiple parties can collaboratively work on data but no party can remove it. Typically, collaborative cleanrooms are used to support use cases where the data, models, and expertise from multiple parties are required to achieve an outcome. A Collaborative Data Cleanroom delivers the benefits of data sharing and data collaboration without the risks associated with losing custody of, or exposing, proprietary, confidential, or legally restricted information.
A Data Analytics Cleanroom is a virtual environment with specific data processing capabilities that delivers predetermined outputs to data consumers without giving either party access to the underlying data or model. A Data Analytics Cleanroom is typically used to support the delivery of analytical outcomes without the risks associated with providing access to proprietary, confidential or legally restricted data.
3rd-Party Data (or ‘Third-Party Data’) is information provided by a separate, external organization.
A Data Marketplace is an environment for buying and selling third-party data. Data marketplaces facilitate the external exchange of data via financial transactions.
A Data Catalog is an organized inventory of the data within an organization, built upon extensive metadata. Data catalogs help data owners and consumers understand their internal data landscape, including what is available, where it is stored, and who owns it.
Data democratization is a phrase used to articulate a future where data is easier to discover, access, transform, and use, reducing the barriers to entry so that a wider range of people can be involved in data-related activities, enabling better business outcomes.
A Data Lake is a repository that stores and processes large amounts of structured and unstructured data via an ELT (Extract, Load, Transform) process. It provides greater flexibility than a Data Warehouse because the data does not have to be transformed to a preconceived schema before it is used, so the most appropriate transformation can be applied for each use case. However, without governance and a shared understanding of the data, a lake can degrade into a ‘data swamp’: expensive to run and failing to deliver value.
A Data Warehouse is a type of data management system that centrally stores and processes large amounts of data converted into a structured, SQL-queryable format via an ETL (Extract, Transform, Load) process. A data warehouse provides less flexibility than a Data Lake because it requires the data to be transformed to a preconceived schema in advance; as a result the data is relatively well understood and ready to use, but the approach can be expensive.
Data Mesh, a term coined by Zhamak Dehghani, is a distributed data network that connects local storage and processing units so that data products can be easily accessed and queried where they are, without needing to be transported first. It is an alternative to monolithic Data Warehouses and Data Lakes, as well as to data virtualization, where virtual views are created to support remote querying.
In Data Mesh the focus is on creating consumable data products that can be delivered and maintained at source; that source may be a lake or a warehouse, may make use of virtualization, or may be another type of storage and processing technology.