A Brief History

Once upon a time there were a great many data marketplace companies, large and small. Sadly, almost all of them failed. They all shared the same excellent idea: help companies that need to sell and distribute their data by facilitating its collection and marketing, making it visible and available to buyers. What an amazing concept! A replacement for failed data lakes! The hype was extreme:

Data Marketplaces: The Holy Grail of our Information Age

hackernoon.com July 19, 2018

So, enter 2020 and a reality check on the success of these much vaunted data marketplaces. Where are the players from the early ’10s nowadays?

  • DataMarket was acquired by Qlik and now mainly just curates data
  • Microsoft’s Azure Data Marketplace closed in 2018
  • Kasabi started in 2010, shut down in 2012
  • BuzzData started in 2010 with a model to socialize data. Quietly closed shop on August 1, 2013
  • Infochimps started out as a data marketplace in 2009, and was acquired by CSC in 2013 and no longer offers a marketplace
  • Timetric started out as a data market in 2008 focusing on statistical and time series data, and was acquired by GlobalData in 2018, which is a consulting company and not a marketplace
  • Founded in 2008, social data marketplace Gnip was acquired by Twitter in 2014
  • The DataStreamX marketplace ceased operations in March 2019 and is now part of Quadrant
  • Freebase, launched in 2007 by MetaWeb Technologies, died in 2016 after being purchased by Google
  • xDayta launched in 2013, closed down in 2015

Sources: Data marketplaces, we hardly knew ye, Datafloq, Crunchbase

What Happened?

The obvious answer seems to be there simply wasn’t a market for data marketplaces, but the real answer is far more nuanced. Generally speaking, many marketplace business models fail, but there tends to be at least one significant, successful marketplace for a given commodity. However, that’s not the case with data…so what gives?

Data is Different

Data is a commodity with many unique characteristics including:

  • It is adaptable to the point where there is zero recognition of, or traceability back to, the original form. For example, a user could take a structured table, cleanse it, join it with another data set, execute multiple calculations and end up with an unstructured PDF report or visualization that bears little or no resemblance to the original sources. If you start with unstructured data, the traceability is even weaker.
  • It is highly substitutable. For example, there may be many options available for sourcing company data and, while there will be differences in coverage, timeliness and general reliability, the data itself is substitutable.
  • It is exceptionally diverse. Not only is the specific data diverse, but so are the formats, delivery mechanisms, frequency of update, terms and pricing. This makes it highly complex and difficult for people to engage with.
  • It is often ‘raw’. The data that is available is rarely ready to support all of the diverse use cases for which it may be applicable, meaning the consumer spends significant resources adapting it to a specific use case.
  • It is digital and easy to copy. This makes data easy to steal and, coupled with a lack of traceability, means it’s risky.

None of these characteristics lend themselves well to a marketplace paradigm, which relies on heavy process standardization, ease of comparison, arm’s-length interactions between buyers and suppliers and clear-cut value transfer. But what is perhaps most interesting is that managing these characteristics requires generating very high levels of trust, which is ironic given that trust is understood to be critical for any successful marketplace.

Trust is Critical

All marketplaces tend to drive their value proposition by optimizing for discovery and fulfilment. Amazon is a great example: find everything you need, one-click checkout and same-day delivery. But trust is an issue it also has to manage. Trust is typically about the quality of the product and confidence that it will arrive. Some marketplaces (e.g. eBay) must also trust that the consumer will make payment post-fulfilment.

In data marketplaces, trust is significantly more important and challenging to resolve. Users need to trust that:

  • The data is high quality and dependable
  • The supply will be consistent and not break processes
  • The data will deliver value once they start using it
  • The consumer will not steal the data (or have it stolen from them)
  • The consumer will not use the data for non-permitted use cases

…that’s a big ask! As a result, many highly inefficient norms have been established over time to deal with the lack of trust.

Sample-based Trials

A trade-off between proving value and losing ownership of an entire product. Trials create significant downstream costs for data consumers who must compare samples (e.g. storage and compute resources, and developer time spent homogenising data models and making multiple joins with proprietary data). They also leave suppliers unable to compete on, or provide hands-on support for, how best to use their product.

Non-standard Pricing

Data is notoriously difficult to price; the value varies dramatically between vertical, jurisdiction, customer and use case, and there is imperfect knowledge on both sides. This ultimately leads to protracted commercial discussions, potentially with multiple parties simultaneously.

Bespoke Legal Documents

Extensive legal negotiations and agreements are often required to protect both parties against risk, since the supplier loses ownership of their product and faces the potential for loss, theft, misuse, etc. The risk appetite of organizations varies dramatically, and also shifts in line with the perceived value of the product they are trying to sell or buy.

Long Commitments

Due to the slow, expensive and risky processes outlined above, data licenses typically run for years. This generates yet more risk for the consumer, as the cost of failure is high. Market data teams try to simplify matters by championing specific data, but in the absence of understanding the specific use case, or of business teams being able to properly quantify or articulate the difference in value between alternatives, this is not a solution that works.

Are Data Marketplaces Destined to Fail?

Historically, data marketplaces have tended to fail because they did not find a mechanism for sufficiently addressing the critical issue of trust. Instead they opted for a standard marketplace paradigm with little adaptation to account for the unique characteristics of data. A standard marketplace is simply not the right model for supporting transactions between data buyers and suppliers. Additionally, any marketplace has to manage issues inherent to the marketplace model, including disintermediation, commoditization, governance of negative behaviour and the cold-start problem, all of which are arguably far more significant when the commodity is data.

That said, there are aspects of the data marketplace model that bring obvious benefits to anyone who has ever worked with data. Discoverability and fulfilment are important, and data marketplaces addressed those specific issues very well. Data is often curated, comparable and sometimes even standardized, which is extremely helpful. Consumers are empowered to get the data on demand, rather than it being entirely left with the supplier. These are important aspects of what good would look like in a successful future state. In the right market and at the right time, a data marketplace can be useful: Quandl rode the alternative data wave, launching in 2013, being acquired by Nasdaq in 2018 for an undisclosed amount, and continuing to trade to this day.

Ultimately, data marketplaces are likely to continue failing because they do not sufficiently adapt the concept to address the specific characteristics of data or, put another way, they solve the wrong problems. Historically, they did not focus enough development time on creating mechanisms to iteratively build trust and to enable interaction in low-trust situations. That requires significant effort to get right but, done well, it also helps to manage other undesirable issues such as disintermediation and commoditization, which are problematic when working with something as complex and nuanced as data.

Unfortunately, the new wave of data marketplaces uses the exact same paradigm, but tries to avoid the cold-start problem by leveraging an existing user base. Avoiding a cold start is good, but it introduces a new complexity: this new wave uses the data marketplace paradigm to increase the usage and stickiness of existing cloud platforms, so it limits where the data can reside. This restricts the addressable market and creates complexity for suppliers, who need to manage their offerings across multiple data marketplaces. The value proposition for data suppliers is unclear, given that many already have their own solutions for discovery and fulfilment (the main benefits of a data marketplace) without having to sit alongside their direct competitors.

Time will tell if the data marketplace paradigm can ever succeed, but the historical data is not in its favour.

Co-authored by Anthony Cosgrove (Co-Founder) & John Kuo (Head of Product) at Harbr