Data Assessment: Getting to Know You…

As set out in Strategies for Unlocking the Value of Third-Party Data, organizations are faced with both a need and an opportunity to revisit how they acquire and extract value from third-party data. In Solving the 80/20 Data Dilemma, six phases associated with acquiring and extracting value from data were identified: Discover, Assess, Acquire, Adapt, Use, Monetize (optional).

While Discovery is critical to get right because everything flows from it, the process of assessing even a small number of different data products can be extremely time-consuming and lead to flawed decision-making. In an ideal world, you would have two near-identical products with all the metadata necessary to easily compare them and understand their differences. You would also have a clear and consistent understanding of the nuances of the data, such as how certain fields had been calculated, and instant access to the tools and infrastructure necessary to run your analysis. In the real world that rarely happens…

A common situation is to have multiple potential data products from a range of providers. Each will provide a sample that showcases their data, but rarely the whole corpus. The samples will not necessarily overlap, will likely have different schemas and potentially different formats. They will be sent via a range of mechanisms including upload/download, email, API, and cloud bucket syncs. None of the data products are likely to be ready to combine with your proprietary data.
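To make the schema problem concrete, here is a minimal sketch of normalizing two vendor samples into one comparable shape. All vendor names, field names, and formats below are hypothetical, invented purely for illustration:

```python
import csv
import io
import json

# Hypothetical samples: two vendors describe the same entity using
# different schemas and different formats (CSV vs. JSON lines).
vendor_a_csv = "company,ticker,employees\nAcme Ltd,ACME,1200\n"
vendor_b_json = '{"org_name": "Acme Ltd", "symbol": "ACME", "headcount": "1,200"}\n'

# Per-vendor mappings onto a shared target schema.
MAPPINGS = {
    "vendor_a": {"company": "name", "ticker": "ticker", "employees": "employees"},
    "vendor_b": {"org_name": "name", "symbol": "ticker", "headcount": "employees"},
}

def normalize(records, mapping):
    """Rename fields to the shared schema and reconcile obvious type quirks."""
    out = []
    for rec in records:
        row = {mapping[k]: v for k, v in rec.items() if k in mapping}
        # Values like "1,200" vs "1200" must be reconciled before any join.
        row["employees"] = int(str(row["employees"]).replace(",", ""))
        out.append(row)
    return out

a = normalize(list(csv.DictReader(io.StringIO(vendor_a_csv))), MAPPINGS["vendor_a"])
b = normalize([json.loads(line) for line in vendor_b_json.splitlines()],
              MAPPINGS["vendor_b"])

assert a == b  # both samples now describe the same record identically
```

Even in this toy case, the mapping tables are hand-built per vendor, which is exactly the engineering overhead the article describes; multiply it across providers, formats, and delivery mechanisms and the cost becomes clear.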

You wanted data and you ended up with a large, time-consuming data engineering/science project just to work out which data best fits your needs. Getting this wrong has potentially serious consequences, and the cards are stacked against you. This appears to be the most intractable part of acquiring third-party data, but it doesn’t have to be (and by the way, the data vendors are as frustrated as you are!). Here’s how to get it right:

  1. Rinse and repeat. Data acquisition is not an ad hoc task. Any work done here – and work will always need to be done here – should be recyclable. When acquiring samples, ensure you can maintain access to them for future needs. Document your findings so they can be reused. Share the code that converted the sample into something that could be joined to your data, or share the output that was created. Make sure that no one has to do the same work again unless there is a specific need.
  2. Ask for more. In the same way that sharing raw tables doesn’t deliver value, providing only a sample of a table creates overhead for the consumer. Ask for a data dictionary, structural and content metadata, details of historic changes/versions, the typical volume of deltas, and anything else that might be relevant to your use case. The more information you can capture, the better the decision.
  3. Find the middle ground. A collaborative data exchange provides a neutral territory where you can work collaboratively on data with your third-party suppliers. This means they can share the full corpus and all the supporting assets while maintaining ownership and visibility, avoiding the problems associated with samples. You can also work together on the data while you’re trying to assess it. They know the data, you know the use case, so working together accelerates progress and creates better outcomes.
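One lightweight way to make the work in steps 1 and 2 recyclable is to capture findings in a structured record stored alongside the sample, so the next team can query it instead of repeating the assessment. Everything below is illustrative: the field names and sample values are assumptions, not a prescribed format:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class SampleAssessment:
    """Reusable record of what was learned assessing one third-party data product."""
    provider: str
    product: str
    sample_location: str     # where future teams can re-access the sample
    delivery_mechanism: str  # e.g. API, cloud bucket sync, upload/download
    has_data_dictionary: bool
    delta_volume_note: str   # typical volume/cadence of deltas, if known
    findings: list = field(default_factory=list)

# Illustrative entry; every value here is a placeholder, not real vendor data.
record = SampleAssessment(
    provider="ExampleCo",
    product="Firmographics Feed",
    sample_location="s3://team-bucket/samples/exampleco/2024-01/",
    delivery_mechanism="cloud bucket sync",
    has_data_dictionary=True,
    delta_volume_note="~50k changed rows per week",
    findings=["'revenue' field is trailing twelve months, not fiscal year"],
)

# Persist next to the sample so no one has to do the same work again.
print(json.dumps(asdict(record), indent=2))
```

The exact shape matters less than the habit: a few structured fields covering access, delivery mechanism, and content nuances turn a one-off assessment into a reusable asset.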

Assessing data is one of the most challenging steps and can seem overwhelming given the number of parties involved, the complexity of the task, and the impact of the decision. This area potentially requires the most focus for data-driven enterprises if they are to significantly enhance how they interact with third-party data. Once addressed, the focus can shift to the process of acquiring it, which can be a complex dance between Sales and Procurement with Legal and IT in supporting roles.

Authored by Anthony Cosgrove (Co-Founder) at Harbr