Q&A With Sebastian Drave, Chief Data Scientist at Harbr

What are the biggest challenges data scientists are facing?

There are a few challenges. First, there’s always a bit of a technological challenge around being a data scientist. This is not necessarily a challenge of proving capability, but of being able to build, prove, deploy and distribute capabilities within an enterprise setting. I think that’s probably one of the biggest transitions that any data scientist has to go through early in their career. They start off working in research-based environments, which are quite individual and singular. They have control of everything and can use whatever combination of tools and capabilities they choose. They then transition into a much more rigorously maintained and managed enterprise set-up where their choices are going to be more limited, and there are operational workflows and processes that they have to work through. Ultimately, they can’t drive fundamental change on their own anymore in the way that they could when they were research scientists working in their specialty.

So I think that’s a challenge: having the organizational empowerment to be able to take ideas and concepts, apply data science approaches and thinking to them, and also realize the value of that — not once, but multiple times for multiple different consumers.

Secondly, one of the challenges that dovetails with the technological challenges is how well (or not) an organization has integrated its data science capability into its business units. In many cases, organizations will have stood up data science functions that sit outside the business units and are supposed to be cross-functional and multidisciplinary, but the result is often that they don’t have the contextual connections within the organization to provide their services in a way that the organization can really benefit from.

How does a company get the most value from its data scientists?

A big part of that is understanding, before you bring the data scientists in (or, if you already have them, as you evolve), how to position them inside the organization and how to put them alongside team members to build connections. These connections will empower them to do the data science and also, in a technical sense, give them a route to operationalize or distribute the outputs of that work to the right people.

It doesn’t matter how smart, unique and capable the analytical process may be if you’re deploying people into a scenario where you still need to download it into an Excel file and email it to someone. Then, you’re fundamentally limiting the capability of any data scientists, regardless of how technically proficient they may be.

It’s really about that combination of the organizational structure and the correct technology empowerment to allow data scientists to distribute the content, the results and the value that they’re creating — and do that in a very demonstrable and very business stakeholder-focused way.

How would you describe the data science culture and why it is so important?

Culturally, I think there are actually several strands of data scientists, and the nature of the people who sit within those strands varies. I think the key thing that underpins all of it is the research and development element. That’s not to say people are working in R&D roles, but that you have that kind of culture, that DNA, and that experience coming into your data scientists. That is what’s going to really help them and the organization make strides in what they’re trying to achieve and is crucial when you’re bringing that capability into a business unit. Now, that will manifest itself in very different ways and you will actually have specialties and skillsets within the broader data science umbrella as a result of that.

If I were to look at the two furthest ends of the scale, let’s start with a data scientist who is embedded in an AI or R&D function of either a tech company or leading technical university. They may be an absolute specialist in the internal workings of a particular model or approach and be working right on the bleeding edge of that piece of technology. They still have that R&D mindset and are working to a goal, but it’s incredibly focused and it’s normally quite technology driven in terms of its outcome.

Now, if you take it all the way to the other end of the scale, you have data scientists who are embedded in business units in very large complex organizations. They have a high turnover of requests, projects and outputs and have to very quickly understand, contextualize and then work against these complex inputs and structures. Also, they have to understand the toolkit available to them and where to deploy each type of capability. That’s a very different role, often occupied by very different types of people and skillsets. But it still sits under that same umbrella and it still has that R&D culture embedded in it.

How would you recommend integrating data scientists in your operations and processes?

When you do hit a large enterprise, you start to realize very quickly that you cannot effect change on your own. You have to be part of a wider team working towards those goals. So culturally, that’s how you have to embed them. And there may be pure relationships there: for instance, data scientists who work with particular teams and that’s all they work with. Or it may be an embedded capability with a dotted line out to a more centralized data science function that facilitates group and enterprise-wide knowledge sharing and best practices. It does require that integration for context, rather than sitting on the side and being disintermediated from the business goals.

What do you see as the future for data and data scientists?

I think the future for data and data scientists has to move away from data scientists being able to add value simply because they are the most skilled data people within a team. In that situation, they seem to be much more technically proficient simply because they’re doing the zero- and first-order problems with data very well.

A lot of organizations that build capabilities will have designs to go for the big outputs that you can get from data science: AI models, machine learning capability, that sort of thing. There are a lot of preparatory steps that happen before that. I think where data scientists can get lost is in those first couple of steps: starting to build capabilities that really should be automated, and then starting to build new capabilities on top. Ultimately, through the day-to-day pressures and overhead, you get stuck in steps one or two. Suddenly, you’ve got something the business has proved to be valuable, but you don’t have a mechanism to productize it, even though the business now can’t live without it.

So you end up as a data science function taking on the maintenance and management of that capability, which should be passed off in as effective and efficient a way as possible. I think the future is really about starting to operate inside frameworks and structures that allow data scientists, across many, many organizations, to get further up the value chain much more rapidly and deliver higher-value products and outputs into the business. People often describe this as getting away from spending 80% of their time just getting the first version of the data, let alone the second, third or fourth version, which is really optimized for their use cases.

About Seb Drave

Seb began as a career academic in high-energy astrophysics. During that time, he had a natural interest in data analytics, data exploration and data engineering. His transition from academia to the corporate world coincided with the emergence of data science, so he joined HSBC as a data scientist. Later, he reunited with his former HSBC colleague, Anthony Cosgrove, at Harbr, where he’s been instrumental in designing and developing the company’s collaborative data exchange platform.