Will data scientists soon lose their jobs to superintelligent computers?

With artificial intelligence and machine learning methods developing at such a fast pace, those who argue that computers will soon take over the work now performed by human data analysts are starting to sound more and more convincing. Leading companies (Google, IBM) and universities (MIT, UC Berkeley) have their best computer scientists working on systems that automate the process of finding valuable insights in data. One of the tools that attempts to “replace” data analysts is IBM Watson Analytics. “Just upload the data, ask a question as you would ask it to your colleague and we will find an answer” is what Watson promises to be able to do. It interprets a user’s question in natural language and then produces automated visualisations that it identifies as most relevant to the enquiry. What’s more, based on the question that the user asks, it recommends further insights to explore.

It sounds great, but how does it work in practice? Will IBM Watson Analytics and similar tools be able to replace human data analysts? The answer is no, at least not soon. I spent a month exploring the value that Watson could bring to Cambridge Data by comparing the insights it found with the insights that human data analysts came up with. It turns out that while Watson is indeed easy and intuitive to use, it cannot handle complex questions that require more detailed prediction and pattern recognition, such as “How should I set my price to increase my revenue?” In practice, these are the questions that lead to important business decisions, so Watson’s ability to deliver real value to a business completely on its own is very limited.

However, that does not mean Watson is useless. Even though Dr. Watson could never match Sherlock’s deductive skills and was not able to solve cases on his own, he still proved very helpful in supporting the hero in his analysis. The same is true of IBM Watson. It cannot answer complex questions itself, but it can be very useful as an “assistant” to the data analyst. Watson makes it quick and easy for the user to visualise relationships between variables in the data. It therefore gives users the freedom to test any hypothesis about the correlation between variables that they come up with, even if there is a risk that the hypothesis won’t bring meaningful results. This allows data scientists to check whether a hypothesis is worth pursuing before diving deeper into it with more advanced techniques, which may require time and expensive resources. Brainstorming with the data in this way can not only save time, but also generate some fresh, out-of-the-box ideas. During a traditional brainstorming session, participants are encouraged to share whatever ideas come to mind without being judged on them. This results in a higher likelihood of discovering innovative solutions. Similarly, the simplicity and freedom of testing a hypothesis that Watson provides increase the chance of coming up with innovative questions to ask the data.

I experienced how helpful Watson can be as an assistant while analysing the features and behaviour patterns that differentiate frequent, loyal buyers from people who come to the shop once or twice and never come back. As I looked at the different variables characterising the orders that customers had placed with the shop, many hypotheses arose. Do frequent buyers usually spend more? Do they come back quickly? Or maybe their behaviour varies depending on how long they have been customers? Watson enabled me to quickly test these hypotheses and discover which of them were supported by the vast amount of data. It made it easier to visualise brand choices, time intervals between orders, bundles of products and the amount of money spent for both frequent and non-frequent buyers, which let me focus more on using my human intuition to compare the visualisations and find interesting business insights.
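
For readers who want to try the same kind of quick hypothesis check by hand, here is a minimal sketch in Python. It is not how Watson works internally and it is not the analysis I ran on the shop's data. The order records, the `summarise` function and the threshold of three orders for a "frequent" buyer are all made up for illustration. It simply splits customers into frequent and non-frequent buyers and compares their average spend per order and the average number of days between orders.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical order records: (customer_id, order_date, amount_spent).
# These values are invented for illustration only.
ORDERS = [
    ("a", date(2016, 1, 5), 40.0),
    ("a", date(2016, 1, 19), 55.0),
    ("a", date(2016, 2, 2), 60.0),
    ("b", date(2016, 1, 10), 20.0),
    ("c", date(2016, 1, 7), 30.0),
    ("c", date(2016, 3, 7), 25.0),
    ("d", date(2016, 2, 1), 80.0),
    ("d", date(2016, 2, 8), 70.0),
    ("d", date(2016, 2, 15), 90.0),
    ("d", date(2016, 2, 22), 75.0),
]

# Assumed cut-off: a customer with at least this many orders is "frequent".
FREQUENT_MIN_ORDERS = 3


def summarise(orders, min_orders=FREQUENT_MIN_ORDERS):
    """Compare average spend and days between orders for the two groups."""
    # Collect each customer's order history.
    by_customer = defaultdict(list)
    for customer, day, amount in orders:
        by_customer[customer].append((day, amount))

    groups = {
        "frequent": {"spend": [], "gaps": []},
        "non_frequent": {"spend": [], "gaps": []},
    }
    for history in by_customer.values():
        history.sort()  # chronological order
        label = "frequent" if len(history) >= min_orders else "non_frequent"
        groups[label]["spend"].extend(amount for _, amount in history)
        days = [day for day, _ in history]
        # Days elapsed between each pair of consecutive orders.
        groups[label]["gaps"].extend(
            (later - earlier).days for earlier, later in zip(days, days[1:])
        )

    return {
        label: {
            "avg_spend": mean(g["spend"]) if g["spend"] else None,
            "avg_gap_days": mean(g["gaps"]) if g["gaps"] else None,
        }
        for label, g in groups.items()
    }


summary = summarise(ORDERS)
for label, stats in summary.items():
    print(label, stats)
```

On this toy data the frequent buyers spend more per order and return sooner, but a difference in averages like this is only a prompt for further investigation. Deciding whether it holds up on real data, and what it means for the business, is still the analyst's job.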