Separating Hype From Value In Artificial Intelligence
You've probably heard a lot about data science, artificial intelligence, and big data. Frankly, there has been a lot of hype surrounding these areas, and that hype has inflated expectations about what data science and data can actually accomplish. In general, this has been negative both for the field of data science and for big data. It helps to give some thought to separating the hype of data science from the reality of data science.
The first question is always "What is the question you are trying to answer with the data?" If someone comes to talk to you about a big data, artificial intelligence, or data science project, and they start talking about the newest technology for distributed computing, analyzing data with machine learning, and a lot of other buzzwords, the first question you should ask is "What is the question you are trying to answer with the data?" That question narrows the discussion and filters out much of the hype around the tools and technologies people are using. Those tools can be very interesting and fun to talk about (we like to talk about them too), but they aren't going to add value to your organization on their own.
The second question to ask yourself, once you have identified the question you are trying to answer with the data, is: "Do you have the data to actually answer that question?" Very often, the question you want to answer and the data you have to answer it with are not really compatible with each other. So you have to ask yourself, "Can we get the data we need to answer the question we want to answer?" Sometimes the answer is simply no, in which case you have to stop (for now). Bottom line: if you want to decide whether a project is hype or reality, you have to decide whether the data people are trying to use is really relevant to the question they are trying to answer.
The third thing to ask yourself is, "If you could answer the question with the data you have, could you even use the answer in a meaningful way?" This question goes back to the Netflix competition, where there was a solution to the problem of predicting which videos people would like to watch. It was a very, very good solution, but it was not one that Netflix could implement with its computing resources at a reasonable cost. Even though they could answer the question, had the right data, and were answering a specific question, they couldn't actually apply the results of what they found.
If you ask yourself these three questions, you can very quickly decipher whether a data science project is all about hype or if it is a real contribution that can really move your organization forward.
How do you determine the success of a data science project?
Small businesses seldom use cutting-edge technology, simply because it is not within their budgets, knowledge, or resources. However, almost all of them will eventually need to experiment with this technology, because if they do not, someone else will, and whoever does will gain an edge in competitiveness, cost, or utility.
Defining the success of an artificial intelligence project (more precisely, a data science or machine learning project) is a crucial part of managing a data science experiment.
Of course, success is usually context specific. However, some aspects of success are general enough to warrant discussion. Our list of markers of success includes:
1. New knowledge is created.
2. Decisions or policies are made based on the outcome of the experiment.
3. A report, presentation or app with impact is created.
4. It is learned that the data cannot answer the question that is being asked.
Some more negative results are: decisions that ignore the clear evidence in the data; equivocal results that shed no light in one direction or another; and uncertainty that prevents the creation of new knowledge.
Let's talk about some of the positive results first.
New knowledge seems like the ideal outcome. However, new knowledge is not necessarily important, and it does not automatically produce new decisions or policies.
If it does produce applicable decisions or policies, great (wouldn't it be wonderful if there were evidence-based policy, like the evidence-based medicine movement that has transformed medicine?). For our data science products to have a big, positive impact would, of course, be ideal. Creating reusable code or applications is a great way to increase the impact of a project.
Finally, the last point is perhaps the most controversial.
In some cases, we believe a data science project is successful if we can show that the data cannot answer the questions being asked. We remember a colleague who told a story about the company where he worked. They hired many expensive data science consultants to help use their data to inform pricing. However, the prediction results were not helpful. What the consultants were able to establish was that the data could not answer the hypothesis under study: there was too much noise, and the measurements weren't accurately capturing what was needed. Sure, the result was not ideal, as the company still needed to figure out how to price things, but it did save money on further consultants. Since then, we have heard this story repeated almost identically by friends at different organizations.