Here are some questions to think about when determining if you should use predictive analytics and how best to apply it.
Is there a well-defined business objective?
Predictive analytics can be used to solve a wide range of problems, so having a well-defined business objective is an important part of the early conversation. High-level goals are always valuable, but specific and measurable objectives are necessary. Always keep asking, “How can we measure success?”
Can this be solved with automation?
Business problems often have solutions that can be applied with straightforward automated logic. Depending on the complexity of the domain and data, capturing the method of data evaluation as a set of rules may be the best first step, and it can eliminate the need to go through the predictive model development process at all.
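As a rough illustration of what "a set of rules" might look like in practice, the sketch below flags transactions for manual review. The field names, thresholds, and home country are hypothetical and chosen only for the example; real rules would come from the people who understand the domain.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """A hypothetical transaction record used only for this example."""
    amount: float
    country: str
    account_age_days: int


# Hypothetical thresholds; in practice these come from domain experts.
HIGH_AMOUNT = 10_000.0
NEW_ACCOUNT_DAYS = 30
HOME_COUNTRY = "US"


def needs_review(tx: Transaction) -> bool:
    """Return True if fixed business rules flag the transaction for review."""
    # Rule 1: any large transaction is reviewed.
    if tx.amount >= HIGH_AMOUNT:
        return True
    # Rule 2: a foreign transaction on a new account is reviewed.
    if tx.country != HOME_COUNTRY and tx.account_age_days < NEW_ACCOUNT_DAYS:
        return True
    return False
```

If a handful of rules like these already meets the business objective, a predictive model may not be needed.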
Does data exist?
Predictive analytics is driven by data, and the availability of relevant data is a key factor in successfully finding a solution to the business problem. For instance, a system to identify fraud can use historical examples of fraudulent and non-fraudulent cases to refine its predictive power.
What is the quality of the data?
Building models with bad data will invariably lead to poor predictions. In fact, it is common for the initial stages of many predictive analytics projects to evolve into mini data quality projects. Even high-quality data may have issues or biases that are uncovered by sensitive machine learning algorithms.
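A first pass at data quality can be as simple as counting missing values and duplicate records. The sketch below assumes the data is a list of dictionaries. The function name `quality_report` and its inputs are illustrative, not part of any particular tool.

```python
def quality_report(rows, required_fields):
    """Summarize missing values and duplicate records in a list of dicts."""
    total = len(rows)
    missing = {field: 0 for field in required_fields}
    seen = set()
    duplicates = 0

    for row in rows:
        # Count a field as missing if it is absent, None, or an empty string.
        for field in required_fields:
            if row.get(field) in (None, ""):
                missing[field] += 1
        # Treat rows with identical required fields as duplicates.
        key = tuple(row.get(field) for field in required_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    missing_rate = {
        field: (count / total if total else 0.0)
        for field, count in missing.items()
    }
    return {"rows": total, "missing_rate": missing_rate, "duplicates": duplicates}
```

A report like this will not catch subtle bias, but it can show early whether the project needs a data cleanup phase before any modeling starts.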
Are there restrictions on what data can be used?
If there are legal or regulatory restrictions on what data can be used as part of a decision-making process, these same restrictions should be applied to model development as early as possible. If a certain data element, such as gender, is later found to be prohibited from the decision-making process, but a model has already been developed using that data, the predictive model will need to be rebuilt.
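One way to apply such restrictions early is to strip restricted fields before the data ever reaches model development. The sketch below is a minimal example; the `RESTRICTED_FIELDS` set and the record layout are assumptions for illustration, and the actual list of restricted elements must come from legal and compliance review.

```python
# Hypothetical list of fields that may not be used in decision-making.
RESTRICTED_FIELDS = frozenset({"gender"})


def strip_restricted(record, restricted=RESTRICTED_FIELDS):
    """Return a copy of the record without any restricted fields."""
    return {key: value for key, value in record.items() if key not in restricted}


def prepare_training_rows(records, restricted=RESTRICTED_FIELDS):
    """Apply the restriction to every record before model development."""
    return [strip_restricted(record, restricted) for record in records]
```

Applying the filter at the start of the pipeline means no later step can use the restricted element by accident.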
What is the volume of data?
Any amount of data is helpful; however, an especially large volume of data is a big advantage when trying to develop predictive models. The process for developing, training, and validating models requires enough training data that the resulting models do not behave unexpectedly when applied to real-world scenarios.
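Part of the reason volume matters is that some of the data has to be held back for validation rather than used for training. The following is a minimal sketch of a random holdout split. The 20% validation fraction and the fixed seed are arbitrary choices for the example, not recommendations.

```python
import random


def train_validation_split(records, validation_fraction=0.2, seed=0):
    """Shuffle records and split them into (training, validation) lists."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")

    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    # Hold back at least one record for validation.
    n_validation = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]
```

With only a few hundred records, a 20% holdout leaves little data on either side of the split, which is one practical way that low volume shows up.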
Can we enhance the existing data with additional data?
The raw data being used to develop predictive models may provide enough context to create reliable predictions, but are there additional data sources that can be used to augment existing data? Are there other data sets that can be used, either from an internal or even external source? External commercial data sets may come at a cost, but they provide the potential for extra context that cannot be derived from other sources.
How will we deploy?
Sometimes we can be lulled into a sense of complacency by assuming there will always be substantial computing power available to execute complex computational models. For this reason, first consider the minimum viable solution and then expand your options. For example, a customer had received an excellent solution to a problem from another consulting firm, built on Support Vector Machines (SVMs). There’s nothing inherently wrong with SVMs, but they require a fair degree of computing power to deploy. Unfortunately, the users who needed the predictions were often in the field, possibly without access to significant computing power. After taking a step back and reexamining the issue, we built a decision tree and printed it on a piece of laminated paper that someone could use to make decisions without a computer.
Do the predictions need to be explainable?
The recent explosion in machine learning has greatly benefitted predictive analytics. These algorithms push the boundaries of human understanding by creating very complex models with high predictive accuracy. There are times when the resulting predictive model is so complex that there is no way it can be understood, even by opening it up and looking at the internals – these models are often known as “black boxes.” The data goes into the black box and the predictions come out, but with little transparency about how the model determined the prediction. This is unhelpful when the business case requires an understanding of how the model makes its determination.
To account for this, there are two different approaches to consider. First, the problem can be approached by applying simple, straightforward algorithms, such as regressions and simple decision trees, to develop easily understandable models. Unfortunately, this approach carries the risk of lower predictive accuracy, so it needs to be balanced against the business requirements and constraints.
A second approach works with more advanced techniques that create complex black box solutions. Additional work can be done with the advanced models to provide useful insight on how the model is making predictions, which helps the end user to effectively use the predictive algorithm’s results in their work and learn to trust the predictions.
How accurate must the predictions be for success?
Success for a predictive analytics project is largely defined by how accurately the model can make predictions when faced with real-world data. This is difficult to guarantee because many factors, including data quality, data volume, domain knowledge, algorithms and parameters, model complexity, and the training and testing processes, all come together to affect the model’s accuracy. Any one of these elements can have a major impact on predictive accuracy. For instance, one consideration in model development could be, “Are you willing to give up the ability to ‘understand’ the model for greater accuracy?” The answer can ultimately drive the accuracy of a model and must be balanced with the business requirements.
As part of a well-defined objective for the predictive analytics project, “how” to measure accuracy needs to be defined. Once the “how” can be described, an initial benchmark helps to provide context to the progress being made over the course of a project. For instance, a fraud detection project might measure accuracy as the number of actual fraud cases found among the top 1,000 cases identified by the predictive model. A benchmark could be as simple as the average known fraud rate of 10%, which corresponds to 100 cases in the top 1,000. A useful predictive model would have to find more than those 100 benchmark cases to be of any value.
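The fraud benchmark above can be written out as a short calculation. The sketch below assumes each case is a `(score, is_fraud)` pair, where the score is the model's output and the flag is the known outcome; the function names are illustrative only.

```python
def hits_in_top_k(scored_cases, k):
    """Count actual fraud cases among the k highest-scored cases."""
    ranked = sorted(scored_cases, key=lambda case: case[0], reverse=True)
    return sum(1 for _score, is_fraud in ranked[:k] if is_fraud)


def baseline_hits(base_rate, k):
    """Expected fraud cases in k cases picked at the average fraud rate."""
    return base_rate * k


def lift_over_baseline(hits, base_rate, k):
    """Ratio of the model's hits to the benchmark; above 1.0 beats it."""
    return hits / baseline_hits(base_rate, k)
```

With a 10% fraud rate and a list of 1,000 cases, `baseline_hits(0.10, 1000)` gives the benchmark of about 100 cases. A model that finds 250 fraud cases in its top 1,000 would have a lift of roughly 2.5 over that benchmark, while a lift at or below 1.0 means the model adds no value by this measure.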
There are numerous questions to consider when evaluating predictive analytics as a potential solution to a business problem. By answering them with the business requirements in mind, predictive analytics can be used to provide accurate predictions and help solve real-world business challenges.