Machine learning has become ubiquitous, solving complex everyday problems by recommending media, estimating wait times, and even translating languages. It is a powerful tool that creates compelling experiences and saves businesses and consumers time and money.
Ekata uses machine learning to build products such as the Identity Risk Score, and we have learned some important lessons along the way:
Understand the problem you want to solve with machine learning
While this might sound obvious, taking the time to understand and articulate the problem and its surrounding context is a critical first step. Some important questions to answer:
- What is the business problem that your customers are facing?
- What options are currently available to solve the problem?
- How accurate does your solution need to be?
- What is the value of solving the problem?
- What is the cost of a wrong answer?
Investing time on this front will pay dividends when you have to make decisions about model selection, feature engineering, and training data requirements. A clearly articulated customer need and business problem provide a useful framework for evaluating such choices and making the right tradeoffs. For example, at Ekata, we spend hours understanding fraud patterns, the rules that are used to catch chargebacks and reduce reviews, and the economics of such workflows. This allows us to create a product that not only reliably flags fraudulent transactions, but also saves our customers time and money in implementation.
Ensure you have quality data feeding your models
A machine learning model is only as good as the data that it ingests—garbage in is truly garbage out. Poor data runs the risk of confusing your model and optimizing it toward the wrong objectives. Furthermore, poor data complicates performance analysis because you’ll be left wondering if the root cause of an issue is the model or the underlying data.
To address this, provide documentation and set up processes that outline and check data definitions and standards. This is particularly useful if multiple internal teams or customers are part of the data generation pipeline.
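One way to make data definitions checkable is to write them down as code that every team runs against incoming records. The following is a minimal sketch, not a description of Ekata's pipeline: the field names (`transaction_id`, `amount_usd`, `is_chargeback`), the rules, and the `FieldSpec` and `validate_record` helpers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class FieldSpec:
    """One entry in a shared data definition (hypothetical schema)."""
    name: str
    expected_type: type
    is_valid: Callable[[Any], bool]
    description: str


# Hypothetical definitions for a transaction record; real fields will differ.
TRANSACTION_SPEC: List[FieldSpec] = [
    FieldSpec("transaction_id", str, lambda v: len(v) > 0,
              "non-empty unique ID"),
    FieldSpec("amount_usd", float, lambda v: v >= 0.0,
              "order amount in US dollars, never negative"),
    FieldSpec("is_chargeback", bool, lambda v: True,
              "label: True if the order was charged back"),
]


def validate_record(record: Dict[str, Any], spec: List[FieldSpec]) -> List[str]:
    """Return a list of readable problems; an empty list means the record passes."""
    problems = []
    for field in spec:
        if field.name not in record:
            problems.append(f"{field.name}: missing")
            continue
        value = record[field.name]
        if not isinstance(value, field.expected_type):
            problems.append(
                f"{field.name}: expected {field.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        elif not field.is_valid(value):
            problems.append(f"{field.name}: violates rule ({field.description})")
    return problems


if __name__ == "__main__":
    record = {"transaction_id": "", "amount_usd": "25"}
    for problem in validate_record(record, TRANSACTION_SPEC):
        print(problem)
```

Keeping the description next to each rule means the documentation and the check cannot drift apart, and a rejected record reports which definition it broke. That helps separate data problems from model problems later on.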
Machine learning is fast; analysis is slow
Modern machine learning models are amazing in that they can quickly process millions of transactions with hundreds of variables. However, it still takes time to truly understand the variables and the results, and how they relate to the real-world behavior that the model represents. Such work is relatively slow and painstaking, but necessary.
The goal should be to develop an intuitive feel for the model. Do the results make sense? Are they in line with general trends and beliefs in this industry? I’ve found it useful at this juncture to get a gut check from team members who are not in the weeds of building and optimizing the model.
Employ both statistical and business metrics to measure success
As you develop and iterate on the machine learning solution, you will need to decide at what point it is good enough to launch. This question is often complicated, not only because there is always room for improvement and optimization, but also because there are many metrics that could be used to judge performance.
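To illustrate reporting a statistical and a business view side by side, here is a minimal sketch that scores a labeled set of transactions at a given threshold. It rests on simplifying assumptions that are ours, not Ekata's: every flagged transaction incurs a fixed manual review cost, every correctly flagged fraud saves the full transaction amount, and the `evaluate` function and its parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Evaluation:
    precision: float    # share of flagged transactions that were fraud
    recall: float       # share of fraud that was flagged
    net_savings: float  # fraud dollars caught minus total review cost


def evaluate(
    scores: Sequence[float],
    is_fraud: Sequence[bool],
    amounts: Sequence[float],
    threshold: float,
    review_cost: float,
) -> Evaluation:
    """Flag transactions with score >= threshold and summarize the outcome."""
    tp = fp = fn = 0
    fraud_caught = 0.0
    for score, fraud, amount in zip(scores, is_fraud, amounts):
        flagged = score >= threshold
        if flagged and fraud:
            tp += 1
            fraud_caught += amount
        elif flagged:
            fp += 1
        elif fraud:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Assumption: each flagged transaction costs one manual review.
    net_savings = fraud_caught - review_cost * (tp + fp)
    return Evaluation(precision, recall, net_savings)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.6]
    is_fraud = [True, False, False, True]
    amounts = [100.0, 50.0, 20.0, 200.0]
    for threshold in (0.7, 0.5):
        print(threshold, evaluate(scores, is_fraud, amounts, threshold, 5.0))
```

Running the same evaluation at several thresholds shows how the statistical and business metrics can move differently. A threshold that lowers precision may still raise net savings if the extra fraud caught is worth more than the extra reviews.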
See our machine learning applied in our Identity Risk Score.