Using Machine Learning for Rule Building

Many of our customers use rules in their fraud workflows to identify good and bad actors. These rules, which are combinations of conditional statements that operate as data filters, can be very effective, but formulating “the right rule” can be difficult. That’s where the Ekata Field Data Science (FDS) team comes in to help customers formulate “the right rule” quickly by using machine learning.

The FDS team leverages decision tree-based machine learning models to build rule candidates. Decision trees are algorithms that aim to unmix the target classes in a dataset through a series of conditional statements, each of which splits the data into different populations.

In the example figure below, the tree’s decisioning path seeks to isolate good (green) from bad (red) transactions. These conditional split points, also known as decision nodes, make decision trees highly interpretable models. And because you can extract the decisioning path logic from the models, decision trees make for a great way to build rules.


Figure 1: Example decision tree logic to unmix good from bad transactions. Each node is a conditional split that leads to another until we reach an end node with the final good to bad ratio achieved.
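As a concrete sketch, a small decision tree like the one in Figure 1 can be trained in a few lines with scikit-learn. The toy transactions and feature names below are invented for illustration:

```python
# Minimal sketch: train a shallow decision tree to separate good from bad
# transactions. The data and feature names are made up for illustration.
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy features: [amount, account_age_days]; label 1 = bad, 0 = good
X = [[900, 2], [850, 5], [30, 400], [45, 350], [700, 3], [25, 500]]
y = [1, 1, 0, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Print the tree's conditional split logic in human-readable form
print(export_text(tree, feature_names=["amount", "account_age_days"]))
```

The printed output is exactly the kind of nested conditional logic shown in the figure: each indented line is a decision node that narrows the population.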

Rule Modeling Framework

Training models for rule building differs slightly from the normal data science modeling framework, as the overall model's performance is not our primary focus. In this case, we care most about collecting all the conditional statements the model has used in all decisioning paths of the tree and evaluating each path as a rule candidate. Because generating a single high-performing model is not the end goal, we can create many models to build more diverse rule candidates.
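One way to collect those decisioning paths is to walk a fitted tree from root to leaf, recording the condition at each split. This is a minimal sketch using scikit-learn's underlying tree arrays; the toy data and feature names are illustrative:

```python
# Sketch: harvest every root-to-leaf path of a fitted tree as a rule
# candidate. Data and feature names are made up for illustration.
from sklearn.tree import DecisionTreeClassifier

X = [[900, 2], [850, 5], [30, 400], [45, 350], [700, 3], [25, 500]]
y = [1, 1, 0, 0, 1, 0]
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

def extract_rules(model, feature_names):
    """Return one list of conditions per leaf (i.e., per decisioning path)."""
    t = model.tree_
    rules = []

    def walk(node, conditions):
        if t.children_left[node] == -1:          # -1 marks a leaf node
            rules.append(conditions)
            return
        name = feature_names[t.feature[node]]
        thr = t.threshold[node]
        walk(t.children_left[node], conditions + [f"{name} <= {thr:.1f}"])
        walk(t.children_right[node], conditions + [f"{name} > {thr:.1f}"])

    walk(0, [])
    return rules

for rule in extract_rules(clf, ["amount", "account_age_days"]):
    print(" AND ".join(rule))
```

Each printed line is a complete rule candidate: the conjunction of every split condition on the path from the root to one leaf.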

We can build and extract logic from various models using different tree-based algorithms and algorithm settings, known as hyperparameters. The Random Forest is one such machine learning algorithm that has proven to be useful for rule building. This algorithm builds and aggregates the results of many different individual trees using different subsets of the training data. While a Random Forest model is not very interpretable due to aggregation, it perfectly suits the rule-building process since we don’t care about the final aggregated result. We can similarly extract the decisioning logic from every tree in the Forest to build rules. These rules tend to be more diverse because each tree in a Random Forest is built with a different subsample of the whole training dataset and exposed to different features at each decision node.


Figure 2: In a Random Forest model, each tree uses a different subset of transactions (rows) and is exposed to a different set of features (columns) at each decision node. Each colored cell represents the feature-transaction combination used in each decision node.
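In scikit-learn, for example, a fitted Random Forest exposes its member trees through the `estimators_` attribute, so each tree's decisioning logic can be dumped individually. The tiny dataset below is illustrative only:

```python
# Sketch: a Random Forest's individual trees are accessible via
# `estimators_`, so per-tree rule extraction works the same way as for a
# single tree. Toy data for illustration only.
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

X = [[900, 2], [850, 5], [30, 400], [45, 350], [700, 3], [25, 500]]
y = [1, 1, 0, 0, 1, 0]

forest = RandomForestClassifier(
    n_estimators=5, max_depth=2, random_state=0
).fit(X, y)

# Each member tree yields its own decisioning logic, and therefore its
# own (often more diverse) rule candidates.
for i, member in enumerate(forest.estimators_):
    print(f"--- tree {i} ---")
    print(export_text(member, feature_names=["amount", "account_age_days"]))
```

Because each tree sees a different bootstrap sample and feature subset, the extracted paths differ from tree to tree, which is exactly the diversity described above.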

Rule Evaluation

Once we have extracted the decisioning logic from our models, we must evaluate each rule candidate. This evaluation puts the rule back into the context of the customer’s workflow:

  1. How many transactions are caught by the rule?
  2. How many of these caught transactions are good?
  3. How many of these caught transactions are bad?
  4. What total chargeback amount would have been prevented with the rule in place?

During the evaluation, we look at common classification model performance metrics (including precision, recall and F1 score) and applicable business metrics to see how those translate into an overall ROI for the customer with Ekata products in place.
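The four questions above reduce to a simple evaluation loop over labeled historical transactions. In this sketch the records, the candidate rule, and the chargeback amounts are all made up for illustration:

```python
# Sketch: evaluate one rule candidate against labeled transactions.
# All records, fields, and amounts below are invented for illustration.
transactions = [
    {"amount": 900, "account_age_days": 2,   "is_fraud": True,  "chargeback": 900},
    {"amount": 850, "account_age_days": 5,   "is_fraud": True,  "chargeback": 850},
    {"amount": 30,  "account_age_days": 400, "is_fraud": False, "chargeback": 0},
    {"amount": 700, "account_age_days": 3,   "is_fraud": True,  "chargeback": 700},
    {"amount": 45,  "account_age_days": 350, "is_fraud": False, "chargeback": 0},
]

# Candidate rule: flag large transactions from very new accounts
def rule(txn):
    return txn["amount"] > 500 and txn["account_age_days"] < 30

caught = [t for t in transactions if rule(t)]          # Q1: transactions caught
caught_bad = [t for t in caught if t["is_fraud"]]      # Q3: caught and bad
caught_good = [t for t in caught if not t["is_fraud"]] # Q2: caught but good
total_bad = [t for t in transactions if t["is_fraud"]]

precision = len(caught_bad) / len(caught)
recall = len(caught_bad) / len(total_bad)
prevented = sum(t["chargeback"] for t in caught_bad)   # Q4: chargebacks avoided

print(f"caught={len(caught)} good={len(caught_good)} bad={len(caught_bad)}")
print(f"precision={precision:.2f} recall={recall:.2f} prevented=${prevented}")
```

Precision here answers "of the transactions the rule catches, how many are actually bad?", while recall answers "of all bad transactions, how many does the rule catch?" — the same trade-off weighed in the customer's ROI calculation.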


Figure 3: Each candidate rule (i.e., tree decisioning path) is evaluated in the context of the customer workflow and known outcomes.

Rule Selection

With the rule candidates and performance results collected, we examine the candidates and select the rule(s) that best meet the customer's goals. The FDS team can recommend rules that both satisfy those goals and make sense in the context of the fraudulent behavior the customer seeks to differentiate.


Figure 4: Ekata-derived rule that the FDS team might recommend to a customer to meet their goal. In this case, the rule is a three-feature conditional statement to isolate good actors.
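A rule like the one in Figure 4 ultimately reduces to a plain three-condition conjunction. The feature names below are hypothetical stand-ins, not actual Ekata fields:

```python
# Sketch of a three-feature rule to isolate good actors. The field names
# are hypothetical placeholders, not real Ekata product attributes.
def is_likely_good(txn):
    return (
        txn["identity_score"] >= 400       # hypothetical identity signal
        and txn["phone_to_name_match"]     # hypothetical match flag
        and txn["ip_distance_km"] <= 100   # hypothetical IP-to-address distance
    )

print(is_likely_good({
    "identity_score": 450,
    "phone_to_name_match": True,
    "ip_distance_km": 20,
}))
```

A transaction must pass all three conditions to be routed as a good actor; failing any one drops it back into the customer's normal review flow.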

Rule recommendations are just one component of the analysis the FDS team will present to a prospective customer as a starting point for further customer-driven analysis. Typically, our customers find the best results when they add Ekata products to supplement their internal data and use Ekata’s recommendations as a strong baseline.

Author

Caitlin Streamer

Principal Field Data Scientist, Seattle

Caitlin has a diverse background in technology consulting, most recently as a data scientist in the natural language processing space. She approaches everything with a sharp and inquisitive mind and always loves learning about new technologies.

Start a Free Trial

See how Ekata can reduce fraud risk for your business. Contact us for a demo.