
How a probabilistic approach to Know Your Customer mitigates synthetic identity fraud



As synthetic identity fraud continues to surge across industries, hitting financial institutions especially hard, it’s vital that we come together to make better, more accurate predictions of fraud risk using a wider range of identity verification data. Know Your Customer (KYC) processes are essential for compliance, protecting financial institutions against corruption, money laundering and terrorist financing, but the modern digital terrain demands multiple layers of defense. Specifically, we need to appreciate that while the deterministic risk assessment involved in KYC can be effective for identity verification purposes, it has significant drawbacks when it comes to sophisticated fraud attacks such as synthetic identity fraud. Below, we break down how a probabilistic approach to KYC helps mitigate this fraud and increases efficiency across the board.

But first, let’s break down these two terms, deterministic and probabilistic, a bit more.

What is deterministic identity data?

Deterministic data, sometimes referred to as first-party data, is information that is known to be true. Because it is supplied to us directly by the person as authenticated data, these facts about an identity are treated as trustworthy and unchanging. For example, a date of birth, an email address or a cookie ID is considered a deterministic, static identifier. A deterministic identity match is achieved when an email address supplied by a user matches the same email address on file in a database.
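The yes-or-no nature of a deterministic match can be sketched in a few lines of Python. This is a minimal illustration rather than a description of any production system: the function names and the normalization step (trimming and lowercasing) are assumptions introduced here.

```python
def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different spellings compare equal.

    This normalization rule is an assumption made for the sketch.
    """
    return email.strip().lower()


def email_matches_record(supplied_email: str, record_email: str) -> bool:
    """Deterministic check: the supplied email either matches the one on file or it does not."""
    return normalize_email(supplied_email) == normalize_email(record_email)


# The result is always binary; there is no notion of "how likely".
match = email_matches_record(" Jane.Doe@Example.com", "jane.doe@example.com")
no_match = email_matches_record("jane.doe@example.com", "j.doe@example.com")
```

Whatever the matching rule, the output is a single boolean, which is the property the rest of this article contrasts with a probabilistic signal.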

What is probabilistic identity data?

On the other hand, probabilistic data is made up of individual pieces of information, such as an IP address, that are compiled to piece together a conclusion about an identity. In other words, probabilistic data is based on probabilities: behavioral data that can be analyzed to estimate the likelihood of the user being male or female, young or old, black or white, etc. Often, the information used to assign probabilistic identities is called “soft signals” or “non-unique device characteristics.”


Detecting synthetic identity fraud with data

Unfortunately, in this age of data breaches, we simply cannot trust first-party-provided digital identifiers. Indeed, we cannot assume any information is “known to be true.” This is especially the case when it comes to synthetic identities, which combine real consumer data with fictitious information to create a Frankenstein identity. Any organization or financial institution relying on traditional fraud detection models and KYC processes, with their trusting, deterministic approach to identity verification, to stop these fraudsters will come up short.

Here’s the deal: working in the identity verification space, we are often asked by financial institutions about our “email-to-name” coverage. While we take any opportunity to boast about how over two decades at the forefront of the identity space have created one of the most intelligent identity graphs of linkages between individuals and their phones, emails and addresses, it’s often beside the point.

The fact is, while establishing an “email-to-name” match can be a useful signal for traditional Know Your Customer (KYC) compliance purposes, it rarely cracks the top 10 most valuable signals when applied in a machine learning risk model for identity verification. However, the email signal that we do see consistently moving risk models is “email-first-seen-days”, which tracks when we first saw an email enter our network. Still, as powerful as this signal is, it is never the signal we’re asked about by our clients.

Limitations to the deterministic approach in digital identity verification

The focus on name-matching coverage is a vestige of the traditional, deterministic approach to risk that has been at the core of identity verification for decades, especially among old-school financial institutions. KYC asks risk teams to affirm these links: does the applicant live at this address? Is this their phone number? Is this their email? The answers to these questions are binary: is there an “email-to-name” match, yes or no?

No doubt, these are valuable insights. However, in this ever-evolving global digital economy, the concept of digital identity has grown increasingly complex and fraudsters have evolved to match that level of sophistication.

An alternative approach to complement risk models

Unfortunately, replicating an “email-to-name” match that passes muster is much easier for fraudsters than you would think. In fact, what takes more sophistication is the patience to sit on that email and let it age before using it in a malicious scheme. And this is where “email-first-seen-days” really shines. The question we should all be asking, therefore, is “Can I trust this email?” And the answer to that question lies in a probabilistic approach.

Of course, not every new email address is fraudulent. More often than not, however, new email addresses correlate with a higher risk of fraud. What we do is track the age of an email based on when it was first seen in our network. For example, when an email address has been seen in our Identity Network for less than 30 days, fraudulent activity is observed 100x more often than with email addresses first seen outside that 30-day window.
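As a rough sketch of how a signal like “email-first-seen-days” could be derived, the Python below computes the number of days since an email was first observed and flags anything inside the 30-day window mentioned above. The function names are hypothetical, and treating a never-before-seen email as “recent” is an assumption of this sketch, not a documented rule.

```python
from datetime import date
from typing import Optional

RECENT_WINDOW_DAYS = 30  # the 30-day window referred to in the text


def email_first_seen_days(first_seen: Optional[date], today: date) -> Optional[int]:
    """Days since the email was first observed, or None if it has never been observed."""
    if first_seen is None:
        return None
    # Clamp at zero in case of clock skew or a first-seen date in the future.
    return max((today - first_seen).days, 0)


def is_recently_seen(days: Optional[int]) -> bool:
    """True when the email falls inside the recent window.

    Assumption: an email with no history at all (None) is treated as recent.
    """
    return days is None or days < RECENT_WINDOW_DAYS


age = email_first_seen_days(date(2024, 1, 1), today=date(2024, 1, 11))
flagged = is_recently_seen(age)
```

Note that the signal itself is the number of days, not the flag; the threshold is only one way of reading it.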

For instance, when fraudsters generate a synthetic identity, they will pair a new disposable or temporary email address with stolen phone numbers and addresses. Legitimate customers, however, tend to use long-established email addresses and are therefore known to us, often for several years or more.

Risk signals like “email-first-seen-days” correlate with those behaviors and can identify where fraud is more likely to occur. That is the probabilistic approach. The answer isn’t a binary yes or no; it is a number: the number of days since our team first saw the email, together with a correlation against known fraud that demonstrates its probabilistic value.
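One simple way to express a “correlation against known fraud” is to compare the fraud rate among recently seen emails with the rate among older ones. The Python sketch below does that on a handful of toy records. The records are invented purely for illustration and are not real fraud statistics, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Each record: (days since the email was first seen, whether the case was confirmed fraud)
Record = Tuple[int, bool]


def fraud_rate(records: List[Record]) -> Optional[float]:
    """Share of records labeled as fraud, or None for an empty list."""
    if not records:
        return None
    return sum(1 for _, is_fraud in records if is_fraud) / len(records)


def fraud_lift(records: List[Record], window_days: int = 30) -> Optional[float]:
    """Fraud rate inside the window divided by the fraud rate outside it.

    Returns None when either group is empty or the older group has no fraud,
    because the ratio is undefined in those cases.
    """
    recent = [r for r in records if r[0] < window_days]
    older = [r for r in records if r[0] >= window_days]
    recent_rate = fraud_rate(recent)
    older_rate = fraud_rate(older)
    if recent_rate is None or not older_rate:
        return None
    return recent_rate / older_rate


# Toy data, invented for this example: 4 recent emails (2 fraud), 10 older emails (1 fraud).
TOY_RECORDS: List[Record] = (
    [(3, True), (10, True), (12, False), (25, False)]
    + [(400, True)]
    + [(400 + i, False) for i in range(1, 10)]
)

lift = fraud_lift(TOY_RECORDS)
```

On these toy records the ratio comes out to about 5, meaning fraud appears roughly five times as often among the recent emails. A real measurement would need far more labeled history than this.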


Benefits of a probabilistic approach to digital identity verification

To take it even further, we can look at how long the email has been associated with the phone number provided. That again is not a question posed by KYC. However, when leveraged in a model trained for probabilistic fraud risk, it is one of the most important features enabling banks and other financial service institutions to root out synthetic identities before it’s too late.
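To make the idea of combining such signals concrete, here is a minimal Python sketch that turns two features into a single fraud probability with a logistic function. Everything specific in it is an assumption: the feature names, the log transform, and especially the weights, which are hand-picked for illustration and are not the coefficients of any trained model. In practice a model would learn these from labeled data.

```python
import math
from dataclasses import dataclass


@dataclass
class IdentitySignals:
    email_first_seen_days: int    # days since the email was first observed
    email_phone_linked_days: int  # days the email has been associated with the phone number


# Hypothetical, hand-set parameters. A real model would learn these from data.
BIAS = 1.0
W_EMAIL_AGE = -0.8
W_LINK_AGE = -0.6


def fraud_probability(signals: IdentitySignals) -> float:
    """Logistic score in (0, 1); older emails and older linkages lower the score."""
    z = (
        BIAS
        + W_EMAIL_AGE * math.log1p(signals.email_first_seen_days)
        + W_LINK_AGE * math.log1p(signals.email_phone_linked_days)
    )
    return 1.0 / (1.0 + math.exp(-z))


new_identity = fraud_probability(IdentitySignals(email_first_seen_days=2, email_phone_linked_days=0))
established = fraud_probability(IdentitySignals(email_first_seen_days=1500, email_phone_linked_days=900))
```

With these illustrative weights, the freshly created email with no phone history scores far higher than the long-established pairing, which is the behavior the paragraph above describes.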

Financial institutions are beholden to a deterministic risk assessment, but that shouldn’t limit the questions they ask to verify a digital identity. Seeking probabilistic answers complements that deterministic knowledge with a risk assessment that allows financial institutions to put the customer first while still pursuing compliance diligence.

We are helping financial institutions bridge the gap between deterministic and probabilistic risk, improving KYC by first ensuring that you actually know who your customer is. We do this by scoring the risk of your customer before you put them through the KYC verification workflow. This not only helps mitigate synthetic identity fraud attacks across industries, but also improves the customer experience and reduces the costs associated with manual review.
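A pre-KYC scoring step of this kind might be wired into an onboarding flow along the following lines. This Python sketch is an assumption about one possible design: the route names and the two thresholds are hypothetical and would need to be tuned to each institution’s risk appetite.

```python
from enum import Enum


class Route(Enum):
    STANDARD_KYC = "standard_kyc"                  # low risk: proceed straight to the usual KYC checks
    STEP_UP_VERIFICATION = "step_up_verification"  # medium risk: ask for additional proof first
    MANUAL_REVIEW = "manual_review"                # high risk: send to an analyst


def route_applicant(risk_score: float, low: float = 0.2, high: float = 0.8) -> Route:
    """Choose a path through onboarding from a risk score in [0, 1].

    The default thresholds are illustrative placeholders, not recommended values.
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score < low:
        return Route.STANDARD_KYC
    if risk_score < high:
        return Route.STEP_UP_VERIFICATION
    return Route.MANUAL_REVIEW


decision = route_applicant(0.05)
```

Routing only the highest-risk applicants to an analyst is what keeps manual review costs down while low-risk customers move through with less friction.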

To find out more about how our risk signals can increase your revenue, Contact Us today.
