In 2022, the number of email users will reach 4.2 billion, over half of the global population. On average, people have two email accounts, and they rarely switch email providers or change usernames. So, what does this mean for your fraud prevention strategy? Simple. Email-to-Name match can, and should, factor into identity verification.
The Ekata Identity Graph is a sourced, licensed and authoritative data asset. It can validate whether the email address provided in a transaction is associated with the customer name. Without access to the Ekata Identity Graph, it is still possible to measure the similarity of the name and the email address using the Levenshtein distance.
The Levenshtein distance is a string metric for measuring the difference between two sequences. It is one type of edit distance: the smallest number of edits required to change one string into another. A higher score means a larger “distance” between the strings.
An “edit” is either an insertion of a character, a deletion of a character, or a replacement of a character. In this situation, the Levenshtein distance is measured between the name and the email address.
Let’s consider the name John Doe and the email addresses john.doe@example.com and test123@example.org.
You can use the Levenshtein distance to measure how similar ‘john doe’, the normalized version of the original name ‘John Doe’, is to the local part of each email address (everything before the @ sign): ‘john.doe’ and ‘test123’, respectively.
The difference between ‘john doe’ and ‘john.doe’ is small: only one replacement (the space becomes a period) is needed for the transformation, so the Levenshtein distance is 1. The difference between ‘john doe’ and ‘test123’ is large: seven replacements and one deletion are needed, so the Levenshtein distance is 8.
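The comparison above can be sketched in a few lines of Python. This is a minimal illustration, not Ekata's implementation: the function names, the normalization rule (lowercase, collapse whitespace), and the example.com domain are assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Smallest number of insertions, deletions, and replacements turning a into b."""
    # previous[j] is the distance between the first i-1 characters of a
    # and the first j characters of b (row-by-row dynamic programming).
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning i characters into an empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # replace, or keep if equal
            ))
        previous = current
    return previous[-1]


def normalize_name(name: str) -> str:
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(name.lower().split())


def local_part(email: str) -> str:
    # Everything before the last @ sign, lowercased.
    return email.rsplit("@", 1)[0].lower()


def name_email_distance(name: str, email: str) -> int:
    return levenshtein(normalize_name(name), local_part(email))


print(name_email_distance("John Doe", "john.doe@example.com"))  # 1
print(name_email_distance("John Doe", "test123@example.com"))   # 8
```

Keeping only two rows of the table at a time keeps memory proportional to the length of the email local part, which is convenient when scoring many transactions.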
What does the data tell us about fraud prevention?
As the figure below shows, the Levenshtein distance* between the name and the email address differentiates well between fraudulent and non-fraudulent transactions.
The x-axis shows the bucketed Levenshtein distance of the email-name pairs, whereas the y-axis shows the number of transactions that fall in a particular bucket. The orange line represents the rejection rate, which rises with the distance: the higher the distance, the higher the risk.
*Instead of the standard version, a custom Levenshtein distance is used where different weights are assigned to the different “edit” operations.
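A weighted variant of this kind could look like the sketch below. The article does not state which weights are used, so the parameter names and every cost value here are hypothetical and chosen only to show the mechanism.

```python
def weighted_levenshtein(a: str, b: str,
                         insert_cost: float = 1.0,
                         delete_cost: float = 1.0,
                         replace_cost: float = 1.0) -> float:
    """Edit distance where each operation type carries its own (assumed) cost."""
    # First row: building the first j characters of b from nothing takes j insertions.
    previous = [j * insert_cost for j in range(len(b) + 1)]
    for i, char_a in enumerate(a, start=1):
        current = [i * delete_cost]  # removing the first i characters of a
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + delete_cost,
                current[j - 1] + insert_cost,
                previous[j - 1] + (0.0 if char_a == char_b else replace_cost),
            ))
        previous = current
    return previous[-1]


# Hypothetical weights: make replacements cheap, so 'john doe' vs 'john.doe' costs 0.5.
print(weighted_levenshtein("john doe", "john.doe", replace_cost=0.5))  # 0.5
# Hypothetical weights: make deletions cheap, so dropping the space costs 0.25.
print(weighted_levenshtein("john doe", "johndoe", delete_cost=0.25))   # 0.25
```

With all three costs left at 1.0 the function reduces to the standard Levenshtein distance, so the weights can be tuned without changing the calling code.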
As the figures below show, regardless of region and language differences, transactions with a high email-name Levenshtein distance are much more likely to be fraudulent.
Although a static email-to-name relationship is not enough to make a final decision for identity verification, it is a vital consideration. Looking out for suspicious username selection and automatically generated accounts is a first step a merchant can take in the fraud prevention process.
The Ekata Identity Engine consists of two data sources, the Ekata Identity Graph and the Ekata Identity Network. It seamlessly captures the behaviors and dynamic relationships of the different identity elements. Our fraud risk models are able to identify what fraud looks like for different businesses, enabling our customers to stop fraud before it takes hold.
To learn more about how Ekata can help identify your good customers and combat fraud, visit Ekata today.