Last year, the number of email users reached some 4.2 billionโ over half of the global population. In fact, on average, people have two email accounts; and they rarely switch email providers or change usernames. So what does this mean when it comes to your fraud risk assessment? Simple. Email-to-Name match can โ and should โ be factored into customer data verification.
The Ekata Identity Graph is a sourced, licensed, and authoritative digital identity data asset that can validate if the email address provided in the transaction is associated with the customer name or not. If someone doesnโt have access to the Ekata Identity Graph, it is still possible to measure the similarity of the name and the email address. This can be done using the Levenshtein distance.
This Levenshtein distance โ or edit distance โ is a string metric for measuring the difference between two sequences; a higher score means a larger โdistanceโ between the strings. It is called an edit distance because it measures how many edits would be needed to transform string one to string two.
The Levenshtein distance, which is one type of edit distance, is the smallest number of edits required to transform one string into another. In this situation โ using Ekata Identity Graph โ the Levenshtein distance is between the name and email address. An โeditโ is defined by either an insertion of a character, a deletion of a character, or a replacement of a character.
Letโs consider the name John Doe and the email addresses john.doe@example.com and test123@example.com. You can use the Levenshtein distance to measure how similar โjohn doeโ โ the normalized version of the original name โJohn Doeโ โ is to the local part of the email addresses (everything before the @ sign) โ to โjohn.doeโ and โtest123โ, respectively.
The difference between โjohn doeโ and โjohn.doeโ is small, with only one replacement needed for the transformation, so it results in a low Levenshtein distance value. The difference between โjohn doeโ and โtest123โ is large, with seven replacements and one deletion needed for the transformation, so it results in a high Levenshtein distance value.
What does the data tell us?
As the figure below shows, the Levenshtein distance* between the name and the email address makes good differentiation between the fraudulent and non-fraudulent transactions.
The x-axis shows the bucketed Levenshtein distance of the email-name pairs, whereas the y-axis shows the number of transactions falling in a particular bucket. The orange line represents the reject rate, forming a reasonable relationship; the higher the distance, the higher the risk.
*Instead of the standard version, a custom Levenshtein distance is used where different weights are assigned to the different โeditโ operations.
Regardless of the region and linguistic characteristics, high email-name Levenshtein distances are much more likely to be fraudulent, as the figures below show.
Final thoughts
Although a static email-to-name relationship is not enough to make a final decision regarding customer data verification, it is a vital consideration. Indeed, being on the lookout for suspicious username selection and automatically generated accounts is often one of the first steps in any successful fraud risk assessment that a merchant can take.
The Ekata Identity Engine consists of two data sources, the Ekata Identity Graph and Ekata Identity Network. It seamlessly captures the behaviors and dynamic relationships of the different digital identity elements. Our fraud risk assessment models are able to identify what fraud looks like for different businesses, enabling our customers to stop fraud before it takes hold.
To learn more about how Ekata can help identify your good customers and combat fraud, visit Ekata.com today.