Why you can’t rely on CPF checks for digital identity verification in Brazil


As the global economy’s digital transformation accelerates, those that do not keep pace leave themselves vulnerable to increasingly sophisticated fraudsters who take advantage of anyone falling behind. Ironically, despite being recognized by the World Bank as the second most advanced country in the world in terms of digitalization of public services, Brazil is vulnerable. What’s the issue, you ask? Identity verification.

The very concept of an identity document is complex in Brazil (and Latin America as a whole). Brazilian citizens are issued multiple documents over their lifetime, and each state they reside in requires a different model of documentation. In this blog we focus on the CPF (Cadastro de Pessoas Físicas) number, the most common form of documentation and one Brazilians need for a variety of activities, including opening a bank account, filing taxes, purchasing property and even buying online. Specifically, we look at the limitations of CPF checks in the identity verification process – and at how prevalent CPF fraud leaves consumers and businesses across Brazil vulnerable, especially when they lack the data necessary to prevent it.

Before we examine the importance of dynamic data in digital identity verification processes, let’s first break down and define the issue at hand: CPF fraud and the limitations of CPF checks.

What is a CPF number?

Cadastro de Pessoas Físicas (Portuguese for “Natural Persons Register”), or the CPF number, is a unique identification number that is assigned to each resident of Brazil. Issued by the Federal Revenue Bureau, a CPF number consists of 11 digits.

In some ways, the CPF number is similar to the US Social Security Number (SSN), as it is used to trace an individual’s financial activities, such as banking, credit history, taxes and pension contributions. However, unlike the SSN in the US or a Social Insurance Number (SIN) in Canada, the CPF is not confidential. Indeed, as mentioned above, the CPF number is requested for many everyday activities in Brazil – from buying groceries and renting a car to, in some cases, logging in to public Wi-Fi.
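Because the structure of the CPF is public, a format check is easy to write, and that is part of the problem. As a minimal sketch (the function names are ours, for illustration only), the snippet below applies the widely documented modulo-11 rule to the last two of the 11 digits, which act as check digits. Passing it shows only that a number is well formed; it says nothing about whether the number was issued, is active, or belongs to the person presenting it.

```python
def cpf_check_digits(first_nine: str) -> str:
    """Compute the two check digits for the first nine digits of a CPF."""
    digits = [int(ch) for ch in first_nine]
    for _ in range(2):
        # Weights run from len(digits) + 1 down to 2 (10..2, then 11..2).
        weights = range(len(digits) + 1, 1, -1)
        total = sum(d * w for d, w in zip(digits, weights))
        check = (total * 10) % 11
        digits.append(0 if check == 10 else check)
    return f"{digits[9]}{digits[10]}"


def is_well_formed_cpf(raw: str) -> bool:
    """Return True if `raw` contains 11 digits with matching check digits."""
    cpf = "".join(ch for ch in raw if ch in "0123456789")
    # Repeated-digit sequences such as 111.111.111-11 pass the arithmetic
    # but are conventionally rejected.
    if len(cpf) != 11 or len(set(cpf)) == 1:
        return False
    return cpf[9:] == cpf_check_digits(cpf[:9])
```

For instance, `is_well_formed_cpf("111.444.777-35")` returns `True` for this commonly used test value, and a fraudster holding a stolen number passes exactly the same check.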

Identity theft and CPF fraud in Brazil

Identity fraud across Latin America has been an issue for some time. From January to November 2022, Brazilians suffered more than 3.6 million identity fraud attempts, which represents one every eight seconds.

While identity theft is a global problem, there is one element to identity theft that is unique to Brazil: the CPF number.

As the most common form of identification in the country, it is no wonder that CPF numbers feature so prominently in fraudulent schemes in Brazil. In fact, security intelligence reports from years back recognize the growing problem, naming CPF fraud as the favored alternative to traditional credit card fraud because holders have no ready way to track how their CPF number is used. Indeed, while credit card transactions and statements can be readily accessed online, CPF activity is nearly impossible to monitor. A fraudster can easily steal an individual’s CPF number (as mentioned earlier – they’re not exactly confidential), then obtain a loan or make a purchase in the name of the legitimate CPF holder, incurring debt in the victim’s name.

More insidious still, CPF numbers are an ideal target for scammers. There is an entire underground ecosystem dedicated to facilitating CPF fraud as a service. All that is required is a mix of valid CPF numbers, easily obtained via straightforward phishing and malware attacks. In fact, in 2020 an official audit showed there were at least 12 million more CPFs assigned to “citizens” than the total population of Brazil.

The problem with CPF checks for identity verification

Organizations operating within the Brazilian market are legally required to verify customers’ CPF numbers to comply with the country’s anti-money laundering and counter-terrorism financing (AML/CTF) laws. In turn, businesses inside and outside the country turn to KYC processes to ensure compliance and reduce reputational risk – and don’t go any further. Unfortunately, when it comes to CPF numbers, this is a problem for several reasons. Firstly, as already emphasized, these numbers are easy to steal. Secondly, despite how common CPF numbers are, some 50 million of the estimated 210 million Brazilians do not have an active CPF number. This could be because it was never issued in the first place, or because it was blocked. Whatever the reason, these citizens are marginalized.

However, the key reason relying on KYC processes to validate identities in Brazil is such a problem is that CPF numbers are static identifiers, i.e., fixed data attributes that don’t change (like date of birth). Any identity verification process that relies on readily available deterministic data is much more likely to be compromised. Indeed, in today’s digital economy, where information is so easily shared and accessed through both normal and nefarious means, static data is not enough.

The dynamic data difference

Because sophisticated identity thieves are known to create fake identities from stolen static data elements, organizations need to embrace dynamic data. In short, dynamic identity data elements include things like a customer’s phone number, email address and their IP (Internet Protocol) address. Unlike static data attributes, dynamic data attributes can leverage multiple dynamic linkages, metadata, history and activity patterns to validate an identity.

As detailed in our eBook on detecting and combatting identity fraud, these dynamic links between identity elements matter a lot more for stolen identities because there is low consistency between elements. With metadata, for instance, comparing the length of address history with the length of credit history is a useful indicator because the two durations should be similar. Behavioral elements also provide many strong indicators: IP risk is particularly effective for detecting the origination and location of an identity.

Based on what the customer has entered in a record, whether it be an account opening application or a transaction, the dynamic identity data elements of name, email, phone number, address and IP address can be evaluated.

The benefits of a probabilistic approach to risk assessment are immense. For example, risk signals like “email-first-seen-days” correlate to an identity’s behavior and can better identify where fraud is more likely to occur. The answer isn’t a binary yes or no – it’s a number (specifically, a number of days), and its correlation with known fraud yields a probabilistic value. To take this even further, this approach could even determine how long an email has been associated with a phone number. This simply isn’t a question posed by a traditional KYC approach.
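To make the contrast with a yes/no check concrete, here is a minimal sketch of how a signal such as “email-first-seen-days” could be turned into a probabilistic value. Everything in it is an assumption for illustration: the bucket edges, the function names and the handful of synthetic records are invented, and it is not a description of how any production score is built. It simply groups past records by signal value and reports the observed fraud rate in each group.

```python
from bisect import bisect_right
from typing import Iterable, List, Optional, Tuple

# Hypothetical bucket edges, in days since the email was first seen.
# Resulting buckets: under 1 day, 1-29 days, 30-364 days, 365+ days.
BUCKET_EDGES = [1, 30, 365]

# Synthetic (days_since_first_seen, was_fraud) records, invented for this sketch.
SYNTHETIC_HISTORY = [
    (0, True), (0, True), (0, True), (0, False),
    (10, True), (25, False),
    (90, False), (300, False),
    (400, False), (800, False), (2000, False), (3000, True),
]


def bucket_index(days: int) -> int:
    """Map a number of days to the index of the bucket it falls in."""
    return bisect_right(BUCKET_EDGES, days)


def fraud_rate_by_bucket(
    history: Iterable[Tuple[int, bool]]
) -> List[Optional[float]]:
    """Observed share of fraudulent records per bucket (None if bucket is empty)."""
    totals = [0] * (len(BUCKET_EDGES) + 1)
    frauds = [0] * (len(BUCKET_EDGES) + 1)
    for days, was_fraud in history:
        i = bucket_index(days)
        totals[i] += 1
        frauds[i] += int(was_fraud)
    return [f / t if t else None for f, t in zip(frauds, totals)]


def risk_for(days: int, rates: List[Optional[float]]) -> Optional[float]:
    """Look up the observed fraud rate for a new record's signal value."""
    return rates[bucket_index(days)]
```

With the synthetic records above, an email first seen today falls into a bucket where three of four past records were fraudulent, while one first seen over a year ago falls into a bucket where one in four was. The output is a rate to weigh against other signals, not a pass or fail.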

Leveraging external data insights

So, now that we know that your internal data – especially static identifiers – is not enough to mitigate CPF fraud, what should we be on the lookout for? Well, to add context and drive differentiation, be sure you leverage the right external data insights. These will enable identity verification and fraud prevention thanks to the behavioral links provided by machine learning technology.

Key features that make Ekata data unique

  • Comprehensive global coverage  

The data in our products provides a comprehensive picture of a customer’s identity by analyzing five key identity elements – name, phone, email, physical address and IP – and how these elements are linked to an identity. Our machine learning-based scores then leverage international data, drawing on broad coverage and real-world digital interactions, to predict good and fraudulent customer activity.

  • Cross-border and cross-industry 

Most of our customers have local or customer-specific networks, which limit their ability to provide the best value for the customer. Fortunately, our global Identity Network provides insight into cross-border and cross-industry fraud patterns, beyond local data.

  • Data cleansing and pre-processing 

Data cleansing is the process of detecting and removing noisy, corrupted or inconsistent data from a network. Ekata has a rigorous data cleansing process that leverages analytics and big-data engineering to validate and normalize data for efficient use in fraud-prevention models. In the case of CPF fraud, because CPF cards only use static data – name, gender, date of birth, nationality – and not dynamic data, like address, email and phone, this kind of verification cannot occur. These identities are not validated!

  • Real-time assessment  

We organize our identity verification data in a graph-structured database and manage billions of global records in real time. We calculate and assess risk, as well as standardize, normalize and de-duplicate data. This ensures we don’t end up counting duplicates or manual agent refreshes as a new transaction.
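As a rough illustration of why normalizing and de-duplicating matters before anything is counted, the sketch below collapses repeated submissions of the same email and phone pair within a short window. It is a simplified example under our own assumptions: the `Event` type, the normalization rules and the five-minute window are hypothetical and do not describe Ekata’s actual pipeline.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    email: str
    phone: str
    timestamp: float  # seconds since some reference point


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different spellings match."""
    return email.strip().lower()


def normalize_phone(phone: str) -> str:
    """Keep digits only, dropping spaces, brackets, dashes and plus signs."""
    return "".join(ch for ch in phone if ch in "0123456789")


def dedupe(events: Iterable[Event], window_seconds: float = 300.0) -> List[Event]:
    """Drop events that repeat a normalized key within the window of its last sighting."""
    last_seen: Dict[Tuple[str, str], float] = {}
    kept: List[Event] = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (normalize_email(ev.email), normalize_phone(ev.phone))
        previous = last_seen.get(key)
        if previous is None or ev.timestamp - previous > window_seconds:
            kept.append(ev)
        # Track every sighting so a run of quick refreshes counts only once.
        last_seen[key] = ev.timestamp
    return kept
```

Here “Ana@Example.com ” with “+55 (11) 90000-0000” and “ana@example.com” with “5511900000000” resolve to the same key, so a resubmission a minute later is treated as a refresh rather than a new transaction.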

In conclusion

Given that 80% of identity fraud attempts in Brazil occur at the identity verification stage of a transaction or digital account opening, the ability to confidently assess fraud risk comes down to having access to unique and valuable data that enables fast, accurate decisions about an identity. To leverage dynamic data and predictive risk signals is to go beyond the limitations of the deterministic approach that is CPF verification.

To learn more about Ekata’s commitment to data excellence, and identity verification capabilities based on the five key dynamic data elements of name, email, phone, IP and address, get in touch today.

About the Author

Marko Nikolic
