Here’s the crux: organizations want absolute proof of identity but must learn to live with shades of uncertainty. To mitigate fraud and ensure security, business managers need a multi-layered approach to identity verification that relies more on dynamic data and less on static, regulated PII. This data richness is the answer to the ever-evolving sophistication of cyberattacks.
Ekata does data richness differently. Our approach to sourcing, aggregating, and synthesizing identity verification data is what makes the Ekata Identity Graph unique.
So, what makes our data so impressive?
From the very beginning, we have been committed to constantly improving and adding value to our data. This means we need an answer to what “good data” means, a way to measure how good it is, and a drive to constantly make it even better. All in that order!
We measure our data on three criteria: data richness, data completeness (a key dimension of data quality), and data accuracy.
Firstly, you too should be measuring data quality in some way. If you’re buying data, you should always ask the question “What makes this data so impressive?” and expect a good answer. Be very nervous if you don’t get one! And, if you’re selling data, you should anticipate this question; even if a customer doesn’t ask, at least you’ll know you’re making your data better.
At Ekata, we organize our identity verification data in a graph-structured database and manage over 5 billion global records in real time. This allows us to create one million new links a day within our Identity Graph, to constantly improve the quality of our data.
Let’s explore and define each criterion. First cab off the rank: data richness.
Thankfully, data richness is straightforward to assess: what kinds of data do you have, what attributes, and so on. Since we use a graph-structured database, we can ask about the data richness of various entities, links, and attributes. Data richness is important both because it inherently adds value and because it creates more opportunities to draw insights and value from the data itself. Data elements we have include:
- Entities – people, businesses, addresses, phone numbers, emails, social profiles, URLs
- Links – person-to-address, phone-to-person, person-to-historical, person-to-email, person-to-phone
- Phone data – 7 line types, 3,000+ carriers, prepaid, SMS-capable, etc
- Person data – name, sex, age range, related persons, etc
- Address data – 30+ years of address history, receiving mail, geolocation, etc
- Email data – registered name, first seen, auto-generated, etc
- Relational data – start and stop date, type of relationship, etc
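The entity/link/attribute structure above can be sketched as a tiny in-memory graph. This is a hypothetical Python illustration, not Ekata's actual schema; every class, field, and value below is invented for the example.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a graph-structured identity store: entities carry
# attributes, and links carry relationship metadata such as type and dates.
# Names and fields are illustrative only.

@dataclass
class Entity:
    kind: str                       # e.g. "person", "phone", "address"
    attributes: dict = field(default_factory=dict)

@dataclass
class Link:
    source: Entity
    target: Entity
    rel_type: str                   # e.g. "person-to-phone"
    metadata: dict = field(default_factory=dict)  # e.g. start/stop dates

class IdentityGraph:
    def __init__(self):
        self.entities: list[Entity] = []
        self.links: list[Link] = []

    def add_entity(self, kind, **attributes):
        entity = Entity(kind, attributes)
        self.entities.append(entity)
        return entity

    def add_link(self, source, target, rel_type, **metadata):
        link = Link(source, target, rel_type, metadata)
        self.links.append(link)
        return link

    def neighbors(self, entity, rel_type=None):
        """Entities linked from `entity`, optionally filtered by link type."""
        return [l.target for l in self.links
                if l.source is entity
                and (rel_type is None or l.rel_type == rel_type)]

# Usage: a person linked to a phone number, with relationship metadata.
g = IdentityGraph()
alice = g.add_entity("person", name="Alice", age_range="30-39")
phone = g.add_entity("phone", line_type="mobile", carrier="ExampleCarrier")
g.add_link(alice, phone, "person-to-phone", start_date="2019-04")
print([e.kind for e in g.neighbors(alice, "person-to-phone")])  # ['phone']
```

The point of the graph shape is that each new link (person-to-phone, person-to-address, and so on) compounds the value of the entities it connects.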
Ensuring data richness is useful for companies performing identity verification. For example, if we need to answer whether two people with similar names are indeed the same person, having good information regarding date of birth and sex is important. When processing a potentially fraudulent transaction or assessing a lending application, additional attributes about the users are crucial to making an informed decision. Data is a powerful ally when it’s good data.
Next, how does data completeness affect identity verification accuracy?
At the end of the day, it always comes down to metrics. Measuring identity verification accuracy and data quality tells you how successful your use of dynamic data is: what is the return on investment and, ultimately, how satisfied are your customers? Leveraging data that falls short on attribute completeness, or coverage, will diminish your returns and negatively impact your customer experience.
Thankfully, we measure this dimension of data quality through “completeness.”
Completeness can mean several things. However, when it comes to identity verification accuracy, the obvious measurement is coverage or “attribute fill rate”.
This is, at its core, the frequency with which we can provide a given data attribute for a given entity. For example, how often can we provide an age range for a person? Sure, it sounds simple, but completeness is tricky. Measuring the completeness of entities and their relationships requires comparing our data set to the actual, ever-changing, real world. This can prove difficult because, most of the time, the only way to accomplish it is to gather statistics from external sources.
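As a concrete sketch, attribute fill rate can be computed as the share of entities with a non-empty value for an attribute. This is a minimal, hypothetical Python example; the records and the definition of "filled" are invented for illustration.

```python
# Hypothetical sketch of an "attribute fill rate" metric: the fraction of
# entities for which a given attribute can be provided at all.

def attribute_fill_rate(entities, attribute):
    """Share of entities with a non-empty value for `attribute`."""
    if not entities:
        return 0.0
    filled = sum(1 for e in entities if e.get(attribute) not in (None, ""))
    return filled / len(entities)

people = [
    {"name": "Alice", "age_range": "30-39"},
    {"name": "Bob",   "age_range": None},
    {"name": "Carol", "age_range": "50-59"},
    {"name": "Dan"},
]
print(attribute_fill_rate(people, "age_range"))  # 0.5
```

The hard part is not this calculation but the denominator: measuring coverage against the real world means estimating the true population from external sources.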
For example, what percentage of the US adult population do we cover? Or what percentage of US mobile phones do we have linked to a person? Answering these questions, and thereby ensuring identity verification accuracy, requires external data sources to estimate the true population of the US or the total number of US mobile phones. Furthermore, identity verification accuracy – and therefore, completeness – requires managing enormous amounts of data.
Here are some of the numbers from the Ekata Identity Graph that we track to measure the completeness and data quality of our identity elements and relationships:
- We have over 5B phone numbers worldwide
- Our coverage is 96% for all US businesses
- We cover 99% of US addresses
- We are linking over 600 million person-to-phone relationships
- There are over 2.7 billion unique person-to-address relationships
We constantly strive to ensure data quality, improving the completeness of data entities and relationships, but not at the expense of identity verification accuracy. A big challenge is comparing the number of person entities we have to a given population. We know duplicates can get created in our graph; simply adding more person records would, by this definition, boost our completeness KPI. However, it would also hurt data quality, which does not help ensure the identity verification accuracy of applicants, orders, and customers. We see this a lot from other data providers: an inflated claim about the number of records where, due to duplication, the data quality is poor.
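A toy illustration of the duplication problem: counting raw records against a population overstates coverage unless duplicates are collapsed first. All records, numbers, and the dedup key below are made up for the example, and real entity resolution is far more involved than matching on name and date of birth.

```python
# Why duplicates inflate a completeness KPI: naive record counts overstate
# population coverage; deduplicating first gives an honest number.

def naive_coverage(records, population):
    """Coverage computed from raw record count (duplicates inflate this)."""
    return len(records) / population

def deduped_coverage(records, population,
                     key=lambda r: (r["name"], r["dob"])):
    """Coverage computed from distinct identities under a simple match key."""
    return len({key(r) for r in records}) / population

records = [
    {"name": "Alice Smith", "dob": "1990-01-01"},
    {"name": "Alice Smith", "dob": "1990-01-01"},  # duplicate record
    {"name": "Bob Jones",   "dob": "1985-06-15"},
]
population = 4  # assumed true population for the example
print(naive_coverage(records, population))    # 0.75 (inflated)
print(deduped_coverage(records, population))  # 0.5
```

The gap between the two numbers is exactly the kind of inflated claim to watch for when evaluating a data provider.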
Unfortunately, ensuring identity verification accuracy is no longer possible using only PII data. At Ekata, we ensure data quality by using cutting-edge data science to synthesize and corroborate data from a variety of proprietary and non-proprietary sources, providing customers with the best dynamic identity data. To mitigate risk, organizations need to re-evaluate how they verify the identity of customers, and that means being assured of identity verification accuracy via data quality and completeness.
Data accuracy can be the most difficult criterion to achieve, yet it is often the most coveted metric. To truly know your customers, a thorough identity verification assessment is needed at the time of the transaction, application or inquiry to layer in and leverage dynamic data. To succeed with real-time identity verification, you need to ensure not only data richness and data completeness but also data accuracy.
Data accuracy is what everybody is interested in. It’s also the most difficult to measure because it always involves comparing our data to the real world. And, of course, the real world changes rapidly, so this comparison can range anywhere from “expensive to do well” to “near impossible to do at all.” Ultimately, the comparison to the real world is the only legitimate definition of data accuracy, despite what you might hear and what other vendors may claim. Often, we hear anecdotes about providers that claim 97% data accuracy. What this often means is that if you put a record into their database, you would get the same record back… about 97% of the time. That is not useful if you are trying to determine whether the data itself is true to the person’s current phone number, address, associated people, etc.
We do data differently in the Ekata Identity Graph by sourcing, synthesizing and using our data science to provide the most up-to-date and accurate non-PII data. We are a data synthesis company that makes the data better for layered identity assessment, as it allows for corroboration. The corroboration of data is what provides confidence in greater data accuracy. We are always testing the accuracy of our data with algorithms, but more importantly, we also test the data accuracy of our algorithms with the human element. We have real people make calls to the data set to determine if the identity we have associated with that number, name, address, etc., is indeed who we think they are. This method allows us to constantly learn and refine our algorithms so we can serve up accurate, rich, complete data.
For example, with a single-source data point, you might see only around 65% data accuracy for that specific attribute. However, if you use a data vendor that sources, synthesizes, and corroborates across a multitude of sources, you can push data accuracy rates above 90%.
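Under a deliberately simplified assumption that sources are independent and each correct with probability p, a majority vote across sources illustrates why corroboration raises accuracy. Real corroboration pipelines weigh far richer signals than a vote, and the actual gains depend on source quality and count; this sketch only shows the direction of the effect.

```python
from math import comb

# Toy model of corroboration: if n independent sources each report the
# correct value with probability p, the majority answer is correct more
# often than any single source. Purely illustrative assumptions.

def majority_accuracy(p, n):
    """Probability that a strict majority of n independent sources is correct."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# Accuracy of the majority vote grows with the number of corroborating
# sources, starting from a single 65%-accurate source.
for n in (1, 3, 5, 9):
    print(n, round(majority_accuracy(0.65, n), 3))
```

The design point: each additional agreeing source compounds confidence, which is why synthesis across many sources beats any single feed.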
As a data company, we must create meaningful and honest measures that represent our data accurately and ask for the same from others. If you truly want rich, complete and accurate data to ensure identity verification accuracy, then you either need to create your own database or use vendors that:
- Are methodical in their approach to sourcing, synthesizing and measuring the data
- Understand what the data is supposed to be representing in the real world
- Understand how well the data represents what it’s supposed to, so it’s useful for identity verification
As lenders, insurance carriers, retailers, and travel providers move to meet the demands of their mobile customers, they need to re-evaluate old identity assessment requirements and workflows to support these transactions and applications. Given the increase in breaches and compromises of personal data, simply increasing security doesn’t always work. Analysts recommend a layered approach to identity assessment with a focus on rich, complete, and accurate dynamic identity verification data.
Visit our Contact Us page to learn more about Ekata’s identity verification solutions.