Is this a valid address? 399 Judson Street, Lynden, WA 98264
As it turns out, answering this question is more complex than you might think. Read along as we highlight the challenges in validating addresses at a global scale, why Google’s Geocoding API might not be the best solution for you, and why a robust address solution matters.
The challenge with validating addresses
First, it’s imperative we define the meaning of a valid address. A valid address is one that exists in the official postal records for a given country or addressing area. This is distinct from deliverability, which describes whether mail service is available for a given address. For example, a vacant property can have a valid address but mail cannot be delivered to it.
Address validity is also distinct from its semantic correctness. The address used in our example above is a semantically correct US mailing address, but it is not a valid address (for reasons we’ll get into later in this post).
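To make the distinction concrete, here is a minimal Python sketch. The regular expression, the contents of `TRUTH_SET`, and both function names are assumptions made up for illustration; they are not a real postal dataset or a description of how any production validator works. The only point is that a format check and a lookup against authoritative records answer two different questions.

```python
import re

# Simplified pattern for "<number> <street>, <city>, <ST> <ZIP>".
# An illustrative assumption: real US addresses take many more forms.
US_ADDRESS_PATTERN = re.compile(
    r"^\d+\s+[A-Za-z0-9 .'-]+,\s*[A-Za-z .'-]+,\s*[A-Z]{2}\s+\d{5}(-\d{4})?$"
)

# Hypothetical stand-in for official postal records. The entry is made up.
TRUTH_SET = {
    "100 EXAMPLE AVENUE, SAMPLETOWN, WA 98000",
}


def is_well_formed(address: str) -> bool:
    """Semantic correctness: does the string look like a US address?"""
    return US_ADDRESS_PATTERN.match(address.strip()) is not None


def is_valid(address: str) -> bool:
    """Validity: well formed AND present in the reference records."""
    return is_well_formed(address) and address.strip().upper() in TRUTH_SET


address = "399 Judson Street, Lynden, WA 98264"
print("well formed:", is_well_formed(address))
print("valid:", is_valid(address))
```

In this sketch the example address passes the format check but fails the lookup, because the stand-in record set does not contain it.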
This is exactly what makes validating an address such a challenge. Not only does the schema or format of the address have to be validated, but it also needs to be checked against a truth set of real addresses. Now let’s consider this at a global scale:
- There is no international standard for address formats, so any parsing system must be built to handle them all.
- In rapidly developing areas, addresses themselves can change. For example, what was once “Main Street” could change to “East Main Street” as a municipality expands. The same is true for house numbers, city names, and postal codes.
- Sourcing a truth set is non-trivial. Can you trust a single data provider? How are they sourcing their data? What is the coverage of the data?
- Most address data comes directly from user input and is riddled with issues like typos and misspellings. How do you match an address input by a user to an address in your dataset?
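The last point, matching messy user input to a reference record, can be sketched with Python's standard `difflib` module. This is one simple approach under stated assumptions, not how any particular address validation product works: the reference records are made up, and `normalize`, `best_match`, and the 0.85 cutoff are hypothetical choices for illustration.

```python
import difflib
import re
from typing import Optional

# Made-up reference records standing in for an authoritative dataset.
REFERENCE_ADDRESSES = [
    "100 EXAMPLE AVENUE, SAMPLETOWN, WA 98000",
    "102 EXAMPLE AVENUE, SAMPLETOWN, WA 98000",
    "7 EAST MAIN STREET, SAMPLETOWN, WA 98000",
]


def normalize(address: str) -> str:
    """Uppercase, drop periods, and tidy spacing around commas."""
    cleaned = address.upper().replace(".", "")
    cleaned = re.sub(r"\s*,\s*", ", ", cleaned)
    return re.sub(r"\s+", " ", cleaned).strip()


def best_match(user_input: str, cutoff: float = 0.85) -> Optional[str]:
    """Return the closest reference record, or None if nothing is close.

    The house number must match exactly; only the rest is fuzzy-matched,
    because "100" and "101" are different addresses despite looking alike.
    """
    query = normalize(user_input)
    number = query.split(" ", 1)[0]
    same_number = [
        ref for ref in REFERENCE_ADDRESSES if ref.split(" ", 1)[0] == number
    ]
    matches = difflib.get_close_matches(query, same_number, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(best_match("100 Exmaple Ave., Sampletown WA 98000"))
print(best_match("101 Example Avenue, Sampletown, WA 98000"))
```

String similarity alone would happily match "101" to "100", which is why the sketch pins the house number before fuzzy-matching the rest. A real system would parse the address into components and apply country-specific rules to each.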
Are you over-validating addresses?
When we talk to our customers and prospects about address validation, Google’s Geocoding API comes up as a common solution. This is no mystery: search any developer forum and Google’s Geocoding API emerges as the top suggestion for address validation. So what’s the problem? A geocoding solution is being suggested for a validation problem.
To highlight why this is a problem, let’s go back to our example. If you go to your map solution of choice and search for “399 Judson Street, Lynden, WA 98264,” it will drop a pin on a map. But remember our definition of validity: if you zoom in on the plotted location, it shows that 399 Judson Street is an empty lot! A Geocoding API does not validate against postal service records, whereas our Ekata Address Validation API does. If you infer that a plottable address is a valid one, you will over-validate bad addresses.
A Geocoding API returns a precision level that indicates whether a location was resolved to rooftop level or merely inferred, for example by interpolating along a street. Even when “valid” is restricted to addresses resolved at rooftop level, over-validation can still be an issue, as we learned from one of Ekata’s customers who found they were significantly over-validating postal addresses with a Geocoding API. The net result was millions of bad addresses slipping through the cracks and incurring costs further down the pipeline.
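As a rough illustration of the gap, Google’s Geocoding API reports precision in a `geometry.location_type` field, with values such as `ROOFTOP`, `RANGE_INTERPOLATED`, `GEOMETRIC_CENTER`, and `APPROXIMATE`. The Python sketch below uses a hand-written sample response with placeholder values rather than real API output, and `POSTAL_RECORDS` and both helper functions are hypothetical. It shows that a rooftop-level result describes how precisely a point was placed, which is a separate question from whether the address appears in postal records.

```python
# Hand-written sample shaped like a Geocoding API JSON response.
# All values are placeholders, not real API output.
sample_response = {
    "status": "OK",
    "results": [
        {
            "formatted_address": "1 Placeholder Road, Sampletown, WA 98000, USA",
            "geometry": {
                "location": {"lat": 0.0, "lng": 0.0},
                "location_type": "ROOFTOP",
            },
        }
    ],
}

# Hypothetical stand-in for official postal records, left empty on purpose
# to model an address that geocodes but is not in the records.
POSTAL_RECORDS = set()


def is_rooftop_precise(response: dict) -> bool:
    """True if the first result was resolved at rooftop precision."""
    results = response.get("results", [])
    if response.get("status") != "OK" or not results:
        return False
    return results[0]["geometry"]["location_type"] == "ROOFTOP"


def in_postal_records(formatted_address: str) -> bool:
    """True only if the address appears in the reference records."""
    return formatted_address in POSTAL_RECORDS


formatted = sample_response["results"][0]["formatted_address"]
print("rooftop precise:", is_rooftop_precise(sample_response))
print("in postal records:", in_postal_records(formatted))
```

Treating the first check as a proxy for the second is the over-validation described above.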
Why it matters
We see our customers use our address validation data in two primary use cases: fraud decisioning and eCommerce fulfillment. In fraud decisioning, address validity is rarely the only piece of data behind a decision, but it is still a valuable signal. Based on data from the Ekata Identity Network, we see that a given transaction is:
- 50% more likely to be good when a valid postal address is provided.
- 3x more likely to be good if, when a sub-unit number is provided, that number is valid.
- 8x more likely to be fraudulent if a full address is entered, but only the city, postal code, and country are valid.
Now for fulfillment, this is where the costs can really add up. Costs range from the tangible – such as address correction fees if the shipping provider catches the bad address, and restocking fees in the case that a delivery is returned – all the way to loss of goods in the case of perishables or loss of customers when they do not receive their order as the result of a typo.
The bottom line…
- Address validation is difficult to do correctly; there are no easy solutions.
- To properly validate an address, it must be cross-referenced against official postal records.
- Geocoding is not a substitute for validation and will lead to over-validation of addresses.
If you’re interested in a true address validation solution, check out our Address Validation API, which provides coverage in every country and territory in the world. In a fraction of a second, the API cuts through the noise that comes with messy address data and returns a normalized address, an is-valid flag, geo-coordinates, and a permanent unique identifier. Our customers have found these features valuable in helping them fight fraud and reduce fulfillment costs associated with undeliverable addresses. In the next post in this series on address validation, we’ll dive into the challenges and benefits of unique identifiers.