Placekey Blog

Joining OpenAddresses with the National Address Database Using Placekey

by Placekey

The Challenge of Working with OpenAddresses and the National Address Database

Address data can be notoriously difficult to work with because of inconsistencies and variations in address formatting. Two very large and popular address datasets are the OpenAddresses dataset and the National Address Database (NAD), maintained by the U.S. Department of Transportation. Each contains tens of millions of addresses, but combining them presents significant challenges because of their size, inconsistent address formats, and overlapping but not identical entries.

This is the problem Placekey helps solve. Appending a Placekey, a universal identifier, to each address eliminates the guesswork in resolving duplicates and enriches each dataset with additional metadata about each address.

Before Placekey, joining these datasets could take hours, if not days. Now, using Placekey’s Join Datasets tool in the developer dashboard, data scientists, analysts, and GIS professionals can join them in seconds with a few clicks. The tool not only reduces the time and effort required for manual cleaning and matching, but also ensures that the resulting data is consistent, reliable, and enriched with key insights. See the video below for the tool in action.

Insights from the Join

In our recent project, we applied Placekeys to both datasets, processing 92,749,544 rows from OpenAddresses and 80,321,832 from the NAD. This exercise revealed several key insights:

24,757,037 Placekeys matched between the two datasets.

National Address Database:

  • 30.8% of entries in the National Address Database matched to entries in OpenAddresses
  • National Address Database added 24,701,614 'latitude' values and 24,701,614 'longitude' values to OpenAddresses

OpenAddresses:

  • 26.7% of entries in OpenAddresses matched to entries in the National Address Database
  • OpenAddresses added 24,757,037 values for 'AddrPoint' and 24,757,037 values for 'NatGrid' to the NAD
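The match percentages above follow directly from the reported row counts. As a quick check, here is a minimal sketch of the arithmetic, assuming each matched Placekey corresponds to one row in each dataset (the post does not state this explicitly); the variable and function names are illustrative only.

```python
# Row counts as reported in this post.
openaddresses_rows = 92_749_544
nad_rows = 80_321_832
matched_placekeys = 24_757_037


def match_rate(matched: int, total: int) -> float:
    """Return the share of rows that matched, as a percentage."""
    return 100.0 * matched / total


# Assumption: one matched Placekey counts as one matched row per dataset.
nad_rate = match_rate(matched_placekeys, nad_rows)
oa_rate = match_rate(matched_placekeys, openaddresses_rows)

print(f"NAD entries matched: {nad_rate:.1f}%")
print(f"OpenAddresses entries matched: {oa_rate:.1f}%")
```

Rounded to one decimal place, these come out to the 30.8% and 26.7% figures listed above.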

These statistics highlight the effectiveness of Placekey in merging address data, enriching it with additional context, and delivering insights quickly.

How to Perform the Join Yourself

For those interested in replicating this process, simply sign up and navigate to the Join Datasets section of the dashboard; this takes less than a minute. Both datasets already have Placekeys and are free to join and download. Whether you're a GIS analyst or a data scientist, this tool offers a straightforward method to achieve accurate and efficient data joins.

If you're looking to join datasets, explore data partners, or upload your own data to generate Placekeys, everything you need is available in the dashboard.
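If you download Placekey-appended files and want to see what a join on Placekey looks like outside the dashboard, the idea can be sketched in a few lines of Python. This is not how the Join Datasets tool is implemented; it is only a rough illustration using the standard library. The column names (`placekey`, `street`, `latitude`, `longitude`) and the sample Placekey values are hypothetical stand-ins, not the actual schemas of either dataset.

```python
import csv
import io


def index_by_placekey(rows, key="placekey"):
    """Map each non-empty Placekey to the first row that carries it."""
    index = {}
    for row in rows:
        pk = (row.get(key) or "").strip()
        if pk and pk not in index:
            index[pk] = row
    return index


def join_on_placekey(left_rows, right_rows, key="placekey"):
    """Inner-join two row iterables on Placekey.

    Values from the right side fill columns that are missing or
    empty on the left side; existing left values are kept.
    """
    right_index = index_by_placekey(right_rows, key)
    joined = []
    for row in left_rows:
        pk = (row.get(key) or "").strip()
        match = right_index.get(pk)
        if match is None:
            continue  # no shared Placekey, so the row is not in the join
        merged = dict(row)
        for col, val in match.items():
            if col != key and not merged.get(col):
                merged[col] = val
        joined.append(merged)
    return joined


# Tiny in-memory samples; all values here are made up for illustration.
oa_csv = (
    "placekey,street,latitude,longitude\n"
    "222@abc-def-ghi,1 Main St,,\n"
    "223@abc-def-ghj,2 Oak Ave,,\n"
)
nad_csv = (
    "placekey,latitude,longitude\n"
    "222@abc-def-ghi,37.77,-122.42\n"
)

joined = join_on_placekey(
    csv.DictReader(io.StringIO(oa_csv)),
    csv.DictReader(io.StringIO(nad_csv)),
)
for row in joined:
    print(row)
```

For files with tens of millions of rows, an in-memory dictionary like this may not be practical, which is part of what the dashboard tool is meant to spare you.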

Understanding the Datasets

The OpenAddresses dataset is an open source initiative that collects addresses from around the world. It covers millions of address points, making it an invaluable resource for applications requiring global address data, such as logistics, mapping, and real estate analytics. While it is a great dataset, its addresses are aggregated from many sources, which sometimes leads to inconsistencies in address formatting, duplicate records, and incomplete metadata.

The National Address Database serves as an authoritative source for accurate address data across the United States. Maintained by the Department of Transportation, the NAD is designed to support critical public services such as emergency response, disaster management, mail delivery, and infrastructure planning. While highly reliable and standardized for U.S. addresses, its geographic scope is limited because not every U.S. state contributes, and formatting differences can still cause problems when it is combined with other datasets. Trust us, building address matching from scratch presents numerous challenges.

Joining these datasets provides a stronger address foundation that can help in analysis and provide more valuable data for applications.

What is Placekey?

Placekey is an open solution for entity resolution for places and addresses. Its structure, detailed in the Placekey documentation, divides each Placekey into "What" and "Where" components, ensuring precise geolocation and address matching. By using Placekey, users can easily deduplicate, match, link, sync, and merge records for physical places, enhancing data quality and consistency across different datasets.
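To make the "What" and "Where" split concrete, here is a minimal sketch of separating a Placekey string into its two components. It assumes the two parts are joined by an "@" character, with "What" before it and "Where" after it; check the Placekey documentation for the authoritative format. The helper name `split_placekey` and the sample value are illustrative, not part of any official library.

```python
from typing import NamedTuple


class PlacekeyParts(NamedTuple):
    what: str   # address/place component (may be empty)
    where: str  # location component


def split_placekey(placekey: str) -> PlacekeyParts:
    """Split a Placekey string into its What and Where components.

    Assumption: the components are separated by a single '@'.
    """
    what, sep, where = placekey.strip().partition("@")
    if not sep or not where:
        raise ValueError(f"not a What@Where string: {placekey!r}")
    return PlacekeyParts(what, where)


# Sample value for illustration only.
parts = split_placekey("227-223@5vg-82n-pgk")
print(parts.what)
print(parts.where)
```

Two records that share a "Where" component but differ in "What" would, under this assumption, refer to different addresses or places at the same location, which is the kind of distinction that matters when deduplicating.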

Use Cases and Applications

The joined dataset opens up numerous possibilities for analysis and application. Here are just a few of the potential use cases: 

  • Geospatial Analysis: Understand geographic patterns and trends by leveraging enriched address data.
  • Address Validation: Improve address accuracy in databases, reducing errors in mailing and logistics.
  • Machine Learning Models: Train models on enriched datasets to predict delivery times, optimize routes, or assess property values.
  • GIS Applications: Enhance mapping services and geographic information systems with comprehensive address data.

These use cases illustrate the potential of combining open address data with authoritative sources like the NAD, offering valuable insights and solutions across various industries.

Conclusion

Placekey significantly simplifies the process of joining the OpenAddresses dataset with the National Address Database. By using the Join Datasets tool, users can quickly and accurately combine the two, producing enriched data and actionable insights. With the growing demand for accurate and comprehensive address data, Placekey provides a reliable and efficient solution for GIS and data science professionals. We encourage you to explore these datasets and discover how they can help.