Joining NYC Evictions and Expense Violations Datasets
by Placekey
Joining NYC Evictions and Real Property Income and Expense Violations Datasets
New York City collects a lot of data. Having a lot of data is rarely enough to make meaningful discoveries. The most interesting stories emerge when we’re able to take multiple datasets, join them, and identify patterns.
Here we look at two sets of address data from NYC Open Data: the Real Property Income and Expense (RPIE) Form Noncompliance List and the Evictions data. The code I used to merge these datasets with Placekey is in this Google Colab notebook.
Merging these two datasets allows us to investigate the rate at which landlords who fail to file their required RPIE documentation evict tenants, and to study how other factors influence this rate.
Insights From Joining Evictions with RPIE Violations
We started by appending Placekeys to around 15,000 rows of RPIE data and 93,000 rows of Evictions data. Next, we joined the two datasets on the Placekeys we generated. The whole process took only seven minutes.
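To make the join step concrete, here is a minimal sketch in plain Python. It is not the code from the notebook: the rows, the column names (`rpie_year`, `executed_year`), and the Placekey strings are made up for illustration, and in practice you would more likely use a dataframe library. It only shows the idea of an inner join on a shared `placekey` column.

```python
from collections import defaultdict

# Toy rows. The placekey strings and column names are hypothetical.
rpie_rows = [
    {"placekey": "pk-aaa", "rpie_year": 2019},
    {"placekey": "pk-bbb", "rpie_year": 2020},
    {"placekey": "pk-ccc", "rpie_year": 2018},
]
eviction_rows = [
    {"placekey": "pk-aaa", "executed_year": 2019},
    {"placekey": "pk-aaa", "executed_year": 2021},
    {"placekey": "pk-ddd", "executed_year": 2020},
]


def join_on_placekey(left, right, key="placekey"):
    """Inner-join two lists of dicts on a shared key column."""
    # Index the right-hand rows by key so each left row is matched in O(1).
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    joined = []
    for row in left:
        for match in index.get(row[key], []):
            # One output row per matching pair, with columns from both sides.
            joined.append({**row, **match})
    return joined


merged = join_on_placekey(rpie_rows, eviction_rows)
for row in merged:
    print(row)
```

Because one address can have several evictions, a single RPIE row can produce more than one merged row, which is why the sketch keeps a list of matches per key.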
From this join, we quickly learned a few things:
~25% of addresses in the RPIE dataset are present in the evictions data
4% of addresses in the evictions data appear in the RPIE dataset
23% of the records in the merged dataset indicate that the landlord evicted a tenant the same year as they failed to file their RPIE form
Over 66% of records indicate that the landlord evicted a tenant within two years of receiving a citation
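The percentages above are simple ratios over the joined keys and the merged rows. The sketch below shows one way such figures could be computed. The data is a toy example, the column names are hypothetical, and treating "within two years" as an absolute gap between the two years is an assumption on my part, not something stated in the analysis.

```python
def overlap_share(keys_a, keys_b):
    """Fraction of distinct keys in keys_a that also appear in keys_b."""
    a, b = set(keys_a), set(keys_b)
    return len(a & b) / len(a) if a else 0.0


def share_within_years(merged_rows, max_gap):
    """Fraction of merged rows where the eviction year and the missed
    filing year are at most max_gap years apart (assumed definition)."""
    if not merged_rows:
        return 0.0
    hits = sum(
        1
        for row in merged_rows
        if abs(row["executed_year"] - row["rpie_year"]) <= max_gap
    )
    return hits / len(merged_rows)


# Toy keys and merged rows, made up for illustration.
rpie_keys = ["a", "b", "c", "d"]
eviction_keys = ["a", "e", "f", "g", "h", "a"]
merged_rows = [
    {"rpie_year": 2019, "executed_year": 2019},
    {"rpie_year": 2019, "executed_year": 2021},
    {"rpie_year": 2018, "executed_year": 2022},
]

rpie_in_evictions = overlap_share(rpie_keys, eviction_keys)
evictions_in_rpie = overlap_share(eviction_keys, rpie_keys)
same_year = share_within_years(merged_rows, max_gap=0)
within_two_years = share_within_years(merged_rows, max_gap=2)
```

Note that the two overlap shares use different denominators, which is why the same set of matching addresses can be a large share of one dataset and a small share of the other.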
After running some basic analysis we also learned that:
The intersections of these datasets were highly geographically concentrated. Queens alone accounted for nearly as many data points as the other four boroughs combined. The heatmap below illustrates this well.
Heatmap of the intersections between evictions and RPIE violations, showing a large concentration in Queens.
When we weight by the relative population of each borough, this disparity becomes even more pronounced, with Queens and the Bronx representing an outsize proportion of the weighted total.
The number of evictions by nonfilers fell to zero or near zero during the eviction freeze brought on by the COVID-19 pandemic, as the time series of evictions shows.
Without Placekey, most of the time on a project like this would go to joining the data. With Placekey, the join took less than ten minutes and about a dozen lines of code.
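The population weighting mentioned above can be sketched as a per-capita rate. The example below is only an illustration of the arithmetic: the area names, match counts, and populations are placeholders, not the figures from this analysis, and the exact weighting used in the notebook may differ.

```python
def per_capita_rates(counts, populations, per=100_000):
    """Matches per `per` residents for each area."""
    return {area: counts[area] * per / populations[area] for area in counts}


def shares(values):
    """Each area's fraction of the total across all areas."""
    total = sum(values.values())
    return {area: value / total for area, value in values.items()}


# Placeholder numbers, chosen only to make the arithmetic easy to follow.
match_counts = {"area_a": 50, "area_b": 50}
populations = {"area_a": 1_000_000, "area_b": 500_000}

raw_shares = shares(match_counts)
rates = per_capita_rates(match_counts, populations)
weighted_shares = shares(rates)
```

In this toy case both areas have the same raw count, but the smaller area ends up with twice the per-capita rate, so its share of the weighted total is larger than its share of the raw total.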
About the Datasets
Both of these datasets come from NYC Open Data, the portal for almost all public data released by the New York City government. It offers many large, high-quality datasets. Because these datasets all concern the same geographical area but originate from a wide range of agencies and departments throughout the city, they provide plenty of opportunities to practice joining datasets using Placekeys.
The RPIE data is a rolling list of income-generating properties for which the owner failed to file a Real Property Income and Expense Form, a special tax form for assessing the value of income-generating property, for a given year.
The evictions dataset is a rolling list of all eviction proceedings initiated within the city. It includes both procedural information as well as information regarding the outcome of the case.
Download the Data
Both of these datasets are public and are updated regularly. However, we’ve made the files we used for this post available for download via the links below in case you’re interested in recreating the join:
If you just want to check out the joined dataset, you can download it here.
What is Placekey
Placekey is an open entity matching API for places and addresses that helps with deduping, matching, syncing, and merging data about physical places. You send in an address and get back a simple, unique key. You don’t have to worry about cleaning up or standardizing addresses either, as our matching algorithm is designed to resolve even badly mangled addresses to the correct location.
Placekey is free for up to 10k lookups a day. Beyond that there is a small fee to cover servers and core engineering work. The key itself is open and you get a perpetual license to store and use Placekeys.
There are a few different types of Placekeys. All of them help join places data, and they differ in the level of granularity they capture, so you can pick the one that fits your use case. For this example we used address_placekey and placekey, and the files linked above also include building_placekey. A placekey is returned when you provide a location_name in your request. An address_placekey is useful when you only care about the address or do not have any POI (point of interest) information. For example, you can use address_placekey to quickly identify all the POIs at the same address.
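As a rough illustration of that last use case, the sketch below groups POI rows by their address_placekey. The rows, names, and key strings are invented for the example and do not follow the real Placekey format. The only assumption is that each row carries an `address_placekey` column like the one in the files above.

```python
from collections import defaultdict

# Toy POI rows. All values are hypothetical placeholders.
pois = [
    {"location_name": "Corner Deli", "address_placekey": "addr-111"},
    {"location_name": "Dental Office", "address_placekey": "addr-111"},
    {"location_name": "Laundromat", "address_placekey": "addr-222"},
]


def group_by_address(rows):
    """Map each address_placekey to the names of the POIs located there."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["address_placekey"]].append(row["location_name"])
    return dict(groups)


pois_by_address = group_by_address(pois)

# Addresses that host more than one POI.
shared_addresses = {
    key: names for key, names in pois_by_address.items() if len(names) > 1
}
```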
Conclusion
We hope these files are useful, and we encourage you to get an API key and try the API yourself. This notebook is a templated version of the one we used to join this data, and it is a great entry point for trying out Placekey. Feel free to explore this data in detail and let us know what you think, including any issues you find with our matching.
This join was done quickly, but it aims to show how well Placekey scales. Whether you have 20 POIs or 20 million POIs, Placekey enables you to join these datasets effectively.
Placekey
Placekey is the universal standard identifier for a physical place. Learn more about us at Placekey.io.