How an ‘Army of Robots’ Is Transforming Open Data in NYC
New York City is a global leader in open data, with a massive portal containing more than 2,500 data sets and 6 billion rows of information, ranging from city employee test scores to restaurant inspection results. But building and maintaining the collection has been a journey of evolution and ongoing improvement, one that’s increasingly becoming automated.
New York City's open data program stems from legislation signed by former Mayor Mike Bloomberg in 2012, which mandated that all public data be published online on a single portal by 2018. To improve the portal, the Open Data Law was amended in 2015 and 2016 to require regular updates to data sets and careful review of Freedom of Information Law (FOIL) requests for potential new data sets.
The ambitious legislation has led to a robust portal, but it’s not perfect. According to the city’s dashboard, only about 60 percent of data sets are updated on time. Additionally, only about 40 percent of planned data set releases have appeared on the portal on time in the last year.
While open data in the city continues to evolve, the moves it’s making to ensure data accuracy and accessibility offer lessons for other governments. Many municipalities struggle to comply with public information laws due to overwhelming requests, often leading to costly legal battles. For example, reporting from the San Jose Spotlight revealed that San Jose, Calif., has been sued at least six times for failing to provide public information, costing the city more than $500,000 in taxpayer dollars after losing several cases.
BetaNYC, a nonprofit civic organization that was involved with lobbying for New York City’s open data law, asserts that open data portals shouldn’t be considered optional.
“If your city doesn’t have an open data portal, they’re wasting your tax dollars and time,” said Noel Hidalgo, executive director of BetaNYC. Hidalgo added that the costs the government incurs by putting every data request through a lawyer or communications representative can often be avoided by making appropriate data available to the public at all times. “There are more efficient ways to share information in the 21st century.”
HOW AUTOMATION CAN IMPROVE TRANSPARENCY
To streamline the process of regularly updating data, the city has turned to automation.
At present, about 17 percent, or 460, of the data sets on the New York City Open Data Portal are automated. Chief Analytics Officer Martha Norrick said the feeds improve the experience for the public as well as employees.
“We’re always working to make sure that we’re automating feeds that go into open data, so things automatically refresh every day without having to have a person transmit a data set and have it manually be uploaded to the portal,” she said. “We think that improves the experience for the public, and we think that improves employee experience as well to have help, to have an army of robots just do the work for us.”
BetaNYC advocates for as many automated feeds as possible. Hidalgo believes the practice can create a more reliable system by reducing some of the inconsistencies that his organization has spotted on the portal and that stem from staff changes.
“We argue in front of the City Council that the city’s open data team should have many more resources to ensure that there’s batch processes of data publishing that would help minimize that type of inconsistency,” he said.
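For readers curious what a batch check of this kind might look like, here is a minimal sketch in Python. It is not the city's actual tooling. The schedule names, the day counts, and the `last_updated` and `frequency` fields are all hypothetical, chosen only to illustrate how a scheduled job could flag data sets that have missed their stated update window.

```python
from datetime import date, timedelta

# Hypothetical update schedules, in days; not taken from the NYC portal.
UPDATE_INTERVALS = {"daily": 1, "weekly": 7, "monthly": 31}


def is_overdue(last_updated: date, frequency: str, today: date) -> bool:
    """Return True if a data set has gone longer than its schedule allows."""
    allowed = timedelta(days=UPDATE_INTERVALS[frequency])
    return today - last_updated > allowed


def on_time_share(datasets: list[dict], today: date) -> float:
    """Return the fraction of data sets updated within their schedule."""
    if not datasets:
        return 0.0
    on_time = sum(
        1
        for d in datasets
        if not is_overdue(d["last_updated"], d["frequency"], today)
    )
    return on_time / len(datasets)
```

Run daily, a check like this could produce the kind of on-time figure a compliance dashboard reports, without anyone reviewing each data set by hand.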
HOW A CHECKBOX IS TRANSFORMING FOIL REQUESTS
In 2022, a checkbox was added to Department of Records and Information Services FOIL request forms to easily identify requests for data that should be on the portal.
Before the checkbox, employees had to manually sift through each FOIL request to determine if it required data to be added to the portal. This was a time-consuming and inefficient process, prone to potential oversights. The checkbox provides a clear and immediate signal, ensuring that no data requests slip through the cracks.
“Adding a checkbox to a FOIL request form is truly the most arcane form of government, but it really has improved the quality of this information and it’s improved the quality of open data,” Norrick said. “That simple change has closed the loop on the compliance process in a way that I think has been hugely successful for public agencies.”
Norrick added that often, publishing a FOILed data set isn’t an immediate process. Sometimes, data released through FOIL needs to be reorganized or anonymized before it can be published on the open data portal. This might involve removing personally identifiable information or aggregating data to protect privacy.
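The two steps described above, removing identifying fields and aggregating records, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the city's process: the field names in `PII_FIELDS` are hypothetical, and the `min_count` threshold that hides small groups is an added safeguard commonly used in data publishing, not something the article attributes to New York City.

```python
from collections import Counter

# Hypothetical column names; a real data set would need its own review.
PII_FIELDS = {"name", "email", "phone"}


def strip_pii(record: dict) -> dict:
    """Return a copy of the record without the fields listed in PII_FIELDS."""
    return {k: v for k, v in record.items() if k not in PII_FIELDS}


def aggregate_counts(records: list[dict], key: str, min_count: int = 5) -> dict:
    """Count records per value of `key`, dropping groups below min_count.

    The min_count threshold is an assumption of this sketch: very small
    groups can make individuals identifiable even in aggregated data.
    """
    counts = Counter(r[key] for r in records)
    return {value: n for value, n in counts.items() if n >= min_count}
```

For example, a list of individual complaints could be reduced to a count per borough, with any borough that has fewer than five complaints left out of the published table.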
As New York City continues to identify how it can best comply with the Open Data Law, Hidalgo said there are plenty of powerful lessons other cities and states can learn from its work, especially as artificial intelligence projects emerge.
“You really need to invest in a practice where you’re cleaning your data and that you have as many people helping inform how to make a healthy ecosystem,” he said.