How “Good” Does Data Need to Be in Order to Be Used in Civic Tech?

August 3, 2021–By Anna Gorman

This blog is the second in a three-part series highlighting lessons learned from the ongoing TOPcities pilot project. The series aims to provide open insight into real-time lessons from interventions in the field. Read the first blog about how to successfully co-create across city, community, and tech partners.

“Do we even have the data we need?” It’s a question that plagues many individuals working in city government, particularly when evaluating how to meet the pressing needs their community members have identified. Team members in San José, CA, and Saint Paul, MN, working on the TOPcities project—which brings together city government officials, community partners, and technologists to leverage local open data to build products that meet residents’ housing-related needs exacerbated by the COVID-19 pandemic—found this question at the crux of their 18-week sprint.

The TOPcities model rests on using local open data—machine-readable, public data made freely available online—and taking advantage of its potential to drive change when tailored to residents’ needs. But the sprint teams quickly learned that there was no magic process to obtain the data they needed. 

The teams first went through community research processes to understand the housing challenges facing Saint Paul and San José. What came out of those processes was a clear articulation of the community’s needs around access to shelters and rental assistance. Our starting point was to work backwards to figure out what data the cities already had that related to those needs.

Teams in both cities learned that neither had this data readily available in centralized databases, in machine-readable formats, or ready to import into a tech tool. To overcome these challenges, sprint teams investigated as many datasets as possible, repeatedly asking themselves what data they needed and how “good” it had to be to power the tools they were developing.

How representative does data need to be?

The San José team began the TOPcities project with the goal of being able to “better help thousands of our most vulnerable families confronting an ‘eviction cliff’ with data-driven solutions that enable us to focus and maximize our resources to put our community on the path to an equitable recovery,” Mayor Sam Liccardo said at the project launch. The team originally planned to measure the “eviction cliff,” or the expected number of residents facing eviction when COVID-relief eviction moratoriums end. To do so, the team would have relied on a dataset built from information the city collected from landlords when they filed eviction notices—data that directly represented the number of tenants facing eviction. 

In the first phase of the sprint, team members learned from interviews with residents at risk of eviction and community advocates that covering the rent was just one part of the mounting cost burdens facing households in San José; the faster residents could get rental assistance in the short term, the less likely it was that additional costs would destabilize them later on.

The team began to build a prototype rental assistance tool to improve the information available to residents and paired it with on-the-ground resources, partnering with Catholic Charities of Santa Clara County to set up local eviction pop-up centers where residents could get in-person support with their rental assistance applications. The tool relies on city data about the location and hours of operation of these pop-up centers, meeting residents where they are. By shifting away from counting the number of people facing eviction and toward a rental assistance platform, San José could start to improve the quality of rental assistance delivery and build new data linkages that measure how people engage with services. The team decided to share simple, accessible, and available information about rental assistance, like the location of eviction pop-up centers, while building better infrastructure for data-driven understanding in the future.

How complete does data need to be?

Saint Paul also focused on providing housing-related assistance to its residents, specifically aiming to “leverage public data in our ongoing work to connect people experiencing homelessness to support, services, and shelter,” Mayor Melvin Carter said at the start of the sprint. To do so, the sprint team built a prototype information hub to document shelter resources, with plans to add features that let users see what shelters look like and directly request services.

The challenge was that the team was relying mainly on data about shelters that technically belonged to the County and to Continuum of Care providers. Throughout the problem-understanding phase, the sprint team realized that the data frontline workers needed about real-time shelter availability was being passed around by word of mouth or through informal spreadsheets. Collecting data about the pipeline of services and simultaneously making it available to the people who need it most would benefit not only the city itself but also its residents.
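To make the informal, word-of-mouth information described above usable in a tech tool, it would need to become a structured, machine-readable record. The sketch below shows what such a shelter-availability record might look like; every field name here is an illustrative assumption, not Saint Paul’s actual schema.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical record for real-time shelter availability, replacing an
# informal spreadsheet. Field names are illustrative assumptions only.
@dataclass
class ShelterAvailability:
    shelter_name: str
    provider: str        # e.g., a Continuum of Care provider
    beds_total: int
    beds_open: int
    last_updated: str    # ISO 8601 date, e.g. "2021-08-03"

    def to_json(self) -> str:
        """Serialize to machine-readable JSON suitable for an open data feed."""
        return json.dumps(asdict(self))

record = ShelterAvailability(
    shelter_name="Example Shelter",
    provider="Example CoC Provider",
    beds_total=40,
    beds_open=6,
    last_updated="2021-08-03",
)
print(record.to_json())
```

The design choice worth noting is the `last_updated` field: because availability changes daily, a record without a timestamp is as misleading as no record at all.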

Saint Paul hopes that this platform will not only make services more accessible to unsheltered residents, but also collect and create robust open data on shelter availability that does not yet exist. As the city continues to build the platform, the data it rests on remains technically incomplete: the tool will primarily collect data from shelters inside the city limits of Saint Paul, a subset of the providers in the Continuum of Care, but will allow providers working at the county level to improve the data as well. As available resources change and new ones arise, the city will need to collaborate with Continuum of Care providers and the county to keep building out open data so the tool remains an effective resource for the community.

How do we maintain our data?

At the beginning of the TOPcities sprint, both cities struggled with finding available, accessible, and valid or complete data, a challenge they needed to overcome to begin building effective products. Even in cities with open data programs, the responsibility to govern that data is dispersed across government departments and their nonprofit partners, making it difficult to maintain or publish over time.

A major goal of the TOPcities program is to help cities build the capacity to leverage their own data by starting with the fundamentals of good data governance, and one way to do so is by creating a data inventory: a comprehensive record of the data a city maintains, including data about the data itself, or metadata. Data inventories allow cities to create a complete picture of the data they have and the data they lack, information that would have helped San José and Saint Paul determine their capabilities at the outset of the sprint. They also make data easier to access and release, an important advantage for collaboration between departments and with the public. Data inventories can set cities up for success in future data-driven projects, providing a foundation for innovative solutions that can truly make a difference.
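A data inventory like the one described above can be as simple as a list of metadata records with a helper that surfaces gaps. The sketch below is a minimal, hypothetical example; the dataset names, departments, and fields are invented for illustration, not drawn from either city’s actual holdings.

```python
from dataclasses import dataclass

# Illustrative data-inventory entry: metadata describing a dataset,
# not the dataset itself. All names and values here are assumptions.
@dataclass
class DatasetEntry:
    name: str
    department: str          # which office governs the data
    update_frequency: str    # e.g. "daily", "monthly", "ad hoc"
    machine_readable: bool   # CSV/JSON vs. PDF or paper records

inventory = [
    DatasetEntry("eviction_notices", "Housing", "monthly", False),
    DatasetEntry("rental_assistance_centers", "Housing", "weekly", True),
    DatasetEntry("shelter_capacity", "Human Services", "daily", False),
]

def gaps(entries):
    """Return datasets that exist but are not yet usable in a tech tool."""
    return [e.name for e in entries if not e.machine_readable]

print(gaps(inventory))  # → ['eviction_notices', 'shelter_capacity']
```

Even this toy version shows the payoff the post describes: the inventory makes the gap between the data a city has and the data its tools need visible before a sprint begins, rather than 18 weeks in.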

While data can provide the information critical to building a product that truly meets residents’ needs, the TOPcities sprint teams learned that there is not a one-size-fits-all process for data use and management, nor is data a one-stop solution. Not all aspects of lived experience can be quantified into rows and columns, and centering residents and their lived experiences requires recognizing that data is only part of the solution. Well-managed open data provides an incredible foundation for much-needed innovation; creative approaches to using this data only increase its potential to drive change for the better.

Anna Gorman was a student analyst at the Beeck Center in the 2020–2021 academic year and is studying Science, Technology, and International Affairs, Computer Science, and Chinese at Georgetown University. Connect with her on LinkedIn.