September 15, 2020 – By Amen Ra Mashariki

The myriad issues we are dealing with around the spread and impact of COVID-19 in city centers remind me that we must be forever vigilant when it comes to ensuring government, local NGO, relevant private sector and national and global health data is updated, available and accessible. In 2015, 128 New Yorkers were infected and 12 people died as Legionnaires bacteria spread through untreated water in a building’s cooling tower. During that time, I was New York City’s Chief Analytics Officer, leading the Mayor’s Office of Data Analytics. My job was to translate challenges like this into questions that could be answered using analytics, and to get agencies to use their data in a new way.

What I learned then is that Legionnaires can be a fatal form of contagious pneumonia that preys hard on the elderly and people with compromised health. Legionella bacteria are found in different freshwater environments, such as water tanks, hot and cold water systems, and cooling towers, but they grow especially well in warm water. People become infected by inhaling contaminated droplets and mist released from water systems.

One of the main sources is contaminated central air conditioning cooling towers. There are over one million buildings in New York City, many of which are decades old, and the city has limited resources for inspection. Where do you begin?

Machine Learning Can Help a Human Crisis

During the 2015 outbreak, we began with data to identify locations with the biggest risk for potential outbreaks. At the time, New York City did not have an existing list of cooling tower locations. The team worked around the clock for weeks pulling in fragments of information from multiple agencies. We built a data management process from scratch to gather, integrate, and ensure the quality of cooling tower inspection data on a daily basis. From this, we built a machine learning algorithm.

Machine learning refers to a set of data-driven algorithms and techniques that automate the prediction, classification and clustering of data. Machine learning can play a critical role in spatial problem solving like this one — where do we even begin to look for deadly bacteria in a city of 8.5 million people?

We used machine learning to identify buildings likely to have contaminated cooling towers by predicting cooling tower locations from building types and land attributes. With data, the team raised the hit rate for identifying cooling tower locations from 10% to 80%. That means 8 in every 10 attempts to identify a building with a cooling tower were successful. The bottom line? Building inspectors were able to identify contaminated cooling towers faster and save lives.
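The hit rate quoted above is simple arithmetic: the share of attempts that actually found a building with a cooling tower. As a minimal sketch, assuming hypothetical counts of 100 attempts (chosen only to match the percentages in the text, not taken from the 2015 effort), the calculation could look like this:

```python
def hit_rate(hits: int, attempts: int) -> float:
    """Share of attempts that correctly identified a building with a cooling tower."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return hits / attempts


# Hypothetical counts, chosen only to reproduce the percentages in the text.
before = hit_rate(hits=10, attempts=100)  # before the model guided the search
after = hit_rate(hits=80, attempts=100)   # with the model guiding the search

print(f"hit rate before: {before:.0%}, after: {after:.0%}")
print(f"unsuccessful attempts per 100: {100 - 10} before, {100 - 80} after")
```

Under these assumed counts, the same 100 attempts go from 90 misses to 20, which is why a higher hit rate matters when inspection resources are limited.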

From this machine learning project, the Building Intelligence tool was born. The tool is a 360-degree reconciled database for buildings that provides information more quickly and easily to agencies across the City.

The New York City Building Intelligence Toolkit identifies which buildings have been inspected and gives officials a fast way to react to problems. http://coolmaps.esri.com/NYC/BIT3/

The Legionnaires cluster was located in buildings without a cooling tower, but they were connected and shared a hot water supply. Three became infected and one died over the course of a year. The simple fact that a common variable was identified in these separate cases over this long period of time is thanks to the City being prepared and having data at the ready.

Emergency Drills are for Data too

I’m confident that New York City will be able to contain this cluster so it doesn’t lead to an outbreak like what we had three years ago, but during the next emergency, invariably, we will find we need access and answers to something we don’t know we need. This is what I call the unknown unknown – data we don’t even know we don’t possess.

How can we possibly collect data on everything we may possibly need? That’s where data drills come in handy. When faced with an emergency, we come upon challenges and significant data gaps. As a result, we become aware of needs we could have never predicted prior to that crisis and can fill those demands before such emergencies get completely out of hand.

Data drills are a concept that started in New York City. They are developed and conducted based on a specific operational challenge involving data and require multi-organizational cooperation to achieve a desired result. They can be designed for individual scenarios such as a Legionnaires outbreak or capacity building, asking questions like, “Do we have data on cooling towers and plumbing city-wide?” Data drills can also be used for operations development as well as software testing.

Overall, data drills are a mechanism for helping a city to baseline citywide data practices. They’re also a mechanism for guiding a city towards improving the ability to identify, understand and use data to solve a city challenge when requested in real time.

Data drills make a city smarter about the information it holds and that is key to using data and analytics to make a city safer, smarter, healthier, more efficient, resilient, sustainable, and equitable. Regardless of whether or not urban analytics are immediately necessary to remediate a situation, for any city, data drills should be considered phase zero — constantly running in the background at a cadence that keeps the city’s data ready to be put into action.

Amen Ra Mashariki is a fellow at the Beeck Center and Global Director of the Data Lab at the World Resources Institute. Follow him at @AMashariki.

May 27, 2020 | By Amen Ra Mashariki

In September 2015, I was sitting in the NYC Office of Emergency Management’s (NYCEM) famed “war room”. It was packed. Literally standing room only. Yet somehow the steady influx of important-looking people into the room continued. Was the crisis an impending storm, or a blackout? Neither. This was a “Tabletop,” a simulated emergency situation. In this exercise, the Commissioners of most NYC agencies and their senior staff, some state personnel, and private sector entities (e.g., gas and electric utilities) gathered to review and discuss the actions they would take in a particular emergency, testing their emergency plan in an informal, low-stress environment. This made it easy for everyone to calmly rehearse their roles, ask questions, and troubleshoot problem areas.

NYC Emergency Management “tabletop” exercise. Photo by Amen Ra Mashariki

After 9/11, 2012’s Hurricane Sandy and the Legionnaires outbreak, we knew very well that it’s the unknown unknowns that hurt you the most. This is when I, along with a few colleagues, created data drills. Data drills help a city baseline where it stands with citywide data practices. They also help improve a city’s ability to identify, understand, and use data to solve a challenge when requested. Data drills help a municipal data team move faster and work better, and they are also a very important tool for understanding exactly where the holes and problems in your data operations are.

Why was I, the Chief Analytics Officer and Director of the Mayor’s Office of Data Analytics (MODA), at a tabletop emergency management event? To understand, rewind to the beginning of summer 2015 and the outbreak of Legionnaires disease in the city.

Legionnaires, a type of pneumonia, was spreading in the Bronx and Manhattan through contaminated water in cooling towers sitting on top of buildings. My office was brought in to build a machine learning model to help find every building with a cooling tower, and to count and track the registration and ultimately the cleaning of those towers. This was a citywide emergency effort, and because MODA played a key role in its successful conclusion, from that point on it was clear to city leadership that collecting data was key to emergency response efforts. Hence the invitation to the tabletop.

As I watched these agency heads exercise their emergency response muscles so they could improve, I realized my former office, as well as the data teams or personnel in other agencies, should find a way to get better at finding, accessing, integrating, and sharing data during an emergency. Agency data leaders needed our own tabletop exercise, because even when we weren’t thinking about using analytics to solve a particular problem, we needed to be thinking about data all the time.

We understood that there is data that we know we have, data that we know we don’t have, and data we know absolutely nothing about, including even the fact that it exists (Donald Rumsfeld’s famous “unknown unknowns”).


In general, data drills are developed and conducted based on some operational challenge that involves data and will require multi-organizational cooperation to achieve a desired result. Drills can be designed for purposes including, but not limited to:

  • Specific scenario: hurricane flood zone, homeless counts, data center disruption
  • Capacity building: collecting data, learning how to operationalize a specific dataset
  • Operations development: downed-tree clean-up operations between two agencies
  • Software testing: testing new features in a data sharing platform

Data drills help us take on that challenge by having organizations across the city surface, share and integrate data. A drill takes place with specified start and end times, forcing all participants to work within real-life time constraints. Every data drill raises the overall citywide data I.Q. ever so slightly.

Data drill deliverables should be defined early in the planning phase. They may include, but are not limited to:

  • Identification of data sets with metadata and data dictionaries
  • Organization-specific operational workflow relevant to data and use-case
  • Interagency workflow for operations, analysis and/or network infrastructure
  • List of organization contacts, roles and responsibilities
  • Documentation of activities and observations
  • Report with recommendations
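The first deliverable, an inventory of data sets with metadata and data dictionaries, lends itself to a small illustration. The sketch below is one possible shape for such an inventory, not a description of what NYC actually built. The `DatasetEntry` fields, the `find_gaps` helper, the agency names and the contact addresses are all hypothetical. It checks which data sets a drill scenario needs and sorts them into available, undocumented (present but without a data dictionary) and missing.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One row in a hypothetical citywide data inventory."""
    name: str
    owner_agency: str
    contact: str
    # Data dictionary: column name -> plain-language description.
    data_dictionary: dict = field(default_factory=dict)


def find_gaps(required, inventory):
    """Sort a scenario's required data sets into available, undocumented and missing."""
    by_name = {entry.name: entry for entry in inventory}
    report = {"available": [], "undocumented": [], "missing": []}
    for name in sorted(required):
        entry = by_name.get(name)
        if entry is None:
            report["missing"].append(name)        # nobody has surfaced it yet
        elif not entry.data_dictionary:
            report["undocumented"].append(name)   # exists, but hard to use quickly
        else:
            report["available"].append(name)
    return report


# Hypothetical inventory and scenario requirements, for illustration only.
inventory = [
    DatasetEntry(
        name="cooling_towers",
        owner_agency="Agency A",
        contact="poc-a@example.org",
        data_dictionary={"building_id": "Unique building identifier",
                         "last_inspected": "Date of most recent inspection"},
    ),
    DatasetEntry(name="plumbing", owner_agency="Agency B", contact="poc-b@example.org"),
]
required = {"cooling_towers", "plumbing", "hot_water_systems"}

report = find_gaps(required, inventory)
for status, names in report.items():
    print(f"{status}: {names}")
```

A report like this only finds data sets that someone already thought to ask for. The value of running the drill is that participants discover needs that were not on the required list in the first place.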

NYC’s first interagency emergency data drill was conducted by MODA with assistance from NYCEM’s GIS and Training & Exercises divisions, and the 1st Deputy Mayor’s office, from October 14 – 16, 2015. It included an initial data call, assignments for agencies, and an in-person concluding session. Fourteen agencies were participants in the drill, and there were over 60 individual participants.

The scenario for drill play was an extended power outage in an area of downtown Brooklyn affecting 97,000 residents. Eleven agencies contributed data sets to test data sharing mechanisms and MODA’s data integration effort. Immediately after the completion of the drill, a post data drill review showed the drill successfully tested the capabilities it was designed to test.

Capability to Test: Points of Contact (POCs)
  • Goal: It is important to define and rely on data POCs for responding agencies so when an emergency happens you already know who to reach out to for data.
  • Result: In sending out invitations to the drill, we used a list of data POCs from the various agencies we had developed over the previous two months. In the course of the drill, additional POCs were identified.

Capability to Test: Data Call
  • Goal: There must be an existing mechanism in place to convene data leaders across agencies so when a crisis breaks out, communication across data leaders can happen instantaneously.
  • Result: We successfully conducted an initial data call on the 14th and simulated a second data call “in-person” on the 16th.

Capability to Test: Data Sharing Mechanism
  • Goal: The need to be able to swiftly share data during an emergency with metadata templates for tracking is key to executing a successful emergency response.
  • Result: A data sharing mechanism was successfully used by all participants.

Capability to Test: Data Integration
  • Goal: Data integration is one of the most complex things to do in general, but when you attach this need to the timeliness and precision requirements that must be met during an emergency, data integration without planning, process and skill is almost impossible.
  • Result: MODA successfully integrated data from a large sample of the data sets provided by the agencies, even when given a few hours to complete the needed integration.

Capability to Test: Reporting Metrics
  • Goal: Leaders within the city, first responders and other stakeholders during a crisis require immediate, accurate and consistent reporting during an emergency.
  • Result: MODA led the effort to work with agencies to propose draft reporting metrics. The 1st Deputy Mayor’s office reviewed and commented on the draft metrics.
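To make the data integration capability more concrete, here is a minimal sketch of merging records from several agencies into one record per building. It is not MODA's actual process. The shared `building_id` key, the agency names and the fields are assumptions made for illustration. Records without the shared key are set aside rather than dropped silently, because finding those holes is part of what a drill is for.

```python
from collections import defaultdict


def integrate(datasets, key="building_id"):
    """Merge per-agency records into one combined record per key value.

    Returns (merged, rejected). merged maps each key value to the combined
    fields plus a "sources" list naming the contributing agencies.
    rejected lists (agency, record) pairs that lacked a usable key.
    """
    merged = defaultdict(lambda: {"sources": []})
    rejected = []
    for agency, records in datasets.items():
        for record in records:
            if record.get(key) in (None, ""):
                rejected.append((agency, record))
                continue
            target = merged[record[key]]
            target["sources"].append(agency)
            for field_name, value in record.items():
                if field_name != key:
                    # Prefix with the agency so conflicting fields stay visible.
                    target[f"{agency}.{field_name}"] = value
    return dict(merged), rejected


# Hypothetical inputs loosely themed on a power-outage scenario.
datasets = {
    "agency_a": [
        {"building_id": "B1", "has_generator": True},
        {"building_id": "B2", "has_generator": False},
    ],
    "agency_b": [
        {"building_id": "B1", "residents": 120},
        {"residents": 40},  # no shared key: cannot be integrated as-is
    ],
}

merged, rejected = integrate(datasets)
print(merged["B1"])
print(f"{len(rejected)} record(s) could not be integrated")
```

Prefixing each field with its source agency is a deliberate choice in this sketch. It avoids deciding which agency is right when two disagree, and leaves that judgment to the people reviewing the integrated data.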

A key takeaway from this blog is that we built the concept of data drills in NYC up from a simple idea to a very complex, citywide, highly impactful undertaking. This wasn’t just because it was a good idea. Good ideas come a dime a dozen. This was an idea that every agency in NYC government felt was overdue. This was something that we all knew needed to happen. Therefore, high participation and ultimately impact were inevitable. For every city, domestic or international, data drills should be a key part of its data strategy. These drills should constantly be running in the background at a cadence that keeps the city’s data ready to be put into action. Data drills make the city smarter about its data, and that is key to being able to use data and analytics to make a city safer, smarter, healthier, more efficient, resilient, sustainable, and equitable.

COMING SOON: Executing a Data Drill

