Hiding in Plain Sight: A Call for a Code of Ethics in Data Usage

“I have nothing to hide” is a tired justification that we can no longer use when it comes to our data privacy. I think we will find too late that the importance of privacy has nothing to do with compromising information on an individual level and everything to do with the information and power we have collectively given away.

June 4, 2018 | By Lara Fishbane, Research Assistant

On April 30, Anna Lauren Hoffmann published an article on Medium that outlines example after example of how the data collected on us is quietly undermining our efforts toward a better, more equal society. Even as we tell women that they can be engineers, architects, lawyers, entrepreneurs, and presidents, Google Translate is subtly suggesting otherwise. As we march in the streets demanding that it be understood that Black Lives Matter, Facebook is systematically failing to identify black men and women as people.

The dangers Hoffmann points to are frightening and real. And in an increasingly hyperconnected, technology-enabled world, the imperative for action is urgent. How do we build a world where our algorithms are attentive to social consequences? And, perhaps even more important, how do we reach a world where data and technology solve for social inequality?

We can say again and again that what we need is more diversity in tech—and it’s true, we definitely do—but the problems in our algorithms will persist. Even a perfectly diverse group of engineers would inevitably be constrained by algorithms that learn from a world of biased outcomes. In other words, the problem remains that algorithms rely on data collected in a society of systemic oppression, bias, and inequity. Even a “neutral” algorithm cannot escape that fact.

Perhaps, then, the call is to eliminate categories such as race, gender, and sexual orientation from any automated decision-making processes. However, even without explicit groupings in our data, we still run the danger of perpetuating biases. For example, imagine a hiring algorithm employed by a company looking to fill a vacancy. Even if the algorithm is blind to names, addresses, race, and gender, it can still pick up details that correlate with these categories. An applicant may have attended a high school in a predominantly non-white neighborhood, may use words or phrasing native to a particular cultural background, or may have participated in affinity groups associated with particular social groups. Though such an algorithm never considers race or gender, it will ultimately reproduce the same biases that already corrupt our hiring processes, while operating under the guise of neutrality.
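
To make this concrete, here is a minimal sketch in Python using entirely synthetic data and invented feature names (it does not model any real hiring system): a classifier that is never shown the protected attribute still learns to reproduce a biased historical outcome through a correlated proxy feature.

```python
# A hypothetical illustration of how a "blind" feature can act as a proxy.
# All data is synthetic and the feature names are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Protected attribute (never given to the model).
group = rng.integers(0, 2, size=n)

# A "neutral" feature that happens to correlate with group membership,
# e.g., an encoding of which high school the applicant attended.
school = (group + (rng.random(n) < 0.2)) % 2

# Historical hiring outcomes that were themselves biased against group 1.
skill = rng.normal(size=n)
hired = (skill + 0.8 * (1 - group) + rng.normal(scale=0.5, size=n)) > 0.8

# Train only on "skill" and "school" -- no protected attribute in sight.
X = np.column_stack([skill, school])
model = LogisticRegression().fit(X, hired)

# Predicted hiring rates still diverge sharply by group,
# because the school feature carries the group signal.
pred = model.predict(X)
for g in (0, 1):
    print(f"group {g}: predicted hire rate = {pred[group == g].mean():.2f}")
```

The model only ever sees skill and school, yet its predicted hire rates differ by group, because the proxy feature encodes the very signal the biased historical outcomes rewarded.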

And so, regardless of how little you think you have to hide, the collection of your data is dangerous. It allows companies to form the types of correlations that actively threaten social progress. Even if the data isn't being used to make discriminatory hiring, loan, or credit decisions, the potential for harm is no less real. Think of Facebook, for example, which uses your data to improve your ad experience. The consequences of something so seemingly benign, masked by the pretense of an improved experience, are worth considering. Are we okay with a society in which men are more often shown advertisements for high-paying jobs than women? Or one in which Google searches for black-sounding names are associated with criminality? What about one in which low-income consumers are inundated with gambling-related advertisements? In aggregate, it's impossible to say that these decisions about how our data is collected, stored, sold, and used don't matter.

Europe’s General Data Protection Regulation attempts to solve for some of these concerns around privacy and automated decisions. For example, Articles 13, 15, and 22 grant users the right to an explanation of how their personal data is being used to arrive at decisions, and Recital 71 grants them the power to challenge those decisions. These provisions are not insignificant, and they represent part of a larger conversation around taking “back” (did we ever have it?) control of our data and its use. But their effect is likely to be limited: source code is too esoteric for laypeople to understand, and a lay explanation may miss the complexity of how an algorithm actually functions. Further, even if the outputs are understood to be unfair, it seems unduly burdensome to shift the responsibility of challenging the decision onto the end user. Marginalized and oppressed groups already bear the brunt of redressing the injustices enacted upon them.

What’s really needed—and I am certainly not the first to suggest this—is a code of ethics around data and how it gets used in algorithms. Such a code should be underpinned by values of equity and fairness, and reflect the world we want to live in. Moreover, perhaps counterintuitively, it should be less well defined rather than more: vagueness pushes companies to keep striving for better, whereas hard lines become targets to be met and never exceeded. Further, there should be trusted third parties whose job it is to vet these algorithms and represent the rights of the end user.

This premise is not without its own challenges: the practicality of developing a code of ethics that represents the rights of people rather than those with vested interests; the creation of a new and fair marketplace for vetting and authenticating code; and protection against the perverse and dangerous incentives that may develop when third parties are paid to be the arbiters of fairness.

However, these are not sufficient reasons for inaction. Imperfect solutions that strive for something better through thoughtful and collaborative design are preferable to letting our systems continue as they are. It is our responsibility not to leave this one unresolved.