Will researcher diversity make AI algorithms more just?

And what does it mean for a dataset to be unrepresentative? Unrepresentative of whom?

Hey all, sorry for the late newsletter this week. I was feeling a little under the weather last week. But I’m back now, and ready to rumble. You can expect regular programming from now on 🙌🏽.


Today, I’m responding to an article published in the Doxa newsletter. The author writes about AI ethics from a perspective that isn’t common in the most popular tech media outlets. It is my hope that while we may agree or disagree on the tactics for getting to a better AI ecosystem, we can inform and refine each other’s thinking over time.

In his article, he asks two questions that caught my attention.

  1. Will researcher diversity make the AI algorithms more just?

  2. What does it mean for a dataset to be unrepresentative? And unrepresentative of whom?

I have a simple answer to the first question: yes, it absolutely will. It’s happening today, and it will continue into the future. But you can’t hire a few minority AI researchers and expect your facial recognition technology to become unbiased. It’s deeper than the algorithms. It’s about power, budgets, and organizational will.

The second question sits squarely in the middle of the modern culture wars. On one side, the soldiers vehemently assert that the world today is imperfect and full of injustices and inequalities across all domains of life. It follows from this logic, then, that one ought to create datasets that include and represent the experiences of the ignored and trampled upon. In the opposing corner, the defenders are pretty happy with the status quo. They might concede that inequality and injustices exist in some areas, but they do not support a massive upheaval of existing systems. This war is being fought on the battlegrounds of our colleges, in our media and literature, in the way we tell our history, and in the apps we permit into our App Stores. This issue is far bigger than tech, and it’s not going away anytime soon.

Let’s dig in.


Will researcher diversity make the AI algorithms more just?

Jon argues:

❗️TLDR (too long, didn’t read)

Incentives are important, but they do not tell the whole story. I mean, BP is incentivized to keep its rigs safe. But the Deepwater Horizon rig blew up killing 11 people and ripped a $65 billion hole in BP’s wallet. Blindspots are real. You can’t see what you can’t see. Power corrupts and people cut corners. Best intentions are often woefully insufficient. Good, sensible ideas like investing in safe rigs and building fair AI often fail because of a lack of organizational will.

Imagine the engineers working on the first Snapchat filter. They’re testing the very first prototype on themselves. You can easily assume that a diverse group of engineers would notice if the duck face filter didn’t work well on non-white skin. A diverse group would likely have different ideas of what’s considered acceptable and what’s not. Sure, there are small but growing numbers of non-white AI researchers at big tech companies today. But show me the budget and I’ll tell you how much these companies truly care.

Which number do you think is higher:

  1. How much an AI company spends on company snacks?

  2. How much they spend specifically trying to make their algorithms accessible and fair?

I don’t presume to know the answer, but I wouldn’t be surprised if those two numbers are closer together than you’d expect.

📚TSWM (too short, want more)

⛔️ Product failures should be called what they are

I reject the framing that changing commercial AI models in support of greater diversity amounts to “social justice changes”. If these companies want to be global, and they want to serve diverse groups of people, their AI models need to cater to the demographics of the world. And if they don’t, their products have failed. It’s as simple as that. It’s like an Amazon checkout that only works for 53 percent of the population. Or a kettle that only works on Tuesdays. That would be ridiculous, no?

So when facial recognition tech fails more on black people - up to ten times more than on white people - that product has fundamentally failed. Let’s call that what it is. It’s not a car with an intermittent flashing “check engine” light. It’s a car with an exploded engine. And what do you do when a product fails? Go back to the drawing board to reassess the requirements.

💯 Mathematical definitions of fairness

There’s active research in algorithmic fairness and algorithmic justice, where people are exploring different approaches to making algorithms fairer for everyone. The first step is recognizing that harm could in fact be caused. To make things concrete, researchers have created strict definitions of what we mean when we say “fair”. These definitions vary quite a bit. In fact, one researcher found that there are up to 21 mathematical definitions of fairness. In the book The Ethical Algorithm, the authors describe how optimizing for one kind of fairness means you get less accuracy, or less of another type of fairness. So it’s not immediately obvious that there’s a simple “right solution”.

Given a curve that describes the trade-off between algorithmic accuracy and different kinds of fairness, society will have to choose which point on the curve we want to sit on. That location will vary with the application. For some applications, a false positive is far worse than a false negative. So we might agree that all protected groups should have equal rates of false positives. In other situations, we might choose otherwise. Like many ethical dilemmas, there is no blanket, context-independent prescription that makes sense in all cases. There’s a similar conversation going on in the biotech world with respect to CRISPR. When (if ever) is it acceptable to edit the germline of a baby? Say a baby has a terrible genetic disease - is it worth trying to cure the baby, knowing that CRISPR is notorious for off-target edits (basically, genetic mistakes), and that germline editing means the baby’s future lineage will carry those mistakes in perpetuity? Maybe in the future, when the precision is better, but I’d argue we’re not there yet.
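To make the false-positive-parity idea concrete, here’s a minimal sketch in plain Python of how you’d check whether a model fails two groups at equal rates. The labels and predictions here are made up for illustration - they’re not from any real system:

```python
# Toy illustration (hypothetical data): checking "false positive parity"
# between two demographic groups. A false positive means the model
# predicted 1 when the true label was 0.

def false_positive_rate(y_true, y_pred):
    """FP / (FP + TN): the share of true negatives the model wrongly flags."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn)

# Made-up labels and predictions for two groups.
group_a_true = [0, 0, 0, 0, 1, 1]
group_a_pred = [1, 0, 0, 0, 1, 1]  # 1 false positive out of 4 negatives
group_b_true = [0, 0, 0, 0, 1, 1]
group_b_pred = [1, 1, 0, 0, 1, 0]  # 2 false positives out of 4 negatives

fpr_a = false_positive_rate(group_a_true, group_a_pred)
fpr_b = false_positive_rate(group_b_true, group_b_pred)
print(fpr_a, fpr_b)  # 0.25 vs 0.5: this model falsely flags group B twice as often
```

This is just one fairness criterion among the 21-plus mentioned above. You could equally compare false negative rates, or overall positive prediction rates, and the trade-off literature shows you generally can’t equalize all of them at once.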

🌍 Experiences > “brain trust”

The focus on the “AI brain trust” here is a bit misplaced. Biases creep into products from many different sources, not just the training data. The algorithm design might be at fault, or the training data could contain historical biases. Or the product might just be a bad idea that disproportionately leads to worse outcomes for some groups than others. The work of fixing these biases doesn’t fall on any one single team.

In tech companies today, we already see underrepresented employees doing the unpaid work of highlighting product failures that affect minorities. Whether it’s Google Photos labeling black people as gorillas (yes, that really happened), or a search for “cornrows” (an African/African-American hairstyle) returning only images of white people, people from diverse backgrounds will always spot these issues far better than homogeneous groups. That said, the expectation should never be that minorities have to shoulder the extra weight. Product leaders need to empower their organizations to fix these product failures, or nothing will change.


What does it mean for a dataset to be unrepresentative? And unrepresentative of whom?

Jon argues:

The AI ethics activists don't like that these large AIs are trained on data from the normie world of microaggressions, gendered language, and implicit biases — they're trying to destroy that world, so they'd not see it perpetuated by a globe-spanning, energy-sucking artificial intelligence.

To paint a more concrete picture of what this might look like: the ethics people want to be able to do for the language used by consumer products like Alexa and the Google search box what activist orgs are already doing to great effect for the media, i.e., circulate a guide for what to say and what not to say (like this, or this).

❗️TLDR (too long, didn’t read)

There are complex sociological, political, and moral layers to this question. The part I’m focusing on is this - dear reader, what do you consider the role of technology to be? Is it to reflect the world as it is today in its beauty and imperfections? Or is technology a wand with which we can begin to rid the world of its ills in order to create a more just, more inclusive world?

Imagine you’re collecting data about the history of the US. Some texts will over-index on the Christopher Columbus savior narrative. Some might even suggest indigenous people “just sorta conveniently disappeared”. Meanwhile, texts written and championed by other voices will tell a rather different story. One that is brutal, painful, and is not recommended bedtime reading for children.

Which story do you take? Do you take both? Can you reconcile such opposing data?

If you over-index on the story told by the colonizer, you will miss a lot of useful information. Likewise, if you over-index on the indigenous stories, you’d probably miss a lot too. I’d argue you take both, but unfortunately, we’re getting to a place in the US where we’re struggling to agree on the events of the previous day. So as we feed our machines with data and design algorithms to glean inferences from them, we have to proceed with caution.

📚TSWM (too short, want more)

🤝Siri vs your mother

I think technology should reflect the world as it is while striving to be inclusive and compassionate. To the extent that it’s possible, Siri should not be a micro-aggressor. That said, I can understand why Siri might, for example, use a male pronoun when referring to a lorry driver. If Siri is trained on US employment data, it will learn that most lorry drivers are male. This is an undeniable fact. You can argue whether the gender of a lorry driver is even important but bear with me. Siri should participate in speech like a decent human would. But what’s “decent” in San Francisco circles is very different to what’s decent everywhere else.

If your mother assumed a lorry driver was male, would that offend or bother you? Are you holding technology to a standard you wouldn’t hold your dearest to? Maybe you’re holding everything to that standard because you think all gender-based assumptions are wrong and dated. But do you think word bans and circulated guidelines can be effective? Especially when you consider that these products are global. Tell me in the comments!

🌍Internet data is not evenly representative

Many large AI models are trained on Internet text. That text is not distributed in a way that’s inclusive of everyone. Even though we might think of the Internet as widely open and accessible, some groups find it harder to build communities online - older people, people with disabilities, and people at the intersection of multiple marginalized characteristics. Consequently, there are fewer sites by and about these groups, and so less “Internet data” representing them. And it’s entirely feasible that what data does exist is overlooked or deprioritized when designing and building AI products.


✨Talk to Tobi

These questions were somewhat spicy. What do you think? Tell me in the comments!

Share