The case for appealable algorithms
Welcome to The Perfect MVP. If you’re joining the party for the first time, welcome! We meet here once every two weeks. I write my reflections on how we can build tech to be better for all of us. You can subscribe by clicking the purple button below.
If you wanted to build a software product that was completely untrustworthy and outrageously opaque, how would you proceed?
Imagine this. It’s a slow rambling Tuesday. You’ve had your second coffee this afternoon. You look up at your OKRs for the quarter and you stare at the user story “our users should not trust our product”. How might you build a feature that achieves this goal?
Well, a tried and tested way to lose people’s trust is to repeatedly fail them. You need to fail at the exact things that they expect you to help with. There are a couple of ways this could work. You could fail at your core offering - imagine if an app that allows you to store private photos sent those private photos to strangers (this really happened - smh Google Photos). Or your core product could work perfectly fine most times, but in those very few times when it doesn’t work properly, you provide absolutely no help. Or terrible help. The kind of help where you’re saying a lot of words but not actually moving the needle in a way that benefits the user. Like an airline accidentally overbooking your flight - the last one of the night - forcing you to spend the night at the airport (or worse, a nearby moody motel) and telling you that AbraCadabra regulations stipulate that you can’t get a refund because it’s raining outside. This would undoubtedly piss people off. OK, so now you’ve definitely lost your customers’ trust. Bravo ✅ (1/2).
How can you be opaque? This is probably the easier part. You provide little to no information to users about how your product works, how it fails, why it fails or how often it fails. When users encounter issues with your product, you hold your hands in the air and shift the blame. You could blame someone else - the government, the regulators, or a crowd favourite “human error”. Or you could do the clever thing and blame an inanimate object. Since your product is software, why not blame the most complicated part of your product? The AI. You hold your hands up and say it wasn’t you, but it was the AI algorithms that failed. Of course, in a sense you are right. No human explicitly programmed discrimination into your product. At least I hope not. But if your in-house recruiting tool discriminates against women (see Amazon Automation) and you blame the technology, you’re simply deflecting. You’re facilitating a dynamic in which you are the sole beneficiary when your product works well, but when it fails, your inner Shaggy cries “it wasn’t me”. So you’ve built an opaque product. Congrats 🔥 (2/2)
Why are we talking about trustworthiness and opacity?
Because many products that are algorithm-enabled struggle with these values. In 2021, the year succeeding the departed year of doom, our internet-dependent lives are heavily reliant on algorithms. Algorithms are simply instructions that prescribe a series of steps for solving a particular problem. They can be good solutions or bad solutions. They can involve AI or they can be “dumb”. They can be efficient or so slow that they will never actually finish (at least not in your lifetime). Overall, algorithms make billions of decisions on a daily basis that affect our lives in ways that are both incredibly significant and remarkably unimportant.
Algorithms do mundane things on the internet - if you tell a recipe site that you only have bananas and tomatoes in your pantry, an astute algorithm might tell you to make a banana and tomato sambal for dinner. In reality, the algorithm should probably tell you to skip dinner but that’s neither here nor there. Nonetheless, if this algorithm fails or works in an improper manner, there is limited harm. You may hate your dinner but you will be okay. Nowadays, algorithms have graduated past their internet adolescence and they’re also doing adult things with real consequences. They’re predicting who goes to jail and for how long. They’re diagnosing diseases and suggesting treatments. They’re determining credit limits, predicting who might commit crime based on their faces, choosing which neighborhoods “require” more police presence and which news content is worth recommending. The algorithms have come to the big leagues and the stakes of their decisions are significantly higher.
With such an increased scope of impact, you might naturally assume that software products that rely heavily on algorithms are mature. Mature in the sense that they offer customers a robust and thoughtful customer experience. If you bought a toaster from Best Buy that did nothing when you pressed down on it, you could conceivably return it and get your money back. Better still, you might get the luxury to choose another toaster for free. Such is the way in a capitalistic country where the customer has the power to speak to the manager to demand appropriate treatment. But if an algorithm makes a decision that affects your life in a profound way, and it makes a grave mistake, what can you, the customer, do? Who do you call? To whom might you appeal?
Oftentimes, there’s not much you can do.
As the pandemic swept through England last year, the government announced that students could not take their A-level exams in person. A-level exams are the tests that students in the UK (and the Anglophilic world) take to determine where they go to university. The government decided that predicted grades would take the place of actual exam results. But how would they predict student grades? For this, the government turned to an algorithm. So students were awarded grades for exams they never took.
Like a scene from a harrowing horror story, Ofqual’s (the UK government exams body) algorithm recommended that 40% of student grades be downgraded from their teachers’ predictions. Sometimes it’s easy to hide behind percentages. This decision led to roughly 2 million students (GCSE + A-levels) having their results downgraded, thereby diminishing their choices for university. Despite what the preachers of meritocracy might tell you, university “pedigree” is still a big deal for most employers today. So the impact of this algorithm could be felt by students post-graduation in their job opportunities and salaries. In short, the consequence of this algorithm’s decision is huge.
A reasonable person like you might ask: "how did Ofqual screw up so badly?"
Well, the first clue is looking at the algorithm’s goal. You see, the UK Secretary of State for Education clearly wrote that the software should aim to ensure that grades follow a similar distribution to previous years. That is, roughly the same ratio of students should get A’s vs get B’s or C’s as did in the previous years. The pursuit of this goal is not necessarily aligned with ensuring fairness to students, although Ofqual claims “the key purpose was to ensure fairness to students within the 2020 cohort”. Whether or not they intended to do this, the outcome suggests that they failed at the fairness objective.
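To make the tension concrete, here is a toy sketch of what "match the historical distribution" can do to individuals. This is not Ofqual's actual model; the function names, the student names and the historical shares are all made up for illustration. It simply ranks a class and hands out grades so the class matches past proportions, then lists who ends up below their teacher's prediction.

```python
GRADES = ["A", "B", "C", "D"]  # ordered best to worst


def grades_from_distribution(ranked_students, historical_shares):
    """Assign grades by rank so the cohort matches historical shares.

    ranked_students: names ordered from strongest to weakest (hypothetical).
    historical_shares: fraction of past students per grade, in GRADES order.
    """
    n = len(ranked_students)
    assigned = {}
    cumulative = 0.0
    start = 0
    for grade, share in zip(GRADES, historical_shares):
        cumulative += share
        end = round(cumulative * n)  # last rank that still gets this grade
        for student in ranked_students[start:end]:
            assigned[student] = grade
        start = end
    # Anyone left over because of rounding gets the lowest grade.
    for student in ranked_students[start:]:
        assigned[student] = GRADES[-1]
    return assigned


def downgraded(teacher_grades, assigned):
    """Students whose assigned grade is worse than the teacher's prediction."""
    return [
        s for s in teacher_grades
        if GRADES.index(assigned[s]) > GRADES.index(teacher_grades[s])
    ]


# Hypothetical class: an unusually strong year, according to the teacher.
teacher_grades = {"Ada": "A", "Ben": "A", "Cy": "A", "Di": "B", "Ed": "B"}
# Hypothetical history: 20% A, 40% B, 20% C, 20% D in previous years.
assigned = grades_from_distribution(list(teacher_grades), [0.2, 0.4, 0.2, 0.2])
losers = downgraded(teacher_grades, assigned)
```

In this made-up class, four of the five students land below their teacher's prediction, not because of anything they did, but because the class as a whole was stronger than the years before it. The distribution looks "right" while the individuals pay for it.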
Unsurprisingly, students in England protested and complained about unfair treatment. Ofqual turned around and changed policy - they would use either teacher predictions or the algorithm’s prediction - whichever is higher.
🤓What can we learn from this?
It’s important to note that this situation was unique in some ways. For starters, there was a bloody pandemic ravaging the planet. Lots of things were going wrong - lack of social contact, way too many Zoom calls, too much doomscrolling and mental health challenges - so let’s give Ofqual some grace. They had to make several high-priority decisions in a very limited time. They made a mistake, realized it (albeit only after a public protest and outcry) and then tried to reverse the situation. Thereafter, they published the code for the algorithm as well as a 309-page report explaining their approach.
This was the first time I’ve seen people protest an algorithm. It won’t be the last. As algorithms become more mainstream, we need to take lessons from this chapter. Here are some ideas for what we can learn:
👩🏽⚖️Focus on individual justice by building appealable algorithms
Customers of products that rely on algorithms don’t care that you’re 97% accurate. If they are in the unlucky 3%, no amount of statistical jiu-jitsu will assuage their concerns. Instead, builders of these products should acknowledge the fallibility of their systems and treat appeals as a first-class citizen of their app. Users should be able to appeal any decision made by an algorithm, and there should be a well-defined process for what happens next. In some cases, that might mean a retrial. In other cases, it’s better to get a human to review the information. The nature of your service will influence which approach is more sensible. But you must build the product expecting appeals.
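What might "appeals as a first-class citizen" look like in code? Here is one minimal sketch, assuming a hypothetical `Decision` record of my own invention (none of these names come from a real product). The idea is that every algorithmic decision carries its reasons, an appeal state and an audit trail from the moment it is created, rather than having appeals bolted on later through a support inbox.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    DECIDED = "decided"
    UNDER_APPEAL = "under_appeal"
    UPHELD = "upheld"
    OVERTURNED = "overturned"


@dataclass
class Decision:
    """A hypothetical algorithmic decision that can be appealed."""
    subject: str
    outcome: str
    reasons: List[str]  # what the user is told about why
    status: Status = Status.DECIDED
    history: List[str] = field(default_factory=list)  # audit trail

    def appeal(self, grounds: str) -> None:
        """The affected person contests the decision."""
        if self.status is not Status.DECIDED:
            raise ValueError("only a standing decision can be appealed")
        self.status = Status.UNDER_APPEAL
        self.history.append(f"appeal filed: {grounds}")

    def resolve(self, reviewer: str, new_outcome: Optional[str] = None) -> None:
        """A reviewer (a human, or a re-run of the model) closes the appeal."""
        if self.status is not Status.UNDER_APPEAL:
            raise ValueError("no appeal is open")
        if new_outcome is None or new_outcome == self.outcome:
            self.status = Status.UPHELD
            self.history.append(f"{reviewer} upheld {self.outcome}")
        else:
            self.history.append(
                f"{reviewer} changed {self.outcome} to {new_outcome}"
            )
            self.outcome = new_outcome
            self.status = Status.OVERTURNED


# Hypothetical usage: a student contests a predicted grade.
decision = Decision("student-17", "B", ["school's historical results"])
decision.appeal("teacher predicted an A; mock exam results attached")
decision.resolve(reviewer="human examiner", new_outcome="A")
```

The details will differ for every service, but the design choice is the point: if the data model has no place to record an appeal, the product was never built to expect one.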
👨🏽🔬FDA for Algorithms
In all 50 states of the United States, you need a professional license to become a hair stylist. And while I agree that ponytails are important art, I find it incredible that *anyone* can create a web app that uses an algorithm to review resumes or grade student essays. I’m sure we all agree that algorithms, given their scale and efficiency, have a greater impact on society than the hairstyle du jour.
The idea for an FDA for algorithms is not new - the idea is that this body would be politically independent, federally funded and it would scrutinize commercial algorithms. The mandate is one of consumer protection similar to the real FDA. Success could be measured by the number of dangerous algorithms prevented from being released into the wild.
🛠OSHA-like audits for proprietary algorithms
In a previous life, I worked at a chocolate factory. And every so often there was a physical audit of our factory. The regulators would come in to do a rigorous evaluation of our facilities, checking to see that we were abiding by the expectations from a safety and environmental perspective. Given that most tech companies use proprietary algorithms, researchers cannot easily study the societal impacts of their algorithms. Moreover, these algorithms are not static entities. Nonetheless, it could be useful to have regular audits of these algorithms to provide a forcing function to keep them within the boundaries of what is considered societally beneficial. If this feels too paternalistic for you, I would ask you how this is different to other scientific disciplines like biology or chemistry. We define a contract for what is considered acceptable research or commercial practice - where that line is drawn can be subject to debate - but the existence of the line itself should not. Remember that while a misbehaving consumer algorithm might not lead to an oil spill, it might lead to a massacre.
🔬Take transparency seriously
Ofqual did well by publishing the code for the grade predictor algorithm. But code doesn’t tell the full story. There are several places where things could go awry - it could be the way the test data is sourced, cleaned, formatted and validated, it could be the design of the data pipeline itself or it could be the assumptions inherent in the algorithm’s implementation.
It is not immediately clear how Ofqual attempted to resolve the necessary tradeoffs that exist in this. How would the algorithm deal with the bias inherent in historical data? How would it weigh the rates of errors across students who expected A’s vs B’s vs C’s? Does Ofqual consider a false positive to be worse than a false negative in this problem? How even did the algorithm define fairness?
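One way to start answering questions like these is to break errors down by group instead of reporting a single accuracy number. The sketch below is an assumption-heavy illustration, not anything from Ofqual's report: the records are invented, and I use the teacher's prediction as the reference grade only because no real exam result existed to compare against.

```python
from collections import defaultdict

ORDER = {"A": 0, "B": 1, "C": 2}  # lower number means better grade


def error_rates_by_group(records):
    """Per reference grade, how often the algorithm went too low or too high.

    records: iterable of (reference_grade, algorithm_grade) pairs.
    Returns {reference_grade: {"too_low": rate, "too_high": rate}}.
    """
    counts = defaultdict(lambda: {"n": 0, "too_low": 0, "too_high": 0})
    for reference, assigned in records:
        c = counts[reference]
        c["n"] += 1
        if ORDER[assigned] > ORDER[reference]:
            c["too_low"] += 1   # student got a worse grade than the reference
        elif ORDER[assigned] < ORDER[reference]:
            c["too_high"] += 1  # student got a better grade than the reference
    return {
        grade: {
            "too_low": c["too_low"] / c["n"],
            "too_high": c["too_high"] / c["n"],
        }
        for grade, c in counts.items()
    }


# Invented data: (teacher prediction, algorithm grade) for ten students.
records = [
    ("A", "A"), ("A", "B"), ("A", "B"), ("A", "A"),
    ("B", "B"), ("B", "C"), ("B", "B"), ("B", "A"),
    ("C", "C"), ("C", "C"),
]
rates = error_rates_by_group(records)
```

With these invented numbers, half of the students expecting an A are marked too low, against a quarter of those expecting a B and none of those expecting a C. A single overall accuracy figure would hide exactly that imbalance, and deciding whether "too low" is worse than "too high" is a value judgment someone has to make out loud.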
We’ve seen what happens when a medical device company makes a product but doesn’t show its data to the scientific community. Yes, I’m talking about Theranos - a colossal but sad waste of our time. Let’s aim to have zero Theranoses (I guess?) in algorithmic products. This is especially important in cases where customers did not explicitly choose the provider - here, students had no choice but to have their grades predicted by Ofqual. So it is imperative that the organization is even more transparent because that is how you build trust. If your government forced you to be assessed by an opaque algorithm, it would be understandable if you were concerned.
✨Talk to Tobi
Send me comments, questions, critiques, and thoughts at firstname.lastname@example.org. Or let me know in the comments below!