Next week, professional services firm Accenture will launch a new tool to help its customers identify and fix unfair bias in AI algorithms. The idea is to catch discrimination before it gets baked into models and can cause human damage at scale.
The “AI fairness tool”, as it’s being described, is one piece of a wider package the consultancy firm has recently started offering its customers around transparency and ethics for machine learning deployments — while still pushing businesses to adopt and deploy AI. (So the intent, at least, can be summed up as: ‘Move fast and don’t break things’. Or, in very condensed corporate-speak: “Agile ethics”.)
“Most of last year was spent… understanding this realm of ethics and AI and really educating ourselves, and I feel that 2018 has really become the year of doing — the year of moving beyond virtue signaling. And moving into actual creation and development,” says Rumman Chowdhury, Accenture’s responsible AI lead — who joined the company when the role was created, in January 2017.
“For many of us, especially those of us who are in this space all the time, we’re tired of just talking about it — we want to start building and solving problems, and that’s really what inspired this fairness tool.”
Chowdhury says Accenture is defining fairness for this purpose as “equal outcomes for different people”.
“There is no such thing as a perfect algorithm,” she says. “We know that models will be wrong sometimes. We consider it unfair if there are different degrees of wrongness… for different people, based on characteristics that should not influence the outcomes.”
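One way to make “different degrees of wrongness” concrete is to compare a model’s error rate across groups defined by a sensitive characteristic. The Python sketch below is a minimal illustration under stated assumptions, not Accenture’s implementation. It assumes binary labels and predictions and a single sensitive attribute, and the function names are hypothetical. It uses the overall misclassification rate; finer-grained checks would compare false positive and false negative rates separately, which matters when one specific kind of error (such as a wrongful parole denial) falls unevenly.

```python
from collections import defaultdict


def error_rates_by_group(y_true, y_pred, groups):
    """Return each group's misclassification rate (hypothetical helper)."""
    errors = defaultdict(int)
    counts = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        counts[group] += 1
        if truth != pred:
            errors[group] += 1
    # errors is a defaultdict, so a group with no mistakes reports 0.0
    return {g: errors[g] / counts[g] for g in counts}


def error_rate_gap(y_true, y_pred, groups):
    """Largest difference in error rate between any two groups.

    A gap of 0 means the model is equally 'wrong' for every group.
    """
    rates = error_rates_by_group(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())


y_true = [1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1]
groups = ["a", "a", "a", "b", "b", "b"]
print(error_rates_by_group(y_true, y_pred, groups))
print(error_rate_gap(y_true, y_pred, groups))
```

In this toy data the predictions are always right for group “a” but wrong two times out of three for group “b”, so the gap is about 0.67.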
She envisages the tool having wide application and utility across different industries and markets, suggesting that early adopters are likely to be those in the most heavily regulated industries, such as financial services and healthcare, where “AI can have a lot of potential but has a very large human impact”.
“We’re seeing increasing focus on algorithmic bias, fairness. Just this past week we’ve had Singapore announce an AI ethics board. Korea announce an AI ethics board. In the US we already have industry creating different groups — such as The Partnership on AI. Google just released their ethical guidelines… So I think industry leaders, as well as non-tech companies, are looking for guidance. They are looking for standards and protocols and something to adhere to because they want to know that they are safe in creating products.
“It’s not an easy task to think about these things. Not every organization or company has the resources to. So how might we better enable that to happen? Through good legislation, through enabling trust, communication. And also through developing these kinds of tools to help the process along.”
The tool, which uses statistical methods to assess AI models, is focused on one type of AI bias problem: the kind that is “quantifiable and measurable”. Specifically, it is intended to help companies assess the data sets they feed to AI models, identify biases related to sensitive variables and course-correct for them; it can also adjust models to equalize the impact.
To boil it down further, the tool examines the “data influence” of sensitive variables (age, gender, race, etc.) on other variables in a model, measuring how strongly the variables correlate with one another to see whether they are skewing the model and its outcomes.
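The article does not say which statistic Accenture uses to measure “data influence”. As a rough sketch, assuming plain Pearson correlation as the measure, the Python below scores each feature by its correlation with a sensitive variable and flags strong ones. The function names, the toy data and the 0.3 cut-off are all illustrative assumptions. Correlation only captures linear association, so a real audit would need richer measures.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no linear association
    return cov / (sd_x * sd_y)


def data_influence(sensitive, features, threshold=0.3):
    """Map each feature name to (correlation with the sensitive variable, flagged?).

    `threshold` is an arbitrary illustrative cut-off, not a standard value.
    """
    report = {}
    for name, values in features.items():
        r = pearson(sensitive, values)
        report[name] = (r, abs(r) >= threshold)
    return report


ages = [25, 35, 45, 55, 65]
features = {
    "num_children": [0, 1, 2, 3, 3],  # rises with age in this toy data
    "shoe_size": [9, 8, 10, 9, 8],    # unrelated to age in this toy data
}
for name, (r, flagged) in data_influence(ages, features).items():
    print(f"{name}: r={r:.2f} flagged={flagged}")
```

Here “num_children” comes out at r ≈ 0.97 and is flagged, while “shoe_size” sits at about -0.19 and is not.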
It can then remove the impact of sensitive variables, leaving only the residual impact that, say, ‘likelihood to own a home’ would have on a model output. Otherwise the output would be derived from both age and likelihood to own a home, risking decisions that are biased against certain age groups.
“There’s two parts to having sensitive variables like age, race, gender, ethnicity etc motivating or driving your outcomes. So the first part of our tool helps you identify which variables in your dataset that are potentially sensitive are influencing other variables,” she explains. “It’s not as easy as saying: Don’t include age in your algorithm and it’s fine. Because age is very highly correlated with things like number of children you have, or likelihood to be married. Things like that. So we need to remove the impact that the sensitive variable has on other variables which we’re considering to be not sensitive and necessary for developing a good algorithm.”
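A common way to “remove the impact that the sensitive variable has on other variables” is linear residualization: regress the non-sensitive feature on the sensitive one and keep only the part that the sensitive variable does not explain. The Python below is a minimal sketch of that swapped-in technique, not a description of Accenture’s method. It assumes a single numeric sensitive variable (such as age); categorical attributes like race or gender would need one indicator column per category, and non-linear dependence is not removed. The function name and toy data are hypothetical.

```python
def remove_linear_influence(feature, sensitive):
    """Return `feature` with its least-squares linear dependence on `sensitive` removed.

    The adjusted values keep the feature's original mean but have zero sample
    covariance with the sensitive variable.
    """
    n = len(feature)
    mean_f = sum(feature) / n
    mean_s = sum(sensitive) / n
    var_s = sum((s - mean_s) ** 2 for s in sensitive)
    if var_s == 0:
        return list(feature)  # a constant sensitive variable explains nothing
    # Ordinary least-squares slope of the feature regressed on the sensitive variable
    slope = sum((s - mean_s) * (f - mean_f) for s, f in zip(sensitive, feature)) / var_s
    # Subtract the part predicted by the sensitive variable, keeping the residual
    return [f - slope * (s - mean_s) for s, f in zip(sensitive, feature)]


ages = [25, 35, 45, 55, 65]
home_ownership_score = [0.1, 0.3, 0.4, 0.7, 0.8]  # toy values that rise with age
adjusted = remove_linear_influence(home_ownership_score, ages)
print([round(v, 3) for v in adjusted])
```

After the adjustment the toy home-ownership scores no longer trend upward with age, so a model trained on them cannot use that feature as a linear stand-in for age.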
Chowdhury cites an example in the US, where algorithms used to determine parole outcomes were less likely to be wrong for white men than for black men. “That was unfair,” she says. “People were denied parole, who should have been granted parole — and it happened more often for black people than for white people. And that’s the kind of fairness we’re looking at. We want to make sure that everybody has equal opportunity.”
However, a quirk of AI algorithms is that correcting a model for unfair bias can reduce its accuracy. So the tool also calculates the accuracy trade-off, showing whether improving the model’s fairness will make it less accurate and to what extent.
Users get a before-and-after visualization of any bias corrections, and can essentially set their own ‘ethical bar’ for fairness vs. accuracy using a toggle bar on the platform, assuming they are comfortable compromising the former for the latter (and, indeed, comfortable with any associated legal risk if they actively select an obviously unfair trade-off).
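The article does not describe how the toggle works internally. One simple way to illustrate a fairness-vs-accuracy dial is post-processing: choose a separate decision threshold for each group so as to minimize (1 − accuracy) + w × (error-rate gap), where w is a hypothetical “fairness weight” standing in for the slider. The Python below is a sketch of that swapped-in technique on toy data, not Accenture’s tool; the function names, scores and threshold grid are all assumptions. Note that it uses the sensitive attribute directly at decision time, which may itself carry legal risk in some settings.

```python
from itertools import product


def evaluate(scores, y_true, groups, thresholds):
    """Accuracy and error-rate gap when each group gets its own decision threshold."""
    errors, counts = {}, {}
    for score, label, group in zip(scores, y_true, groups):
        pred = 1 if score >= thresholds[group] else 0
        counts[group] = counts.get(group, 0) + 1
        errors[group] = errors.get(group, 0) + int(pred != label)
    accuracy = 1 - sum(errors.values()) / len(y_true)
    rates = [errors[g] / counts[g] for g in counts]
    return accuracy, max(rates) - min(rates)


def choose_thresholds(scores, y_true, groups, fairness_weight, grid):
    """Grid-search per-group thresholds minimizing (1 - accuracy) + fairness_weight * gap.

    fairness_weight stands in for a hypothetical 'ethical bar' slider:
    0 optimizes accuracy alone; larger values trade accuracy for a smaller gap.
    """
    names = sorted(set(groups))
    best = None
    for combo in product(grid, repeat=len(names)):
        thresholds = dict(zip(names, combo))
        accuracy, gap = evaluate(scores, y_true, groups, thresholds)
        cost = (1 - accuracy) + fairness_weight * gap
        if best is None or cost < best[0]:
            best = (cost, thresholds, accuracy, gap)
    return best[1], best[2], best[3]


# Toy data: model scores, true labels and group membership for eight people
scores = [0.1, 0.4, 0.6, 0.9, 0.3, 0.7, 0.4, 0.8]
y_true = [0, 0, 1, 1, 1, 0, 0, 1]
groups = ["a"] * 4 + ["b"] * 4
grid = [0.25, 0.5, 0.75]

for weight in (0.0, 1.0):
    thresholds, accuracy, gap = choose_thresholds(scores, y_true, groups, weight, grid)
    print(weight, thresholds, accuracy, gap)
```

On this toy data a weight of 0 keeps accuracy at 0.875 with an error-rate gap of 0.25, while a weight of 1 closes the gap to 0 at an accuracy of 0.75. Parity is reached here by accepting more errors for group “a”, the better-served group, which shows why the size of the accuracy shift is worth inspecting rather than just accepting.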
In Europe, for example, there are rules that place an obligation on data processors to prevent errors, bias and discrimination in automated decisions. They can also be required to give individuals information about the logic of an automated decision that affects them. So actively choosing a decision model that’s patently unfair would invite a lot of legal risk.
While Chowdhury concedes there is an accuracy cost to correcting bias in an AI model, she says trade-offs can “vary wildly”. “It can be that your model is incredibly unfair and to correct it to be a lot more fair is not going to impact your model that much… maybe by 1% or 2% [accuracy]. So it’s not that big of a deal. And then in other cases you may see a wider shift in model accuracy.”
She says it’s also possible the tool might raise substantial questions for users over the appropriateness of an entire data-set — essentially showing them that a data-set is “simply inadequate for your needs”.
“If you see a huge shift in your model accuracy that probably means there’s something wrong in your data. And you might need to actually go back and look at your data,” she says. “So while this tool does help with corrections it is part of this larger process — where you may actually have to go back and get new data, get different data. What this tool does is able to highlight that necessity in a way that’s easy to understand.
“Previously people didn’t have that ability to visualize and understand that their data may actually not be adequate for what they’re trying to solve for.”
She adds: “This may have been data that you’ve been using for quite some time. And it may actually cause people to re-examine their data, how it’s shaped, how societal influences influence outcomes. That’s kind of the beauty of artificial intelligence as a sort of subjective observer of humanity.”
While tech giants may have developed their own internal tools for assessing the neutrality of their AI algorithms — Facebook has one called Fairness Flow, for example — Chowdhury argues that most non-tech companies will not be able to develop their own similarly sophisticated tools for assessing algorithmic bias.
Which is where Accenture is hoping to step in with a support service — and one that also embeds ethical frameworks and toolkits into the product development lifecycle, so R&D remains as agile as possible.
“One of the questions that I’m always faced with is how do we integrate ethical behavior in [a] way that aligns with rapid innovation. So every company is really adopting this idea of agile innovation and development, etc. People are talking a lot about three to six month iterative processes. So I can’t come in with an ethical process that takes three months to do. So part of one of my constraints is how do I create something that’s easy to integrate into this innovation lifecycle.”
One specific drawback is that the tool has not yet been verified to work across different types of AI models. Chowdhury says it has principally been tested on classification models, which sort people into groups, so it may not be suitable for other types. (Though she says the next step will be to test it on “other kinds of commonly used models”.)
More generally, she says the challenge is that many companies are hoping for a magic “push button” tech fix-all for algorithmic bias. Which of course simply does not — and will not — exist.
“If anything there’s almost an overeagerness in the market for a technical solution to all their problems… and this is not the case where tech will fix everything,” she warns. “Tech can definitely help but part of this is having people understand that this is an informational tool, it will help you, but it’s not going to solve all your problems for you.”
The tool was co-prototyped with the help of a data study group at the UK’s Alan Turing Institute, using publicly available data-sets.
During prototyping, when the researchers were using a German data-set relating to credit risk scores, Chowdhury says the team realized that nationality was influencing a lot of other variables. And for credit risk outcomes they found decisions were more likely to be wrong for non-German nationals.
They then used the tool to equalize the outcome and found it didn’t have a significant impact on the model’s accuracy. “So at the end of it you have a model that is just as accurate as the previous models were in determining whether or not somebody is a credit risk. But we were confident in knowing that one’s nationality did not have undue influence over that outcome.”
A paper about the prototyping of the tool will be made publicly available later this year, she adds.