Artificial Intelligence models may have a few issues; algorithms don’t
Not all the concerns about AI models are unfounded. But most of the problem lies with the human element in the entire process: the selection of training and testing data.
As machine learning, fashionably branded as artificial intelligence (AI), continues to flourish, a veritable cottage industry of activists has accused it of reflecting and perpetuating pretty much everything that ails the world: racial inequity, sexism, financial exploitation, big-business connivance, you name it. To be fair, new technologies must be questioned, probed, and “problematized” (to use one of the activists’ favourite buzzwords); doing so is indeed a democratic prerogative. That said, there seems to be persistent confusion around the very basics of the discipline.
No example demonstrates this better than the conflation of objectives, algorithms and models. Simplifying a little, the life cycle of creating a machine learning model from scratch is as follows. The first step is to set a high-level practical objective: what the model is supposed to do, such as recognising images or speech. This objective is then translated into a mathematical problem amenable to computation. This computational problem, in turn, is solved using one or more machine learning algorithms: specific mathematical procedures that perform numerical tasks efficiently. Up to this stage, no data is involved. The algorithms, by themselves, do not contain any.
The machine learning algorithms are then “trained” on a data sample selected at human discretion from a data pool. In simple terms, this means that the sample data is fed into the algorithms to extract patterns. Whether these patterns are useful (or, often, whether they have predictive value) is verified using “testing” data: a data set different from the training sample, though selected from the same data pool. A machine learning model is born: the algorithm, together with the training and testing data sets, that meets the set practical objective. The model is then let loose on the world. (In a few cases, as the model interacts with this much larger universe of data, it fine-tunes itself and evolves; the model’s interaction with users helps it expand its training data set.) From predictive financial analytics to more glamorous cat-recognising systems, most current AI models follow this life cycle.
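The life cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data pool is synthetic, the labels “small” and “large” are invented, and the nearest-centroid procedure stands in for whatever algorithm a real project would use.

```python
import random


def train_centroids(samples):
    """The algorithm: average the feature value seen for each label.

    The function itself holds no data; it only describes a procedure.
    """
    sums, counts = {}, {}
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(1 for x, label in samples if predict(centroids, x) == label)
    return hits / len(samples)


# Hypothetical data pool: one numeric feature and an invented label.
rng = random.Random(0)
pool = [(rng.gauss(2.0, 0.5), "small") for _ in range(100)]
pool += [(rng.gauss(8.0, 0.5), "large") for _ in range(100)]
rng.shuffle(pool)

# Human discretion enters here: which part of the pool is used for
# training and which part is held back for testing.
train, test = pool[:150], pool[150:]

model = train_centroids(train)         # patterns obtained from the training sample
test_accuracy = accuracy(model, test)  # usefulness checked on the testing sample
print(model, test_accuracy)
```

Everything the resulting model “knows” comes from the slice of the pool chosen for training; the procedure in `train_centroids` would be identical for any other pool.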
To reiterate, the algorithms themselves do not contain data; the model does. Algorithms are simply mathematical recipes and, as such, long predate computers. When you divide two numbers by the long division method, you are implementing an algorithm. Simpler still, when you add two numbers, you are implementing another. A commonly used algorithm to classify images, the Support Vector Machine, is a simple way to solve a geometrical problem and was invented in the early 1960s. Despite the bombastic moniker, it is not a machine, merely a recipe. Another with an equally impressive name, the Perceptron, also has a dry mathematical statement despite sounding like something out of a science fiction film.
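To make the distinction concrete, here is a minimal sketch of the Perceptron learning rule in Python. The four points and both sets of labels are hypothetical, chosen only to show that one and the same recipe yields opposite models when the humans supplying the data label it differently.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule for two features.

    samples is a list of ((x1, x2), label) pairs with label in {-1, +1}.
    The recipe contains no data; it returns weights learned from samples.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in samples:
            activation = w[0] * x1 + w[1] * x2 + b
            if y * activation <= 0:  # misclassified (or on the boundary)
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
    return w, b


def classify(model, point):
    """Return +1 or -1 depending on which side of the learned line the point lies."""
    w, b = model
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else -1


# Hypothetical training points, labelled in two opposite ways.
points = [(1.0, 1.0), (2.0, 1.5), (-1.0, -1.0), (-2.0, -0.5)]
labels_a = [1, 1, -1, -1]
labels_b = [-1, -1, 1, 1]

model_a = train_perceptron(list(zip(points, labels_a)))
model_b = train_perceptron(list(zip(points, labels_b)))

new_point = (1.5, 1.0)
print(classify(model_a, new_point), classify(model_b, new_point))
```

The function `train_perceptron` is the algorithm; `model_a` and `model_b` are models. Any disagreement between the two traces back to the labels, not to the recipe.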
All of the above would sound like idle pedantry had prominent voices not continued to conflate models with algorithms. Last month, America’s latest cause célèbre, Congresswoman Alexandria Ocasio-Cortez, noted that “algorithms are still pegged to basic human assumptions”. Unless you count basic logic as one such assumption, none hides behind an algorithm. Elsewhere, an American professor published a book titled “Algorithms of Oppression”. While all of this may be for rhetorical effect (with “algorithms” serving as shorthand for artificial intelligence whatchamacallit), it reveals a cavalier attitude towards basic notions, especially among those who are in positions to shape technology policy.
This is not to say that concerns about AI models are unfounded. But most of the problem lies with the human element in the entire process: the selection of training and testing data. Suppose a developer draws on historical incarceration data to build a model to predict criminal behaviour. Chances are that the results will be skewed and reflect human biases. Similarly, when Amazon’s voice-responsive speaker Alexa told a user to “kill your foster parents”, it was pointed out that Reddit (not the politest of chat platforms) was part of its training set. Finally, as a recent MIT Technology Review article put it, the conversion of a practical objective into a computational problem (again, a human activity) may also introduce biases into an AI model. As an example, the article asked, how does one operationalise a fair definition of “creditworthiness” for an algorithm to understand and process?
In the end, the issue is not whether AI systems are problematic in themselves. It is that we are problematic, as we choose the data and definitions to feed into algorithms. In that, technology is often a mirror we hold in front of ourselves. But algorithms are independent of our predilections, built, as they are, only out of logic.
Abhijnan Rej is a New Delhi-based security analyst and mathematical scientist
The views expressed are personal