October 18, 2021 2 min read
Tremendous advances in machine learning that are boosting profits and consumer experiences are being built on the backs of exploited refugees. Or, at least, that’s the claim made recently by Phil Jones in an excerpt from his forthcoming book, Work Without the Worker: Labour in the Age of Platform Capitalism. “Big tech relies on the victims of economic collapse” is the drop head, and the article paints an ugly picture of tech companies targeting communities with limited job prospects for one of the most expensive aspects of machine learning: data labeling.
To build a machine learning model, that model must first be trained on labeled data. For example, if you want a model to help quickly identify flaws on a pharmaceutical manufacturing line, it will first need to see lots of examples of acceptable pills and lots of examples of flawed pills so that it can learn to distinguish between the two. If you want a model to help you predict how much money to spend across a variety of ad types in order to yield a qualified sales lead, it will first need to review lots of examples of prior ad spends, along with the qualified and unqualified leads they produced, in order to learn how the ad mix impacts the resulting leads.
For these examples and many more, the prerequisite is a universe of properly annotated (or “labeled”) data from which the machine can learn. Once training is complete, the result is ideally a model that can tackle real-world problems where the data is not already labeled.
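To make that train-then-predict loop concrete, here is a minimal sketch assuming scikit-learn, with made-up numeric measurements standing in for the pill example; a real inspection system would train on labeled images at a far larger scale.

```python
# A minimal sketch of supervised learning on labeled data (scikit-learn).
# The features below are hypothetical stand-ins for the pill example.
from sklearn.ensemble import RandomForestClassifier

# Each row describes one pill with made-up measurements
# (weight in mg, diameter in mm). Each label is the human-provided
# annotation: 0 = acceptable, 1 = flawed.
features = [
    [250.1, 8.0], [249.8, 8.1], [250.3, 7.9],  # labeled acceptable
    [237.4, 8.6], [262.9, 7.2], [241.0, 8.8],  # labeled flawed
]
labels = [0, 0, 0, 1, 1, 1]

# Training: the model learns to separate the two classes
# from the human-labeled examples.
model = RandomForestClassifier(random_state=42)
model.fit(features, labels)

# Inference: the trained model labels a pill no human has annotated.
print(model.predict([[250.0, 8.0]]))  # expected: [0], i.e. acceptable
```

Every row in that labeled training set had to be annotated by a person before the model could learn anything, which is exactly where the human labor discussed below enters the pipeline.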
Getting that initial set of labeled data can be extremely laborious, expensive, and time-consuming; indeed, the difficulty of amassing that neatly labeled data is part of why we are trying to enlist machines into solving these problems in the first place!
But to get to the point where machines take over this type of task, models first rely on humans to provide labeled data. And that, according to Jones, is where refugees and other oppressed people come in. He cites the automated driving industry, for example, which in 2018 had 75% of its data labeled by Venezuelans experiencing some of the most depressed conditions on the planet, following the country’s economic collapse. In Venezuela, Lebanon, and elsewhere, Jones describes people in extreme poverty going to work in awful conditions for very little pay to perform the repetitive tasks of labeling raw data so that it can be used to train models that produce billions of dollars in value.
The growth in machine learning technology gives rise to a whole new set of ethical questions while also highlighting age-old ones. On the one hand, the type of digital microwork required to label massive datasets is undoubtedly safer than many of the alternative ways these individuals could otherwise make money. And a society disrupted by war and economic collapse is one where novel opportunities for work are welcome. But there is obviously a line between providing that opportunity, however minimal, to those in desperate need, and exploiting that desperation.
As the demand for machine learning in everyday life grows, the need for troves of labeled data will undoubtedly grow with it. What kind of cost this will exact, financially and morally, is something we don’t yet know.