YouTube Bans Vaccine Misinformation: How AI Moderates What You See

by Lily Adelstein

September 29, 2021 3 min read

Today, YouTube announced that it will ban any content calling common vaccines approved by health authorities “ineffective or dangerous.” This move builds on an earlier ban of similar claims about coronavirus vaccines. After recognizing that fears about other vaccines were contributing to hesitancy around the coronavirus vaccine, YouTube expanded the earlier ban to cover all anti-vaccine content, not just coronavirus-related claims. But what does it mean, behind the scenes, when a company decides to ban certain content? How does a platform the size of YouTube effectively moderate that much material? Machine learning is often central to the solution.

In the past nine months, debate around content moderation has reached a fever pitch with the removal of President Trump’s social media accounts and heightened concern around voter fraud. When only a few high-profile accounts, like Trump’s, spread misinformation, complex technologies like machine learning are not needed to identify them. But in the case of coronavirus misinformation and false vaccine claims, machine learning is needed to manage the truly expansive reach of social media platforms.

Much of online content moderation already happens through artificial intelligence and machine learning. YouTube reports that 98% of the videos it removes for violent extremism are flagged by machine-learning algorithms. Beyond the far greater scale and speed of machine learning compared with human moderation, automation can also reduce the harm done to human moderators tasked with reviewing the daily deluge of horrific content uploaded online. In 2020, The Verge reported that “YouTube Moderators are Being Forced to Sign a Statement Acknowledging the Job Can Give Them PTSD,” demonstrating that the harmful effects of human content moderation are well known. Technologies like machine learning help moderate false and harmful content while reducing the negative effects imposed on human moderators.

Two common practices for identifying harmful content are matching and classification. Scholars from the University of Oxford explain matching as follows:

Systems for matching content typically involve ‘hashing’ which is the process of transforming a known example of a piece of content into a ‘hash’ — a string of data meant to uniquely identify the underlying content. You compare any given hash against a table of existing hashes to see if it matches any of them.
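
To make the matching step concrete, here is a minimal Python sketch of a hash lookup. Everything in it is hypothetical: the hash table and file bytes are placeholders, and real systems rely on perceptual hashes (such as PhotoDNA or PDQ) that tolerate re-encoding and cropping, whereas the plain cryptographic hash used here only catches exact byte-for-byte copies.

```python
import hashlib

# Hypothetical table of hashes for content already known to violate policy.
# In production these would come from a curated, often industry-shared, database.
KNOWN_VIOLATING_HASHES = {
    "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
}

def content_hash(file_bytes: bytes) -> str:
    """Transform a piece of content into a string meant to identify it."""
    return hashlib.sha256(file_bytes).hexdigest()

def matches_known_violation(file_bytes: bytes) -> bool:
    """Compare the upload's hash against the table of existing hashes."""
    return content_hash(file_bytes) in KNOWN_VIOLATING_HASHES

# Check a newly uploaded file before it is published.
upload = b"...uploaded video bytes..."
print("blocked" if matches_known_violation(upload) else "no match")
```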

In a 2018 study, MIT scholars found that “false news stories are 70 percent more likely to be retweeted than true stories are.” Because problematic content so often spreads through re-sharing, a copy can usually be matched back to the original. But matching becomes difficult at scale: in 2019, for example, a gunman live-streamed a mass shooting; the original file was immediately taken down, but hundreds of thousands of versions of the video were soon re-uploaded to Facebook and YouTube. Hashing is now used to prevent this kind of mass dissemination of violent content.

The second practice for identifying harmful content is classification: machine learning models trained on labelled data classify new content by predicting whether it falls under a prohibited category. For example, if a platform prohibits images containing weapons, a model trained on a labelled database of weapon images can learn to recognize weapons in new images and classify them appropriately.
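
As a rough illustration (not YouTube’s actual pipeline), the sketch below trains a classifier on labelled feature vectors standing in for image embeddings. The data is randomly generated, and the feature size and labels are placeholders; the point is only the workflow of learning from labelled examples and then scoring new uploads.

```python
# A minimal, hypothetical sketch of classification-based moderation.
# Assumes each image has already been converted to a numeric feature vector
# (e.g., an embedding from a vision model); the data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder "embeddings": 1,000 images with 128 features each,
# labelled 1 if the image contains a weapon, 0 otherwise.
X = rng.normal(size=(1000, 128))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train on labelled examples, then evaluate on held-out data.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))

# A new upload's embedding can now be scored before the content goes live.
new_upload = rng.normal(size=(1, 128))
print("prohibited" if model.predict(new_upload)[0] == 1 else "allowed")
```

In practice the labelled training set would be far larger and the model far more complex, but the workflow is the same: learn from examples humans have already labelled, then apply that decision rule to new uploads at machine speed.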

Over 720,000 hours of video are uploaded to YouTube every day. Moderating that volume would require an unrealistic number of human reviewers, so companies operating at this scale will continue to rely on tools like machine learning to ensure these policies are carried out.
