Lost in AI transcription: adult words creep into kids’ YouTube videos

How does “beach” become “bitch”, “buster” turn into “bastard”, or “combo” into “condom”?

It happens when Google Speech-To-Text and Amazon Transcribe, two popular automatic speech recognition (ASR) systems, mistakenly produce age-inappropriate captions for YouTube videos aimed at children.


This is the main finding of a study titled “Beach to Bitch: Inadvertent Unsafe Transcription of Kids’ Content on YouTube”, which covered 7,013 videos from 24 YouTube channels.

Ten percent of those videos contained at least one “highly inappropriate taboo word” for kids, says US-based Ashique KhudaBukhsh, assistant professor in the department of software engineering at the Rochester Institute of Technology.

KhudaBukhsh conducted the study with Sumeet Kumar, assistant professor at the Indian School of Business in Hyderabad, and Krithika Ramesh of Manipal University; the researchers call the phenomenon “a hallucination of inappropriate content”.

“We were stunned because we knew these channels were watched by millions of children. We understand this is an important issue as it tells us that inappropriate content may not be present in the source but may be introduced by a downstream AI (artificial intelligence) application. So on a broader philosophical level, people usually have checks and balances for the source, but now we need to be more vigilant about the checks and balances if an AI application modifies the source. It can inadvertently introduce inappropriate content,” KhudaBukhsh, who holds a PhD in machine learning and hails from Kalyani in West Bengal, told The Sunday Express.

According to the study, hallucinations of inappropriate content were found on channels with millions of views and subscribers, including Sesame Street, Ryan’s World, Barbie, Moonbug Kids and Fun Kids Planet.

Closed captions on YouTube videos are generated by Google Speech-To-Text, while Amazon Transcribe is a leading commercial ASR system. Creators can use Amazon Transcribe to generate captions for their videos and import them into YouTube when uploading the file.

The study was accepted and presented at the Association for the Advancement of Artificial Intelligence’s 36th annual conference in Vancouver in February.

“Whenever you have a machine learning model trying to predict something, the predictions are influenced by the type of data it is trained on. Most likely, these systems did not have enough examples of child speech in their training data,” KhudaBukhsh said.

The study points out that English subtitles are disabled on the YouTube Kids app, but the same videos can be watched with subtitles on mainstream YouTube.

“It’s unclear how often kids are only confined to the YouTube Kids app when watching videos and how often parents (or guardians) just let them watch YouTube kids’ content in general. Our findings point to the need for closer integration between mainstream YouTube and YouTube for Kids to be more vigilant about child safety,” the study said.

When asked about the accuracy of its automatic captions, a YouTube spokesperson said in a statement, “YouTube Kids provides engaging and entertaining content for children and is our recommended experience for children under 13. Automatic captions aren’t available on YouTube Kids. However, our captioning tools on the main YouTube site help channels reach large audiences and improve accessibility for everyone on YouTube, and we are continually improving automatic captions and reducing errors.”

Another example of a misrecognized word in one of the popular videos goes like this: the caption read, “You should also find porn,” when the actual dialogue ended with “corn”.

KhudaBukhsh said these errors could stem from the data fed to ASR systems during training. “‘I like porn’ is a more likely sentence than ‘I like corn’ when two adults are having a conversation. Perhaps one reason some of these adult words slip into the transcript is that ASR systems are trained mostly on examples of speech from adults,” he said.
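The intuition can be sketched in a few lines of code. The toy Python example below is not from the study, and all the scores and priors in it are invented for illustration; it shows how a speech decoder that multiplies acoustic evidence by a language-model prior learned from adult text can tip an acoustically ambiguous word toward the adult reading.

```python
# Toy illustration (not the study's method): an ASR decoder typically picks
# the word that maximizes acoustic evidence times a language-model prior.
# All numbers below are invented for demonstration.

acoustic = {"corn": 0.5, "porn": 0.5}  # the audio is ambiguous between the two

# Hypothetical word priors learned from different training corpora
adult_prior = {"corn": 0.001, "porn": 0.004}    # adult conversational text
child_prior = {"corn": 0.008, "porn": 0.00001}  # child-directed speech

def decode(acoustic_scores, prior):
    """Return the word with the highest combined score."""
    return max(acoustic_scores, key=lambda w: acoustic_scores[w] * prior[w])

print(decode(acoustic, adult_prior))  # 'porn' -- the adult-trained prior wins
print(decode(acoustic, child_prior))  # 'corn' -- a child-speech prior would not
```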

KhudaBukhsh said introducing a human element into the transcription process could be one of the ways to prevent these inappropriate words from being broadcast to millions of young viewers. “We can have a human in the loop to check for transcription errors. We can have someone watch and manually confirm whether it’s there in the video or not,” he said.
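A minimal sketch of what such a check might look like, assuming a simple keyword filter over caption tracks: a script scans automatic captions for taboo words and queues any hit for a person to verify against the video. The word list, timestamps and captions below are placeholders, not the study’s actual data.

```python
# Minimal sketch of a human-in-the-loop check (assumed workflow, not the
# study's pipeline): scan ASR captions for taboo words and queue any hit
# for a person to verify against the actual audio.

TABOO_WORDS = {"bitch", "bastard", "condom", "porn"}  # placeholder list

def flag_for_review(captions):
    """Yield (timestamp, text) pairs whose text contains a taboo word."""
    for timestamp, text in captions:
        if set(text.lower().split()) & TABOO_WORDS:
            yield timestamp, text

# Example: a hypothetical caption track with one misrecognized word
captions = [
    ("00:12", "let's build a sandcastle on the beach"),
    ("00:47", "you should also find porn"),  # ASR error for "corn"
]

for ts, line in flag_for_review(captions):
    print(f"[{ts}] needs human check: {line}")
```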

This is not the first time that KhudaBukhsh has pointed out the fallibility of AI systems. Last year, he and a student conducted a six-week experiment showing that words like “black”, “white” and “attack”, common in chess commentary, could fool an AI system into flagging some chess conversations as racist. This came shortly after Agadmator, a popular YouTube chess channel with over a million subscribers, was blocked for allegedly violating “Community Guidelines” during a live chess broadcast.

KhudaBukhsh, who conducted that research at Carnegie Mellon University in Pittsburgh, said the results were an eye-opener about the possible pitfalls for social media companies that rely solely on AI to identify and shut down sources of hate speech.
