99% Invisible - 535- Craptions
Episode Date: May 3, 2023
Bad closed captions can be entertaining, but they can be serious, too, because captions are a critical tool for lots and lots of people. There are the people learning a new language, and of course captions are essential for people who are deaf or hard of hearing. In the US, that's about 15% of the adult population.
Transcript
This is 99% invisible. I'm Roman Mars.
Last fall, the BBC made a huge announcement. They had picked a new lead actor for their
beloved TV show, Doctor Who.
Hello, I'm Ncuti Gatwa, and I am the next doctor in the next season of Doctor Who.
If you need some context, a new lead actor on Doctor Who is like a new pope being selected.
But for nerds!
That's producer Chris Berube.
This announcement should have been a triumphant moment.
After all, the actor Ncuti Gatwa was the first black performer cast as the doctor.
But when this video came out, the announcement was quickly overshadowed by something that
should have been totally mundane.
The closed captions.
Gatwa's first name is spelled N-C-U-T-I, but the closed captions on YouTube replaced his name with a swear word.
So while the actor said, hello, I'm Ncuti Gatwa, the captions autocorrected his name to, hello, I'm Shitty Gatwa.
That was unfortunate, but it was far from an isolated incident.
When you start looking for closed caption failures, you see them pretty much everywhere.
I turned on closed captions for a week on broadcast TV, YouTube, pretty much anywhere I was watching
stuff, and I noticed so many errors. The fictional country in Black Panther, Wakanda, became War Canada.
Bill Murray and Dan Aykroyd were suddenly part of an elite group called the Ghost Busses.
It went on and on.
There's actually an academic, high-minded term to describe this phenomenon.
Craptions are captions that are crappy.
That's Linda Besner.
She wrote about captions for the Atlantic magazine,
and she says there are many, many ways
a caption can be crappy.
They are not accurate.
They don't differentiate between speakers, so it's just one long run-on sentence.
It's not grammatically correct.
It doesn't contain punctuation.
So they are sort of halfway-there captions.
Look, bad closed captions can be really funny.
But captions are a critical tool for lots and lots of people.
There's people learning a new language.
People like me who, for the life of me, cannot understand the Irish accents on Derry Girls.
And of course, captions are essential for people who are deaf and hard of hearing.
In the United States, that's about 15% of the adult population.
And with so many captions out there, those folks are often left trying to piece together
meaning from some bizarre sentences or to just stop watching.
The issue of crappy captions isn't a new thing.
Activists have been fighting for accurate widespread closed captioning for decades, and
it's not clear when, if ever, they will become a reality.
Bad captions aren't just annoying.
They're supposed to be illegal, at least for television.
In America, the FCC is responsible for making sure the
airwaves are accessible for everybody. And since 1996, they've required all TV shows to include
captions that are, quote, accurate, synchronous, complete, and properly placed.
Broadcasters in the United States also have to comply with guidelines under the Americans with Disabilities Act, which was passed in 1990.
The ADA says all businesses that serve the public, or places of public accommodation, must be equally available to everyone, which means broadcast TV needs to be accessible for the deaf and hard of hearing.
Taken together, the FCC and ADA rules were pretty clear.
For TV, you need to caption.
And you need to caption well.
And for a while, those rules were enough, until the rise of streaming TV.
Howard Rosenblum is the CEO of the National Association of the Deaf, or the NAD.
Howard himself is deaf, and we interviewed him over Zoom
with the assistance of an ASL interpreter.
Throughout this story,
you'll hear her voice representing his answers.
Here he comes.
There we go, he says, hello, Chris.
Howard points to 2007
as the first time the NAD got involved in captions for streaming
when Netflix launched its on-demand service.
At the time, if you recall, it was the era of DVDs in the postal mail, right?
Ironically, there was no law which required DVDs to be captioned.
But I would say roughly 85 to 90% of them were.
So many of the members of NAD were Netflix subscribers.
We'd get our DVDs in the postal mail and it was all good.
Netflix decided then to pivot to streaming.
However, with streaming at the time, there were almost no captions.
Maybe 10% of the titles were captioned.
Howard and the NAD approached Netflix about this issue, but at the time, Netflix argued
the rules didn't apply to them.
They weren't television, so they weren't under the jurisdiction of the FCC.
And Netflix argued the ADA didn't apply to them either.
Here's writer Linda Besner again.
Title III of the ADA says that in places of public accommodation,
deaf and hard of hearing people must be accommodated,
so they should have an experience equal to that of a person
who is not deaf or hard of hearing.
Netflix's argument was, we are not subject to the ADA
because we're not a place.
They were like, well, we are not an amusement park
or like, you know, some kind of public forum.
We're not a physical location,
so we don't have to do this.
The National Association of the Deaf argued that,
okay, sure, Netflix is not a physical place,
but isn't it kind of a social place?
You know, what about a family watching a Netflix show together?
Like, this is a social experience.
Why should there be a member of that family
who is excluded from that content?
In 2011, the NAD sued Netflix in a Massachusetts court.
The judge agreed that the ADA applies to Netflix and issued a consent decree saying Netflix had to provide captions for all of its streaming content within four years.
Today, the full Netflix library is captioned.
They have teams of captioners working on every show, and some of their captions are really
good, and some of them are only okay, but according to Howard Rosenblum of the NAD, Netflix
has done a fairly good job.
Once that case was done and dusted, we began to approach
other companies like Hulu, Amazon. So all three of them agreed to the terms of a phased
in captioning approach. Since the Netflix case, the National Association of the Deaf has been
playing a game of whack-a-mole with the internet. Basically, a new streaming thing
will crop up, and the NAD will helpfully remind them about their legal requirements to caption stuff.
The NAD has worked with companies like Zoom and Microsoft Teams to encourage live captions for
online meetings, and in 2015, the NAD sued Harvard and MIT for not providing captions for their online courses.
Most recently, the NAD sued SiriusXM for not providing transcripts for their podcasts.
Nearly all podcasts are recorded, right?
They're not live.
There's no excuse for not providing a transcript.
Full disclosure here, 99PI is owned by SiriusXM, but I'll note that our show does provide transcripts for all of our episodes.
While there's been a lot of progress on the issue of streaming captions, there's one internet goliath the NAD won't go after.
It's actually a place used by 80% of Americans, where people watch billions of hours of content every day.
I'm talking about YouTube.
Linda Besner spoke to YouTube about this problem a couple of years ago.
And back then, they made it clear.
They had no plans to require captions for all of their videos.
Here's Linda.
They did say, for example, for things like sort of live citizen journalism.
Like the spokesperson I spoke with gave the example of Arab Spring.
You know, if it's somebody who is, you know, posting video of some event that is occurring
around them, are you really going to flag for removal that video because it doesn't have
transcriptions? Even though YouTube isn't legally required to provide closed captions,
they have a lot of deaf and hard of hearing users. In 2009, YouTube tried addressing the problem by introducing automatic captions.
The automatic captions are created with Google's universal speech model, which uses speech recognition technology and AI to identify words,
and then put them together into coherent sentences.
Predictive text is something that, you know, you may be familiar with from autocorrect on your phone. It's like you started a sentence, and then it wants to fill in your blank.
Much like maybe your mother does.
And much like your mother, sometimes they don't know what you are going to say, and you have to be like, whoa, that's not it, I'm actually trying to make the opposite point right now.
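To make the predictive-text idea concrete, here is a minimal sketch in Python of how a model that favors common words can steer a transcript away from a rare name. The word list and frequencies are invented for illustration; this is not how YouTube's captioning system actually works.

    # Toy rescoring sketch: a captioner proposes candidate transcripts and a
    # "language model" (here, just invented word frequencies) picks the one
    # built from the most common words. Rare names lose to common soundalikes.
    from math import log

    word_freq = {"hello": 5000, "i'm": 4000, "the": 9000, "next": 1200,
                 "doctor": 800, "ncuti": 2, "gatwa": 2, "shooty": 40}

    def score(sentence):
        # Higher score for sentences made of more frequent words.
        total = sum(word_freq.values())
        return sum(log(word_freq.get(word, 1) / total) for word in sentence.split())

    candidates = ["hello i'm ncuti gatwa", "hello i'm shooty gatwa"]
    print(max(candidates, key=score))  # the toy model prefers the common soundalike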
The early days of automatic captions were pretty rough.
And Rikki Poynter remembers them well.
Rikki is a deaf YouTuber.
And in 2009, she was watching a lot of beauty videos using the brand new automatic captions.
I remember I was watching a video about concealers and the automatic captions I had them on and
they were saying zebra, the mammal, the animal, whatever, in place of the word concealers,
which was so bizarre.
Rikki started her YouTube career making videos about makeup and beauty products, but she
wanted to do something different.
So in 2014, Rikki produced a video called Things You Shouldn't Say to People Who Are Deaf or Hard of Hearing, where she lampoons the clueless questions she was always being asked by hearing folks.
Is deafness contagious?
I don't know, but come over here.
I'll cough on you and we'll find out tomorrow.
Why don't you know sign language? You must not really be deaf.
Well, last time I checked, deafness has to do with whether or not your ears work and not the languages that you know.
A couple of
Rikki's videos focused on the issue of closed captioning.
And in one video, Rikki made a call to arms, asking her fellow YouTubers to ditch automatic captions and write their own captions instead, or better yet, to hire a professional service to make the captions for them.
The campaign for human closed captions went viral, and Rikki caught the attention of some
of YouTube's biggest stars.
I want to make my little community here on YouTube
more inclusive, and one of the ways that I have decided
to do that in the new year is by close captioning
every single one of my videos.
Yeah, Tyler Oakley made the video while I was on a plane on the way to LA, actually, and it was, like, titled Hear Me Out or something, and I was in the thumbnail, and he was talking about the importance of captions, and I had no idea until I had landed and looked at my phone and I was getting so many notifications.
Today, Oakley's video about closed captions has been watched more than 1.8 million times.
And after that, other big-time YouTubers weighed in,
like Lilly Singh.
But Rikki Poynter wanted to push the campaign
into a higher gear.
So in 2016, she launched a hashtag called No More Craptions.
The term had been floating around the internet for a while,
but Rikki helped bring it to the mainstream.
Lots of YouTubers, both deaf and hearing, made
videos pledging to include human-produced captions in their videos.
Closed captions help the deaf and the hard of hearing.
Hello. Today's video is about the No More Craptions campaign.
Why isn't this video closed captioned when it's labeled closed captioned?
And I was freaking out because there were some big beauty creators that had made announcements that they were gonna start captioning, and then they did start captioning, you know, a couple of videos at a time.
Things were getting better. Human captions were showing up on more and more videos, until some YouTubers just stopped doing it.
Rikki says it was disappointing to see many YouTubers stop captioning their videos.
There's no mystery as to why this happened.
Do-it-yourself captions can be time-consuming. Meanwhile, hiring a professional to make your closed captions can be expensive.
Professional captioners, on the whole, aren't paid very well. But for a YouTuber, dropping $40 or $50 to caption a video can add up pretty quickly.
Look, captioning videos at a professional level is hard work.
Oh, I got headaches. Yeah. And like, yeah, my eyes, I mean, my vision is not great anyway,
but it was, it was pretty bad for all that stuff.
My friend Emma Healy started work as a closed captioner in Toronto in 2017.
The company that I was working for had been subcontracted out by an Australian broadcasting
company, so I was captioning with Australian broadcasting rules.
For this job, Emma had to watch a lot of crappy TV.
A big one that people would complain about was Crocamole, which was an Australian children's television show about a crocodile named Crocamole that loved guacamole.
I gotta be honest, our interview devolved into watching Crocamole for like half an hour.
It's not the most lyric we invented.
This is the first thought best thought. His eyes are really sad.
Even though Emma was captioning some pretty simple TV shows, the work took a very long time.
Emma says she would do about 40 minutes of captioning in an eight-hour shift. And that was on a good day.
The place that I was working, there was, I believe, a 75-page manual full of rules.
All very technical and you had to memorize basically all of them.
Emma showed me the manual, which outlines the rules for how captions are supposed to look
on Australian broadcast TV.
And it made professional captioning seem impossibly Byzantine.
Here we go. Minimum duration is one second.
No maximum, but seven to eight seconds is typical.
Do not leave captions on screen overly long for no reason
if the dialogue has ended.
There must be a one second blank caption.
It goes on like this.
For 75 pages.
Emma had to keep all of this stuff in mind
for what was ultimately a minimum wage job.
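To give a rough sense of what a captioner has to keep checking, here is a minimal sketch assuming simplified versions of just the timing rules Emma reads out: a one-second minimum on screen and roughly seven to eight seconds as a practical ceiling. The cues are made up, and a real broadcast manual covers far more than this.

    # Check invented caption cues against two simplified timing rules.
    MIN_SECONDS = 1.0
    MAX_SECONDS = 8.0

    captions = [
        {"start": 0.0, "end": 0.8, "text": "G'day!"},              # too short
        {"start": 1.8, "end": 5.2, "text": "I'm Crocamole."},      # fine
        {"start": 5.2, "end": 14.0, "text": "I love guacamole."},  # too long
    ]

    for cue in captions:
        duration = cue["end"] - cue["start"]
        if duration < MIN_SECONDS:
            print(f"Too short ({duration:.1f}s): {cue['text']}")
        elif duration > MAX_SECONDS:
            print(f"Too long ({duration:.1f}s): {cue['text']}")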
Professional captioning is complicated because it's not just about getting the words right. You also
have to nail the punctuation and the placement of the caption and lots of other small details. Here's
Linda Besner. So the gold standard for closed captioning would be something where, you know,
for example you and I are talking.
When I am talking, it might say Linda, colon.
And then there would be a capital letter
to indicate that I am beginning a sentence.
And then it would faithfully represent
what I have said with question marks
where I have asked a question and commas,
where commas go.
When one of the cats makes a noise in the background, there would be, square brackets, cat meowing.
If a violent windstorm swept over and you could hear, you know, raindrops coming against the window, it would have, bracket, raindrops sweep against windowpane.
So this kind of caption
really gives you the feeling that you are, you are privy to anything in
the auditory environment of that video.
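For a sense of what that gold standard looks like written down, here is a minimal sketch that prints a few invented cues, with a speaker label and bracketed sound descriptions, in the common SRT subtitle format.

    # Write invented caption cues in SRT format: index, start --> end, text.
    def srt_time(seconds):
        hours, remainder = divmod(int(seconds), 3600)
        minutes, secs = divmod(remainder, 60)
        millis = int(round((seconds - int(seconds)) * 1000))
        return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

    cues = [
        (1.0, 4.0, "LINDA: So the gold standard would be something like this."),
        (4.0, 6.5, "[cat meowing]"),
        (6.5, 9.0, "[raindrops against the windowpane]"),
    ]

    for index, (start, end, text) in enumerate(cues, start=1):
        print(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")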
YouTube creators could either hire a professional captioner, which is expensive, or do this work
themselves, which can take a long time for a beginner. So a lot of them just don't bother.
Since Rikki's campaign, there have been signs of progress. According to YouTube, in the past few years, there's actually been an increase in the
number of human captions submitted by creators.
From 2020 to 2021, the number of videos with manual captions on YouTube went up by 30%.
But despite this uptick, let's think about the bigger picture here.
Automatic captions are still by far the most common type of caption on the platform.
To this day, YouTube has no plans to require human-generated captions on their videos.
And the NAD has said it has no plans to sue YouTube.
Legally speaking, Howard Rosenblum doesn't think the NAD can make an argument that YouTube
is covered by the ADA.
Also, Howard believes it would be incredibly difficult
to caption everything on the site.
For YouTube, we do work closely, for example,
with Google who owns YouTube.
And we've talked through this issue.
And there are two separate problems at play,
one of which is working retroactively on the millions of minutes of content that is already uploaded to YouTube. It staggers the imagination. The second issue is the huge amount of new content
that's uploaded every day. I don't remember the number of hours of videos posted to YouTube
every minute, but it's mind boggling.
YouTube has experimented with ways to encourage more human captioning. In 2018,
they introduced a program called Community Captions where volunteers could submit captions
for popular videos. Many deaf YouTubers liked the program, but it was discontinued in 2020,
because many of the volunteer captioners were submitting jokes or promotional
material instead of accurate captions.
When I reached out to YouTube, they pointed to a couple of new programs they're rolling
out to help people with captioning.
One allows users to give subtitle editors permission to fix up their captions.
Another allows you to submit corrections if you see an inaccurate caption on certain
videos. But ultimately, it feels like YouTube is putting all of its chips on AI to fill in the
caption gap. Just let the automatic captions continue improving until they get closer to total accuracy.
To be fair, the automatic captions have gotten a lot better. YouTube would not share accuracy numbers for this story, but according to the media hub
at the University of Minnesota at Duluth, YouTube auto captions are somewhere around 70 percent
accurate.
That's a big improvement from the early days.
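For context on what a figure like 70 percent can mean, caption accuracy is often scored with word error rate: the number of word substitutions, insertions, and deletions needed to turn the automatic caption into a human-written reference, divided by the length of the reference. Here is a minimal sketch with an invented example pair; the university's estimate above was not computed this way here.

    # Word error rate via standard edit-distance dynamic programming over words.
    def word_error_rate(reference, hypothesis):
        ref, hyp = reference.lower().split(), hypothesis.lower().split()
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution or match
        return d[len(ref)][len(hyp)] / len(ref)

    print(word_error_rate("a video about concealers", "a video about zebras"))  # 0.25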
Rikki Poynter says she's noticed the difference.
AI captions still don't include contextual things like the rain is falling softly in the background against the windowpane.
But the AI has become very good at identifying dialogue.
If I were to say, you know, dentists recommend daily flossing to help remove decay-causing material from between teeth and under gums, the automatic captions would be able to write that out word-wise.
Usually, really pretty well.
There are still some major problems the auto captions haven't figured out.
Grammar remains an issue.
And the AI is pretty good with words in English, but it really struggles with other languages, especially when it comes to names and places.
This is a pretty common complaint about AI programs, the way they can adopt certain biases.
And there's another issue with the captions produced by AI.
Many users find them hard to read.
They move in a more, like, kinetic, I think that's the term I learned like a week ago, kinetic way, where it's like words are coming at you like this.
They show up one word at a time.
And that's not the most accessible way to follow.
Still, despite the challenges and the frequent wonkiness, automatic captions just keep getting better.
And they will keep improving.
I mean, have you been following
the terrifying progress of AI recently?
There's an AI program now, good enough
to pass the bar exam.
So how hard are closed captions?
Howard Rosenblum thinks the AI captions
will compete with human captioners
a lot faster than we think.
Five years ago, I would have said, ah, no way.
There is no way ASR is going to get anywhere.
But now, like, it's shocking how much better it's gotten.
In 5 years, who even knows?
I suspect it will be better.
Maybe it will be less than 5 years.
To me, it feels very possible the automatic captions will get close to 100% accuracy, at
least for speech.
But even if we can get there, the fact is we aren't there yet, and deaf and hard-of-hearing
people have real accessibility needs right now.
People like Rikki Poynter.
I mean, would I love for automatic captions to be 100%? Absolutely. But I just, it's hard to know if that's ever gonna be possible.
Maybe in the year 3000.
I don't wanna wait years and years and years and years
for that to happen.
Rikki doesn't go on YouTube much anymore.
Not because of accessibility, it's because, frankly, she feels like she's growing out of it.
A lot of her friends have left. It doesn't feel like such a special community these days.
Still, once in a while, Rikki checks in on her favorite accounts, and recently, she saw
one of her favorite creators had published her first video after a long hiatus.
Charlie McDonnell. Charlie is so cool. Like, she came back to YouTube.
Rikki was excited to watch the video, but as always, she was dreading the automatic captions.
To Rikki's surprise, the video had human captions.
They weren't totally professional, it looked like the YouTube star had done it herself.
There were small mistakes, just little things
that might drive some people nuts.
But for Rikki, it was good to feel like somebody
actually cared.
It was fully captioned, mixed case, full words,
punctuation and all, and I was like, that's awesome.
I would love to see more of that.
So I'm back with Chris Berube. Hey, Chris.
Hey, Roman.
So do you use closed captioning?
Because I use it all the time.
For everything, just all the time.
Yeah.
Everything I watch on Netflix, everything I watch on
everywhere at this point that it's available.
Yeah.
So why do you use them?
Well, I use them because I get confused easily.
I don't use them for accessibility reasons,
but I'm certainly not the only hearing person using closed captions. There's this one recent survey of more than 1,200 people, which found half of Americans are using closed captions
most of the time. And that number is actually much higher for Gen Z. That's like 70%. So it's not
just people who have accessibility needs.
And so why do you think so many people are using closed captions?
Yeah, there's a lot of reasons.
I mean, one is that captions are just normal now, right?
Like especially for younger viewers.
You see this in a lot of TikTok and Instagram videos
that they have open captions,
which means that the captions are on all the time.
You can't turn them off.
And it's kind of an aesthetic choice,
it's just become a thing you see in a lot of videos.
But going back to the survey, there's one big reason that most people cite for using closed captions, and the big reason is that audio sounds muddled.
So people are saying, modern dialogue is more confusing,
it's harder to follow, and that's why they like
the captions so much.
Yeah, I have noticed this.
I call this the Christopher Nolan problem because he has a penchant for having mumbly, low-key, but very good actors in his movies next to large booming sounds.
Yeah, I mean, I love Tom Hardy, but Tom Hardy, like I don't think I've ever heard him deliver
a line straight, deliver a line intelligibly.
And so why do you think that is? I mean, why is dialogue harder to follow now?
Well, there is actually a journalist
who looked into this recently.
All of my friends were like, I can't understand anything
that anybody's saying.
And I was like, man, I really wish
that there is an answer to this.
So that's Edward Vega.
He is a video producer for the website Vox.
And he decided to look into this.
Actually, the catalyst for him was watching the movie The King of Staten Island with Pete Davidson.
Okay.
It's the emotional height of the movie, and he says a line like, it's hard, I think it'll always be hard.
But he mumbles it and he says it so quietly that, like, I swear to God, I went back three times.
So Ed did this big investigation and he discovered like, this is real, this is not in our heads.
And there's a couple of reasons this is happening.
Okay, what are the reasons?
So a big one is dynamic range. So basically, if your movie has voices in it and your movie also has
explosions, you have to make sure the explosions are louder than the voices, right?
Like, I mean, that seems obvious.
But it's something where the explosions have to be quite a bit louder so that they have
an impact.
When you're mixing a movie, if your explosion is going to be the loudest thing in the mix,
then your dialogue can't be that loud. And so you have to choose which
one do you want to move? Do you want to move the dialogue down or do you want to move the
explosion up?
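A small worked example of that trade-off, with invented numbers: if the loudest explosion has to sit just under the digital ceiling and still be about 20 decibels louder than speech to land with impact, the dialogue ends up far down the scale, which is where small TV speakers and noisy living rooms start to lose it.

    # Invented levels, just to show the headroom arithmetic.
    EXPLOSION_PEAK_DBFS = -1.0  # loudest effect, just under the digital ceiling
    IMPACT_GAP_DB = 20.0        # how much louder the effect is kept than speech

    dialogue_level = EXPLOSION_PEAK_DBFS - IMPACT_GAP_DB
    # Rule of thumb: every ~10 dB reads as roughly twice as loud, so a 20 dB gap
    # makes the explosion feel about four times as loud as the dialogue.
    print(f"Dialogue sits around {dialogue_level:.0f} dBFS")  # -21 dBFS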
One of these issues is this idea of dynamic range. It's balancing things out so that the explosion has more impact, and as a result, the dialogue suffers.
So another big issue is the rise of naturalism in performances.
If an actor mumbles their way through a line, if their words start to run together,
there's nothing I can do to fix that. So this is Austin Olivia Kendrick.
I currently work as a dialogue editor at Warner Brothers, working in television specifically.
So Edward Vega very nicely introduced the two of us, and Austin says movie technology has changed, and as a result, actors' performances have changed as well.
Back in the day, when sound technology was first introduced to film, that technology was very kind of primitive.
The only real option when it came to filming movies with sound was you were on a sound stage, a microphone was planted above your head, and the actors had to stand there and project into the microphone.
But as sound technology evolved and all of a sudden we have wireless microphones that we can place on actors, that kind of loosened up actors' performances.
They no longer have to stand in one spot and project, which gave them more freedom and kind of pushed them more into a naturalistic style of performance.
And that subsequently could mean mumbling.
So there you go, technology, the second thing, that's why you're getting these mumbly performances.
Yeah, but for decades and decades, I know that they've been doing dialogue replacement
where an actor comes in after the fact and records dialogue to make it more clear.
Right, yeah, that's ADR.
And I did talk to Austin about this.
And she says doing ADR is actually pretty difficult, especially when the dialogue is very
mumbly.
When we bring actors in to re-record lines, they have to match their
performance to the way that their lips are moving on screen.
And if they mumbled their way through that take, their lips aren't moving very
much. So if they come in and re-record a line with a lot more
diction and a lot more separation between the words, that's not going to match.
The last issue, and this might be the biggest one, this is something called down mixing.
So this is the idea that a lot of movies and some TV shows are mixed for the best possible sound system, right?
Many of them are mixed for Dolby Atmos. That's the special sound system.
You get in certain theaters that has 128 channels.
So you can mix something in a very, very specific particular way.
And then after you've mixed it for the best sound system,
you have to remix it for other platforms like a television or a phone.
And that is when you start to get into trouble.
You're taking 128 channels and you're compressing that down into different formats with lesser channels.
And most of the time when you're watching something out of your TV,
unless you have a surround system, it's going to be stereo or mono.
So if you're going from 128 down to two or down to one, it gets muddy. It gets a lot less clear.
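Here is a minimal sketch of what a fold-down like that does, assuming a plain 5.1-to-stereo downmix rather than a full 128-channel Atmos mix; the 0.707 (about -3 dB) weights on the center and surround channels are a common convention, not any particular studio's recipe. Dialogue usually lives in the center channel, so after the fold-down it has to share two speakers with everything else.

    # Fold one sample of 5.1 audio (front left/right, center, LFE, surrounds)
    # down to a stereo left/right pair. LFE is commonly dropped in a stereo fold-down.
    def downmix_5_1_to_stereo(fl, fr, c, lfe, sl, sr):
        left = fl + 0.707 * c + 0.707 * sl
        right = fr + 0.707 * c + 0.707 * sr
        return left, right

    # Quiet dialogue in the center channel, loud effects in the surrounds.
    print(downmix_5_1_to_stereo(fl=0.1, fr=0.1, c=0.2, lfe=0.0, sl=0.6, sr=0.6))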
So lots of movies and TV shows, you know,
they're getting mixed in the sound system,
then they're getting down mixed for other formats.
And Austin says, usually the down mix at the end,
that's the last thing they do in the editing process.
And sometimes they don't put quite the same care
and effort into it that they would
for the initial mix for a movie theater.
Yeah, a lot of the times when it comes to that kind of down mixing, often studios do not
want to spend the money because ultimately time is money and re-recording mixers are very
expensive to hire for that extra time, you know. So oftentimes they want to put as little money into that area as possible.
So, Roman, there you have it.
It's down mixing, it's naturalistic performances, it's this dynamic range issue.
Those are some of the reasons why you did not understand the movie Tenet by Christopher Nolan.
Well, they're not the only reason why I didn't understand Tenet, but that helps.
Yeah, the time travel stuff also contributes to it, I guess.
There's a lot going on working against Tenet
being a comprehensible film.
So, Roman, I have to say before we go,
the video by Edward Vega on Vox,
it goes into so much more depth on this topic.
It's wonderful.
I hope everybody checks it out.
And also, I hope everyone checks out Austin Olivia Kendrick's TikTok,
if you use that platform.
She explains movie sound and it is so fascinating.
It's awesome. All right, thank you so much, Chris.
Thanks, Roman.
99% Invisible was produced this week by Chris Berube,
with editorial input from Linda Besner.
She originally wrote about this topic for the Atlantic magazine.
Linda is writing a book about people and their complicated relationships with money,
look for it soon, at a bookstore near you.
The story was edited by Kelly Prime, original music by Swan Real, sound mix by Martín Gonzalez, fact checking by Graham Hacia, who couldn't independently verify that Crocamole loved guacamole, but we assured him that this was true and we went with it anyway.
Kurt Kohlstedt is our digital director, our intern is Avante Nomear.
The rest of the team includes senior editor Delaney Hall, Christopher Johnson, Jayson De Leon, Emmett Fitzgerald, Vivian Le, Jeyca Maldonado-Medina, Joe Rosenberg, and me, Roman Mars.
The 99% Invisible logo was created by Stefan Lawrence.
Special thanks this week to Caroline, Minks, and Ann Valentine.
We are part of the Stitcher and Sirius XM podcast family, now headquartered six blocks north
in the Pandora building.
In beautiful... uptown... Oakland, California.
You can find the show and join discussions about the show on Facebook.
You can tweet at me @romanmars and the show @99piorg.
Or on Instagram, Reddit, and TikTok too.
You can find links to other Stitcher shows I love,
as well as every past episode of 99PI at 99pi.org.