ADA Compliance: Media Captions in the Education Space

You are about to read an easy-to-understand guide on how educational institutions can use video captioning to comply with the Americans with Disabilities Act (ADA).

The number of universities being sued for failing to comply with the regulations is exploding, and it includes some of the finest educational institutions in the world: Harvard, UC Berkeley, and hundreds more.

UC Berkeley announced:

Starting March 15, the university will begin removing more than 20,000 video and audio lectures from public view as a result of a Justice Department accessibility order.

As a company operating in the education space, we find this sad news: information and knowledge are being removed from the public because that is easier than becoming compliant. However, it’s understandable that this is the chosen strategy, since captioning has, up until now, been extremely costly and complex.

Let’s look at the available options for the university decision makers.

Option 1 - Create Captions In-House

Pros: control of the process and accuracy.
Cons: hiring and onboarding, project management, extremely costly.

Assuming you have a significant volume of data, it’s likely you will have to hire additional people to run the captioning process.

You will need to find a large number of relatively skilled people with the patience to transcribe the video data with timestamps.

Hiring and onboarding is a challenge in itself: it requires a dedicated budget and takes a lot of time.

You could be looking at 6–12 months before you have transcribed and captioned the first video. Additionally, where will all the new hires go once the captioning project has finished?

Option 2 - Outsource Captioning to Humans

Pros: very high accuracy (around 99%).
Cons: very costly, no automation.

You will find hundreds of companies specialised in transcribing and timestamping video data the good ol’ way: a transcriptionist listens to the audio while typing it out.

Most companies advertise a price of roughly $1-2 per minute, but the final bill usually comes to more than that.

The following will add to the price:

Faster turnaround time
‘Difficult’ audio
Number of speakers

Outsourcing to humans ensures 99% accuracy, but with thousands of hours of video data it becomes an expensive solution.

Assuming UC Berkeley’s 20,000 recordings mentioned above were 60 minutes long on average, that is 1.2 million minutes of content. At $2 a minute, it would have cost them $2.4M to closed-caption the content for compliance.

Additionally, when working with manual transcription companies there is no way to automate the captioning process.

Option 3 - Automated Captioning

Pros: inexpensive and fast solution.
Cons: lower accuracy.

Google, Amazon and Apple are pouring billions of dollars into developing more accurate speech recognition. Because of this, automated transcription and captioning have come a long way in the last few years.

Using automated transcription platforms, you can bulk upload all the video content and download the captions a few minutes later.

It’s extremely fast, fully automated and will usually cost between $0.05 and $0.20 a minute depending on volume. Using the same math as above, at roughly $0.10 a minute it would cost UC Berkeley about $120k instead of $2.4M to caption all the content.
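For readers who want to plug in their own numbers, the comparison above can be sketched in a few lines of Python. This is only a rough estimate under the article's assumptions (20,000 recordings, 60 minutes each); the function name and the choice to pass prices in whole cents are ours, purely for illustration.

```python
def captioning_cost_usd(num_videos: int, avg_minutes: int, cents_per_minute: int) -> float:
    """Estimated cost in US dollars to caption a video library.

    Prices are passed in whole cents so the arithmetic stays exact.
    """
    total_minutes = num_videos * avg_minutes
    return total_minutes * cents_per_minute / 100


# Figures from the article: 20,000 recordings, assumed 60 minutes each.
NUM_VIDEOS = 20_000
AVG_MINUTES = 60

human_cost = captioning_cost_usd(NUM_VIDEOS, AVG_MINUTES, 200)     # $2.00 per minute
automated_low = captioning_cost_usd(NUM_VIDEOS, AVG_MINUTES, 5)    # $0.05 per minute
automated_mid = captioning_cost_usd(NUM_VIDEOS, AVG_MINUTES, 10)   # $0.10 per minute
automated_high = captioning_cost_usd(NUM_VIDEOS, AVG_MINUTES, 20)  # $0.20 per minute

print(f"Human transcription: ${human_cost:,.0f}")
print(f"Automated: ${automated_low:,.0f} to ${automated_high:,.0f} "
      f"(${automated_mid:,.0f} at $0.10/min)")
```

Swap in your own video count, average length and quoted per-minute price to get a first-pass budget figure. It does not account for surcharges such as rush turnaround or difficult audio.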

What is the catch, then?

Most automated speech recognition companies boast 90%+ accuracy, which holds when transcribing clear audio with little background noise.

Difficult audio is nowhere near 90% accurate, and you will have to hire people to check and edit the captions before they can be published. Most automated captioning services offer an online editor to make this process of fixing the captions easier.

If your audio is relatively clear with no background noise and only a few speakers then we would recommend you give this option a try.

Option 4 - AI and Human Based Captioning

Pros: affordable, fast and 100% accurate solution.

Recently, a number of companies, including us at Konch, have started utilising a combination of machines and humans to deliver the best of both worlds.

You can upload all your video data and decide if you want a human to double check the machine output. If the audio is noisy, unclear or below a certain accuracy threshold, it makes sense to have it double checked by a human.

However, if the audio is clear and above a certain accuracy threshold, then there is no need to pass it through a human pipeline, which makes it much more affordable.
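The routing decision described above can be sketched roughly as follows. This is a minimal illustration, not a description of any vendor's actual pipeline: the MachineTranscript class, the estimated_accuracy score and the 0.90 threshold are all hypothetical, and in practice the score would come from whatever confidence estimate your speech recognition service reports.

```python
from dataclasses import dataclass


@dataclass
class MachineTranscript:
    video_id: str
    estimated_accuracy: float  # hypothetical score between 0.0 and 1.0


def route_transcripts(transcripts, threshold=0.90):
    """Split transcripts into those to publish as-is and those needing human review."""
    publish, review = [], []
    for t in transcripts:
        # At or above the threshold: skip the human pipeline.
        # Below the threshold: send to a human for a double check.
        (publish if t.estimated_accuracy >= threshold else review).append(t)
    return publish, review


batch = [
    MachineTranscript("lecture-001", 0.97),
    MachineTranscript("lecture-002", 0.81),
    MachineTranscript("lecture-003", 0.93),
]
publish, review = route_transcripts(batch)
print("Publish as-is:", [t.video_id for t in publish])
print("Send to human review:", [t.video_id for t in review])
```

Raising the threshold sends more videos to human review and increases cost; lowering it saves money but lets more errors through, so the right value depends on how clean your audio is.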

Having been in this industry for many years, this is the solution we recommend. If you would like our help with it, contact our Director of Education, Anders Hasselstrøm, at anders@konch.ai.

Conclusion

It is understandable that most educational institutions have been having a hard time allocating resources to ADA compliance and captioning video content.

Up until now it has been an expensive and cumbersome process.

However, with the latest advancements in technology, the landscape has changed. You can now use a combination of technology and humans to create transcripts and closed captions quickly and relatively inexpensively.
