SoundSight

Designing a better way to visually engage with sounds in video content for Deaf and Hard of Hearing users

Presented at the 2021 Designing for Accessibility Project Fair at Stanford University.

Overview

Whether it's a tender moment in a rom-com movie or a jump scare in a horror film, sound plays an integral role in a viewer's experience. Even beyond the soundtrack of a movie are other details like background music and environmental sounds which provide context and emotion that may go unnoticed to those who can’t hear that information. SoundSight is a visualization engine to create better representations of such sounds in video content for Deaf and Hard of Hearing users.

Developed with Lloyd May.

Problem

While closed captioning is great for encapsulating dialogue in a video, it doesn't capture everything. After interviewing a number of people who are Deaf or Hard of Hearing, we found that semantic and emotive information which informs video content can often get lost in closed captioning. As Christine Sun Kim explains, a mournful violin playing in the background of a scene can often get reduced to something like “[music],” lacking emotional and informative depth ("Artist Christine Sun Kim Rewrites Closed Captions").

"Captioning dialogue is one thing. But captioning sound is another."

— Christine Sun Kim

We wondered — how might we create a better way to capture the musical and emotional experience of watching visual content?

A hand-drawn sketch of a low-fidelity prototype where a movie display has curved lines on the left and right sides of the screen to represent environmental sounds.

Prototyping

Using the insights we gathered from our needfinding interviews, we developed the idea of creating a sound visualizer which maps different sonic parameters to visual parameters. These visualizations would capture sounds like music and environmental noises which aren't typically conveyed by closed captioning and would overlay on the video to supplement (not replace) closed captions.

User-Testing

To better understand which of these variables might prove helpful or prove too distracting, we conducted 2 rounds of prototyping and user-testing with both Deaf and Hard of Hearing users.

Key Design Questions

What are the most important sounds to convey? (music, voice, environmental noise)
What are the most important aspects of the sound to communicate? (rhythm, intensity, frequency, emotion, location — i.e., does the sound originate from an action on the left or right of the screen)
What is the best strategy to communicate those various aspects? (color, thickness, opacity, noise, smoothness, position on screen)
How might we maximize information and aesthetic enjoyment while minimizing distraction?

Methodology

To conduct these user tests, we created 3 different visualizers for 3 different movie clips. Participants were asked to watch each clip and then respond to a few qualitative questions measuring clarity, understanding, aesthetics, and level of distraction for each clip.

Findings

After these rounds of user-testing, we identified the following insights to inform revisions for future prototypes.

One prototype demo displaying a horror movie scene with sharp, jagged lines on the right side of the screen, indicating where a radio sound was coming from.

Spacialized sound seemed appealing. We were able to validate our assumption that placing the visualizer on the side of the screen where the sound originated would add helpful context to the viewing experience. Multiple participants expressed interest in spacialized sound and understood that it was conveying location.

One prototype demo displaying a romantic comedy scene with smooth, wispy lines, indicating the actress's voice.

Voice visualizer was redundant. One of our prototypes included a scene where our protagonist gives a long and emotional speech, so we built separate visualizers for both the background music and for the speaker’s voice. Users found this distinct voice visualizer distracting, as the closed captions were already there. They expressed preference to watching the speaker’s mouth and facial expressions instead.

One prototype demo displaying a combat scene from an action movie scene with grainy, green visualizations on the left and right sides of the screen.

Screen position of the visualizer relative to the closed captioning. Our design originally placed the visualizations on the left and right of the screen; however, some users noted that they would have to look the entire length of the screen East-West “like a lizard” in order to get a sense of what sounds they were conveying.

Future Work

Given the limited scope of our user-testing, next steps include conducting more widely-tested surveys with more controlled and finer-tuned visualization prototypes. We'd also like to explore the option of leaving more room for aesthetic customization with regard to color, thickness, screen position, etc.