FaceSpeaker with earcons as person identifiers

Preliminary introduction.

This project will be carried out by Tim in 't Veld and Octavian Nasui as a small project in the "sound and music technology" master's course at Utrecht University. The (preliminary) research question is:
"In a face recognition device for the blind, is it possible and useful to convey the identity of a person to the user using earcons rather than synthetic speech?"


We recently developed FaceSpeaker, a prototype wearable face recognition device for the blind. The prototype speaks an acquaintance's name to the user when that acquaintance comes into view of the user's camera glasses.

As explained in the FaceSpeaker project report, one of the key requirements the FaceSpeaker design must meet is "economizing on the user's attention". The device is used in contexts involving social interaction. In such contexts the user is expected to devote near-exclusive attention to the interaction and may already be somewhat overloaded with sensory input. This is especially true for visually impaired users, who rely mostly on their ears, not just to follow the conversation but also to catch social cues and cues about their environment.

The FaceSpeaker prototype conveys the identity of an acquaintance to the user by issuing a beep and speaking the acquaintance's name. We concluded that the beep was a poor design choice, and suggested replacing it with vibrotactile feedback in future prototypes. But conveying a person's name through synthetic speech may not be optimal either. The key problem is twofold. First, the synthetic voice may be hard to understand in contexts with a lot of background noise. Second, understanding the synthetic voice puts strain on the brain's language capabilities, which may already be heavily loaded in a crowded social setting.

Hence the idea of conveying a person's identity through some other kind of sound. Many games feature sounds that announce an approaching character to the player (you might hear a lot of growling just before the terrible monster appears), and such cues can be effective. It is an open question whether sound cues can effectively replace names in FaceSpeaker, and if so, what type of sound (music / sound effect / ...) is most effective and socially acceptable. To our knowledge, this question has not been researched before.