FaceSpeaker user documentation

This page describes the FaceSpeaker software and explains how to use it. To download and install FaceSpeaker, please refer to the download page.

Access keys

Every focusable component of the FaceSpeaker user interface has an access key (alt + a letter or symbol). This allows the user to quickly navigate the user interface without using a mouse. What happens when an access key is pressed depends on the target element: if it is a button, the button is activated immediately and focus does not move; if it is a checkbox, focus moves to the checkbox and its state is toggled; for any other element, focus moves to that element. Throughout this documentation, access keys are listed in brackets next to each element (e.g. "trained persons list [alt + p]"). A table listing all focusable elements and their access keys can be found at the bottom of this page.

Important notes

FaceSpeaker can only recognize people optimally if at least 2 persons are in the database; by default FaceSpeaker comes trained with the author's face.

On some laptops, power management software may throttle CPU speed to conserve power when the machine is running on battery. FaceSpeaker is computationally intensive, so it may not work optimally if the CPU speed is throttled. Therefore, please ensure your power scheme does not inhibit CPU performance (the easiest way to do that is to change your power scheme to "maximum performance"). FaceSpeaker does not use the GPU (graphics chip).

Starting FaceSpeaker

After downloading and installing FaceSpeaker, start it using the shortcut placed in the start menu's programs or apps section. The FaceSpeaker window will appear, and FaceSpeaker will say "loading data...". Once the database of trained persons has been successfully loaded, FaceSpeaker will say "initializing camera and face engine...". Once everything has been loaded and initialized, FaceSpeaker will issue a sound and say "starting capture!". The time FaceSpeaker needs to initialize depends on the number of persons in the trained persons database and the computer's performance characteristics. During the initialization process FaceSpeaker will issue a sound every 3 seconds to let the user know it is still initializing. Once FaceSpeaker is loaded, the grayscale video feed captured by the camera will be displayed in the top right part of the FaceSpeaker window.

Adding a person

To add a person to the trained persons database, enter the name of that person in the "name" textbox [alt + n] and click the "add person" button [alt + a]. FaceSpeaker will say "start training!".

Make sure the person to be added is in front of the camera, and make sure only one face is in the camera's field of view. If FaceSpeaker detects a face in front of the camera, it will say "face found!" and begin capturing faces after a short delay. Every time a face has been added, FaceSpeaker issues a click sound. If FaceSpeaker does not detect a face in front of the camera, it will issue a sound and say "waiting..." at 3-second intervals until a face is detected.

After 20 faces have been added, FaceSpeaker will say "Done! Retraining face engine." When the face engine has been retrained, FaceSpeaker will issue a sound and say "starting capture!".

Recognizing persons

After FaceSpeaker has said "starting capture!" and the person just added is in front of the camera, FaceSpeaker should recognize that person. When a person is recognized, FaceSpeaker first issues a short beep. It then speaks the name of the person detected. Assuming the detected person stays in front of the camera, FaceSpeaker issues a second short beep after approximately one second. The pitch of that beep indicates the recognition confidence: the higher the pitch, the more likely it is that the person has been correctly recognized. If the second beep is not issued, the recognized person is no longer detected in front of the camera; this often suggests the recognition is not reliable. If FaceSpeaker detects an unknown face in front of the camera, it issues a long low-pitched beep. If FaceSpeaker detects there is no longer a face in front of the camera, it issues a long low beep to indicate this to the user.
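The relation between confidence and pitch can be pictured with a small Python sketch. This is only an illustration: the function name beep_frequency, the 0 to 1 confidence scale, the two frequency limits and the linear mapping are all assumptions, not FaceSpeaker's actual values. The only property taken from the description above is that a higher confidence gives a higher pitch.

```python
# Hypothetical pitch range; FaceSpeaker's real frequencies are not documented.
LOW_HZ = 400.0    # assumed pitch for the least confident recognition
HIGH_HZ = 1600.0  # assumed pitch for the most confident recognition


def beep_frequency(confidence: float) -> float:
    """Map a confidence between 0 and 1 to a beep pitch in hertz.

    The linear mapping is an assumption; it only guarantees that a
    higher confidence never produces a lower pitch.
    """
    clamped = min(1.0, max(0.0, confidence))  # keep the value inside 0..1
    return LOW_HZ + clamped * (HIGH_HZ - LOW_HZ)
```

With these assumed limits, a confidence of 0.5 lands halfway between the lowest and the highest pitch.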

Managing trained persons

The "Trained persons" listbox [alt + p] contains the names of all persons in the trained persons database. You can select a person and perform various actions for that person.

Adding training data

Usually, one training session is not sufficient for FaceSpeaker to reliably recognize a person under lighting conditions different from the lighting conditions at training time. If a previously trained person is misidentified or recognized as unknown, training data must be added for that person. To add training data, select the person in the trained persons listbox [alt + p] and click the "add training data" button [alt + t]. FaceSpeaker will capture and add faces in the same manner as described in the "adding a person" section, except that in a retraining session only those faces which are incorrectly recognized based on existing training data will be added. This way, the retraining procedure ensures the trained persons database will contain a broad variety of face images acquired in different situations while the database does not get filled up with a lot of duplicate images. The retraining procedure continues until 20 faces have been added or 5 faces have been captured and correctly recognized using existing training data (which suggests further retraining under current conditions is not useful).
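The stopping rule of a retraining session can be summarized in a short Python sketch. It is a simplified model, not FaceSpeaker's actual code: the function retrain, the recognize callback and the use of plain strings as stand-ins for captured faces are hypothetical. Only the two limits (20 added faces, 5 correctly recognized faces) and the rule that only misrecognized faces are added come from the description above.

```python
from typing import Callable, Iterable, List

MAX_ADDED = 20      # session ends once this many faces have been added
MAX_RECOGNIZED = 5  # or once this many faces were already recognized correctly


def retrain(faces: Iterable[str], person: str,
            recognize: Callable[[str], str]) -> List[str]:
    """Return the captured faces that would be added for `person`.

    `recognize` is a hypothetical stand-in for the face engine: it returns
    the name the existing training data assigns to a captured face.
    """
    added: List[str] = []
    recognized = 0
    for face in faces:
        if recognize(face) == person:
            recognized += 1      # existing data already covers this face
        else:
            added.append(face)   # misrecognized, so it is worth storing
        if len(added) >= MAX_ADDED or recognized >= MAX_RECOGNIZED:
            break                # either limit ends the session
    return added
```

In this model a session in good, familiar lighting ends quickly with few or no faces added, while a session in new lighting conditions tends to run until 20 faces have been stored.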

Deleting a person

A person can be deleted by selecting that person's name in the "Trained persons" listbox [alt + p] and pressing the "delete person..." button [alt + d]. FaceSpeaker will display a dialogue asking the user to confirm or cancel the deletion. Press [Enter] to permanently delete the selected person or press [Escape] to cancel this action.

View person details and trained face images

The bottom left part of the FaceSpeaker window displays information about the person selected in the "Trained persons" listbox [alt + p]. The read-only "Selected person details" multiline text box [alt + s] lists the person's name and the number of trained faces saved to the trained persons database. Below this text box is the "trained face images" group box. This group box allows the user to visually review the faces stored for a person. Press the "<" button [alt + < meaning alt + shift + comma] to view the previous image or the ">" button [alt + > meaning alt + shift + dot] to move to the next image. In future FaceSpeaker versions this functionality will be expanded and the user will be able to delete bad face images manually.

Power saver

The power saver function limits the frame rate to about 4 frames per second if no face is detected for 2 seconds or if a person stays in front of the camera after having been finally identified. This lowers CPU utilization in order to decrease power consumption and heat production. It should not influence recognition accuracy but may make it more difficult to get the camera pointed at a face. The power saver function is on by default. It can be turned off by unchecking the "Power saver" checkbox [alt + o].
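The power saver rule can be expressed as a small decision function. The Python sketch below is an illustration under assumptions: the function target_fps and its parameters are hypothetical names, and the real implementation may differ. The 4 frames per second limit and the 2 second timeout are the values given above.

```python
POWER_SAVER_FPS = 4.0   # approximate frame rate while saving power
NO_FACE_TIMEOUT = 2.0   # seconds without a face before the rate is limited


def target_fps(power_saver_on: bool, seconds_since_face: float,
               person_identified: bool, max_fps: float) -> float:
    """Return the frame rate to aim for in the current situation.

    `person_identified` means a person is still in front of the camera
    after having been finally identified.
    """
    if not power_saver_on:
        return max_fps                         # checkbox unchecked: never limit
    if seconds_since_face >= NO_FACE_TIMEOUT or person_identified:
        return min(POWER_SAVER_FPS, max_fps)   # nothing new to recognize
    return max_fps                             # a face needs full attention
```

For example, with a camera that delivers 30 frames per second, the rate drops to about 4 once no face has been seen for 2 seconds and returns to 30 as soon as a new face appears.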

Frame rate

The "Frames per second" read-only textbox [alt + f] displays a constantly updated approximation of the camera's frame rate. FaceSpeaker automatically adapts the frame rate to the computer's performance, so if the power saver is switched off the frame rate gives a rough indication of the computer's performance. The maximum possible frame rate depends on the camera and is usually 25 or 30 frames per second.
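One common way to obtain such a constantly updated approximation is to average over the most recent frames. The Python sketch below shows that idea only; it is not taken from FaceSpeaker, and the class FpsEstimator, its tick method and the window of 30 frames are assumptions made for illustration.

```python
from collections import deque


class FpsEstimator:
    """Estimate the frame rate from the arrival times of recent frames."""

    def __init__(self, window: int = 30) -> None:
        # Only the newest `window` timestamps are kept, so the estimate
        # follows changes in the frame rate.
        self.stamps = deque(maxlen=window)

    def tick(self, timestamp: float) -> float:
        """Record one frame (time in seconds) and return the estimate."""
        self.stamps.append(timestamp)
        if len(self.stamps) < 2:
            return 0.0                        # not enough data yet
        span = self.stamps[-1] - self.stamps[0]
        if span <= 0:
            return 0.0                        # guard against identical stamps
        return (len(self.stamps) - 1) / span  # frame intervals per second
```

Calling tick once per captured frame, for instance with the value of time.monotonic(), yields a number that could be shown in a textbox like the one described above.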

Getting help

The "FaceSpeaker online help..." button [alt + h] opens this documentation. The "program version" read-only textbox [alt + v] displays the program version. If you ask for support, make sure to include the program version in your message.

Recognition accuracy

For best results, use FaceSpeaker in good lighting conditions. There should be enough light and it should be uniformly distributed. Also ensure good camera quality and positioning. Good recognition requires "frontal face images". This means persons should look straight at the camera and the camera should not be tilted relative to the face. People wearing glasses can often be normally recognized, but sunglasses interfere with face recognition.

The video feed acquired by the camera is displayed in the top right part of the FaceSpeaker window, so a sighted individual can easily check whether the images captured by the camera are of good quality. The requirements described in the previous paragraph are not absolute: FaceSpeaker is designed to work under the suboptimal conditions encountered in the "real world" when using a body mounted camera. However, keeping the guidelines in mind will help the user get better results and facilitate a fair judgment of FaceSpeaker's performance. Note that training FaceSpeaker to recognize a person in extremely bad conditions (a person wearing sunglasses, an extremely dark environment where the face is hardly discernible on the camera feed, etc.) will often be possible, but can influence recognition results under better conditions. Notably, FaceSpeaker may recognize other (unknown) persons as the person for whom training data was added under unfavorable conditions. Deleting and retraining that person under better conditions should resolve the issue.

Components and their access keys

The table below lists all focusable components of the FaceSpeaker user interface along with their access keys.

Component                                 Access key
Name text box                             alt + n
Add person button                         alt + a
Trained persons listbox                   alt + p
Add training data button                  alt + t
Delete person... button                   alt + d
Selected person details text box          alt + s
Previous image button                     alt + < meaning alt + shift + comma
Next image button                         alt + > meaning alt + shift + dot
Speech message textbox                    alt + m
Approximate frames per second textbox     alt + f
Power saver checkbox                      alt + o
FaceSpeaker online help... button         alt + h
Program version textbox                   alt + v