Shepstone, Sven Ewan [2]; Tan, Zheng-Hua [2]; Jensen, Søren Holdt [2]
Editors: F. Bimbot, C. Cerisara, G. Gravier, L. Lamel, F. Pellegrino, P. Perrier
[1] Multimedia Information and Signal Processing, The Faculty of Engineering and Science (ENG), Aalborg University, VBN
[2] Department of Electronic Systems, The Faculty of Engineering and Science (ENG), Aalborg University, VBN
[3] The Faculty of Engineering and Science (TECH), Aalborg University, VBN
In this paper we show a new method of using automatic age and gender recognition to recommend a sequence of multimedia items to a home TV audience comprising multiple viewers. Instead of relying on explicitly provided demographic data for each user, we define an audio-based demographic group profile that captures the age and gender for all members of the audience. A 7-class age and gender classifier employing a fusion of acoustic and prosodic features determines the probability of each speaker belonging to each class. The information for all speakers is then combined to form the group profile, which itself is the input to a recommender system. The recommender system finds the content items whose demographics best match the group profile. We tested the effectiveness of the system for several typical home audience configurations. In a survey, users were given a configuration and asked to rate a set of advertisements on how well each advertisement matched the configuration. Unbeknown to the subjects, half of the adverts were recommended using the derived audio demographics and the other half were randomly chosen. The recommended adverts received a significantly higher median rating of 7.75, as opposed to 4.25 for the randomly selected adverts.
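The pipeline described in the abstract (per-speaker class probabilities, combined into a group profile, matched against item demographics) can be illustrated with a minimal sketch. The abstract does not state how the speaker information is combined or how the match is scored, so the averaging rule, the cosine-similarity score, the function names, and all data values below are assumptions for illustration, not the authors' method.

```python
from math import sqrt

NUM_CLASSES = 7  # the abstract's 7-class age and gender classifier


def group_profile(speaker_posteriors):
    """Combine per-speaker class probabilities into one group profile.

    Assumption: a simple per-class average; the paper may use another rule.
    """
    if not speaker_posteriors:
        raise ValueError("need at least one speaker")
    for p in speaker_posteriors:
        if len(p) != NUM_CLASSES:
            raise ValueError("each speaker needs 7 class probabilities")
    n = len(speaker_posteriors)
    return [sum(p[c] for p in speaker_posteriors) / n for c in range(NUM_CLASSES)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(profile, items, k=2):
    """Return the names of the k items whose demographic vectors best match the profile.

    Assumption: "best match" is scored by cosine similarity.
    """
    ranked = sorted(items, key=lambda name: cosine(profile, items[name]), reverse=True)
    return ranked[:k]


# Hypothetical classifier outputs for a two-person audience (each row sums to 1).
speakers = [
    [0.05, 0.70, 0.05, 0.10, 0.05, 0.03, 0.02],
    [0.02, 0.03, 0.05, 0.10, 0.70, 0.05, 0.05],
]

# Hypothetical target demographics for three adverts, over the same 7 classes.
items = {
    "ad_a": [0.0, 0.5, 0.0, 0.0, 0.5, 0.0, 0.0],
    "ad_b": [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "ad_c": [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5],
}

profile = group_profile(speakers)
top = recommend(profile, items, k=1)
print(top)
```

With these made-up numbers the audience's probability mass sits mostly in the second and fifth classes, so the advert targeting those two classes ranks first. Averaging keeps the group profile a valid probability distribution, which is one reason it is a convenient placeholder here.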
14th Annual Conference of the International Speech Communication Association (Interspeech 2013): Speech in Life Sciences and Human Societies, 2013, pp. 2827-2831
Proceedings of the International Conference on Spoken Language Processing
Interspeech 2013: Annual Conference of the International Speech Communication Association