1 Centre for Language Technology, Faculty of Humanities, Københavns Universitet
2 Administration, Department of Computer Science, Faculty of Science, Københavns Universitet
3 CMU
4 USC
5 Administration, Department of Computer Science, Faculty of Science, Københavns Universitet
This paper uses a crowd-sourced definition of a speech phenomenon that we call focus. Given sentences presented as text and as speech, both in isolation and in context, we asked annotators to identify what we term the focus word. We report how consistently annotators identify the focus word when presented with text or speech stimuli. We then build models to show how well the focus word can be predicted from lexical (and higher-level) features. Using spectral and prosodic information, we also show how these focus words differ when spoken with and without context. Finally, we show how focus information can improve speech synthesis of these utterances.
Proceedings of the 14th Annual Conference of the International Speech Communication Association: Interspeech 2013, 2013
International Speech Communication Association (ISCA)