Danish National Research Database

Low-Complexity Variable Frame Rate Analysis for Speech Recognition and Voice Activity Detection

Authors:
  • Tan, Zheng-Hua
    ORCID: 0000-0001-6856-8928
    Department of Electronic Systems, The Technical Faculty of IT and Design, Aalborg University
  • Lindberg, Børge
    ORCID: 0000-0001-5394-8100
    Department of Electronic Systems, The Technical Faculty of IT and Design, Aalborg University
DOI:
10.1109/jstsp.2010.2057192
Abstract:
Frame-based speech processing inherently assumes that speech signals are stationary over a short period of time. Over a longer time, the characteristics of the signals can change significantly and frames are not equally important, underscoring the need for frame selection. In this paper, we present a low-complexity and effective frame selection approach based on an a posteriori signal-to-noise ratio (SNR) weighted energy distance. The use of an energy distance, instead of, e.g., a standard cepstral distance, makes the approach computationally efficient and enables fine-granularity search, and the use of a posteriori SNR weighting emphasizes the reliable regions in noisy speech signals. It is experimentally found that the approach is able to assign a higher frame rate to fast-changing events such as consonants, a lower frame rate to steady regions like vowels, and no frames to silence, even for very low SNR signals. The resulting variable frame rate analysis method is applied to three speech processing tasks that are essential to natural interaction with intelligent environments. First, it is used for improving speech recognition performance in noisy environments. Second, it is used for scalable source coding schemes in distributed speech recognition, where the target bit rate is met by adjusting the frame rate. Third, it is applied to voice activity detection. Very encouraging results are obtained for all three speech processing tasks.
Type:
Journal article
Language:
English
Published in:
IEEE Journal of Selected Topics in Signal Processing, 2010, Vol. 4, Issue 5, pp. 798-807
Keywords:
Distributed speech recognition; frame selection; voice activity detection; noise-robust speech recognition; variable frame rate
Main Research Area:
Science/technology
Publication Status:
Published
Review type:
Peer Review
Submission year:
2010
Scientific Level:
Scientific
ID:
99446426
