This paper addresses the issue of data compression in distributed speech recognition on the basis of a variable frame rate and length analysis method. The method first conducts frame selection by using a posteriori signal-to-noise ratio weighted energy distance to find the right time resolution at the signal level, and then increases the length of the selected frame according to the number of non-selected preceding frames to find the right time-frequency resolution at the frame level. It produces high frame rate and small frame length in rapidly changing regions and low frame rate and large frame length for steady regions. The method is applied to scalable source coding in distributed speech recognition where the target bitrate is met by adjusting the frame rate. Speech recognition results show that the proposed approach outperforms other compression methods in terms of recognition accuracy for noisy speech while achieving higher compression rates.
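The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the distance formula (absolute log-energy difference scaled by an a posteriori SNR in dB), the accumulate-and-reset selection rule, the linear frame-length growth, and every name and default value (`select_frames`, `noise_energy`, `threshold`, `base_len`, `step_len`, `max_len`) are assumptions made for illustration only.

```python
import math


def select_frames(energies, noise_energy, threshold,
                  base_len=20, step_len=1, max_len=40):
    """Return (frame_index, frame_length) pairs for the selected frames.

    Hypothetical sketch: `energies` holds per-frame energies computed at a
    fixed high frame rate, and `noise_energy` is an assumed noise estimate.
    """
    selected = []
    accumulated = 0.0  # weighted distance accumulated since the last selection
    skipped = 0        # non-selected frames since the last selection
    for t in range(1, len(energies)):
        # Assumed weighting: a posteriori SNR in dB, floored at zero so that
        # frames at or below the noise level contribute no distance.
        snr_db = max(0.0, 10.0 * math.log10(energies[t] / noise_energy))
        # Assumed distance: log-energy change between adjacent frames.
        distance = abs(math.log(energies[t]) - math.log(energies[t - 1]))
        accumulated += distance * snr_db
        if accumulated >= threshold:
            # Lengthen the frame by the number of skipped frames, up to a cap.
            length = min(max_len, base_len + step_len * skipped)
            selected.append((t, length))
            accumulated = 0.0
            skipped = 0
        else:
            skipped += 1
    return selected
```

In this sketch, raising `threshold` selects fewer frames, which is one way a lower frame rate (and hence a lower bitrate) could be reached; steady stretches are skipped and the next selected frame is correspondingly longer.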
Network Infrastructure and Digital Content (IC-NIDC), 2014 4th IEEE International Conference on, 2014, p. 453-457
IEEE International Conference Network Infrastructure and Digital Content Proceedings
The 4th IEEE International Conference on Network Infrastructure and Digital Content, 2014