We present a monaural approach to speech segregation that estimates the ideal binary mask (IBM) by combining amplitude modulation spectrogram (AMS) features, pitch-based features, and speech presence probability (SPP) features derived from noise statistics. To maintain high mask estimation accuracy in the presence of various background noises, the system employs environment-specific segregation models and automatically selects the appropriate model for a given input signal. Furthermore, instead of classifying each time-frequency (T-F) unit independently, the a posteriori probabilities of speech and noise presence are evaluated by considering adjacent T-F units. The proposed system achieves high classification accuracy.
2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013
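
For readers unfamiliar with the estimation target named in the abstract, the following is a minimal sketch of how an ideal binary mask is conventionally defined: a T-F unit is labeled 1 when its local signal-to-noise ratio exceeds a local criterion, and 0 otherwise. It is not the system described above, which has to estimate the mask from the noisy mixture alone. The function name `ideal_binary_mask`, the parameter `lc_db`, its default of 0 dB, and the toy energy values are assumptions made for illustration.

```python
import math


def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Label each T-F unit 1 if its local SNR in dB exceeds lc_db, else 0.

    speech_energy and noise_energy are equally shaped nested lists of
    per-unit energies (rows are frequency channels, columns are frames).
    lc_db is the local criterion; 0 dB is an assumed default.
    """
    eps = 1e-12  # small floor so silent units do not cause log(0) or division by zero
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            snr_db = 10.0 * math.log10((s + eps) / (n + eps))
            row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


# Toy example: 2 channels x 2 frames of premixed speech and noise energies.
speech = [[4.0, 0.1], [1.0, 9.0]]
noise = [[1.0, 1.0], [1.0, 1.0]]
mask = ideal_binary_mask(speech, noise)
```

Computing this mask requires the premixed speech and noise signals, which is why it serves as a training and evaluation target rather than something available at run time.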