This paper addresses the problem of speech segregation by estimating the ideal binary mask (IBM) from noisy speech. Two methods are compared. The first is a supervised learning approach that incorporates a priori knowledge about the feature distribution observed during training. The second relies solely on a frame-based speech presence probability (SPP) estimate and therefore does not depend on the acoustic conditions seen during training. We investigate how mismatches between the acoustic conditions used for training and testing affect IBM estimation performance, and we discuss the advantages of both approaches.
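To make the two quantities concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes power spectrograms stored as nested lists indexed `[frame][bin]` and an IBM local criterion `lc_db` of 0 dB. For the SPP it substitutes a well-known estimator, the fixed a priori SNR posterior of Gerkmann and Hendriks with equal speech/noise priors and an assumed `xi_h1_db` of 15 dB, together with a noise PSD that is assumed to be known. All function names are hypothetical. The IBM needs the separate speech and noise powers, so it is only available in simulation, whereas the SPP-based mask uses only the noisy power and a noise estimate.

```python
import math


def ideal_binary_mask(speech_pow, noise_pow, lc_db=0.0):
    """IBM: 1 where the local SNR exceeds the local criterion lc_db, else 0.

    Compares s > n * 10^(lc/10) instead of taking log10(s / n),
    so zero speech power does not raise an error.
    """
    lc_lin = 10.0 ** (lc_db / 10.0)
    return [
        [1 if s > n * lc_lin else 0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_pow, noise_pow)
    ]


def speech_presence_probability(noisy_pow, noise_psd, xi_h1_db=15.0):
    """Posterior SPP per time-frequency bin (assumed fixed-prior estimator).

    P(H1 | y) = 1 / (1 + (1 + xi) * exp(-(|y|^2 / sigma_n^2) * xi / (1 + xi)))
    with equal priors P(H0) = P(H1) and a fixed a priori SNR xi under H1.
    """
    xi = 10.0 ** (xi_h1_db / 10.0)
    return [
        [1.0 / (1.0 + (1.0 + xi) * math.exp(-(y / n) * xi / (1.0 + xi)))
         for y, n in zip(y_row, n_row)]
        for y_row, n_row in zip(noisy_pow, noise_psd)
    ]


def spp_binary_mask(noisy_pow, noise_psd, threshold=0.5):
    """Estimated binary mask: 1 where the SPP exceeds an assumed threshold."""
    spp = speech_presence_probability(noisy_pow, noise_psd)
    return [[1 if p > threshold else 0 for p in row] for row in spp]


if __name__ == "__main__":
    speech = [[10.0, 0.1]]
    noise = [[1.0, 1.0]]
    # Assume additive, uncorrelated signals so powers add.
    noisy = [[s + n for s, n in zip(sr, nr)] for sr, nr in zip(speech, noise)]
    print(ideal_binary_mask(speech, noise))  # oracle mask
    print(spp_binary_mask(noisy, noise))     # mask estimated from noisy power only
```

In this toy case both masks keep the high-SNR bin and discard the low-SNR one. The SPP-based mask never sees the clean speech, which illustrates why such an estimator does not depend on training conditions.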
Proceedings of IWAENC 2014
Ideal binary mask; Speech segregation; Generalization; Speech presence probability
The International Workshop on Acoustic Signal Enhancement, 2014