Recent studies on computational speech segregation have reported that estimating an ideal binary mask with supervised learning algorithms, and applying it to the noisy signal, improves speech intelligibility in noise. However, an important requirement for such systems in technical applications is their robustness to acoustic conditions not considered during training. This study demonstrates that the spectro-temporal noise variations that occur during training and testing determine the achievable segregation performance. In particular, such variations strongly affect which acoustical features of the system are identified as being associated with perceptual attributes in speech segregation. The results could help establish a framework for the systematic evaluation of future segregation systems.
Journal of the Acoustical Society of America, 2014, Vol. 136, Issue 6, pp. 398-404
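
For readers unfamiliar with the ideal binary mask referred to in the abstract, the following is a minimal sketch of how such a mask is commonly defined: a time-frequency unit is labeled 1 when its local signal-to-noise ratio exceeds a threshold (often called the local criterion) and 0 otherwise. The function names `ideal_binary_mask` and `apply_mask`, the parameter `lc_db`, its 0 dB default, and the toy energy values are assumptions made for illustration; the abstract does not specify them. Computing the ideal mask requires the separate speech and noise signals, whereas the systems discussed above estimate the mask from the noisy mixture with supervised learning.

```python
import math


def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Return a binary mask: 1 where the local SNR in dB exceeds lc_db, else 0.

    speech_energy, noise_energy: equally sized 2-D lists (time frames x
    frequency channels) holding non-negative energies of the premixed
    speech and noise. lc_db is an assumed local criterion, not a value
    taken from the study.
    """
    mask = []
    for speech_frame, noise_frame in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(speech_frame, noise_frame):
            if s <= 0.0:
                row.append(0)  # no speech energy: treat as noise-dominated
            elif n <= 0.0:
                row.append(1)  # no noise energy: treat as speech-dominated
            else:
                snr_db = 10.0 * math.log10(s / n)
                row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def apply_mask(mixture_energy, mask):
    """Keep mixture units where the mask is 1 and zero out the rest."""
    return [
        [m * b for m, b in zip(mix_frame, mask_frame)]
        for mix_frame, mask_frame in zip(mixture_energy, mask)
    ]


# Toy example: 2 time frames x 2 frequency channels (illustrative values only).
speech = [[4.0, 1.0], [0.5, 9.0]]
noise = [[1.0, 2.0], [0.5, 1.0]]
mixture = [[5.0, 3.0], [1.0, 10.0]]

mask = ideal_binary_mask(speech, noise)
masked = apply_mask(mixture, mask)
```

Lowering `lc_db` retains more units, including some that are noise-dominated; raising it retains fewer. The strict inequality means a unit whose local SNR equals the threshold is discarded, which is one possible convention.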