1 Department of Electrical Engineering, Technical University of Denmark
2 Hearing Systems, Department of Electrical Engineering, Technical University of Denmark
Speech intelligibility models typically consist of a preprocessing part that transforms stimuli into some internal (auditory) representation and a decision metric that relates the internal representation to speech intelligibility. This study investigated speech intelligibility in conditions of spatial release from masking (SRM) where the masker is moved, on-axis, away from the target. Two binaural models, which use the conventional audio signal-to-noise ratio (SNR) in the decision metric, and two monaural models, using a decision metric based on the SNR in the envelope domain (SNRenv), were considered. The predictions were compared to data from Westermann et al. [2013, POMA, 19, 050156] in conditions where the target was located 0.5 m in front of the listener and the masker was presented at a distance of 0.5, 2, 5 or 10 m in front of the listener. The data showed an SRM of 10 dB when moving the masker from a distance of 0.5 m to a distance of 10 m. The long-term monaural model based on the SNRenv metric was able to account for most of the SRM data, whereas the models that used the audio SNR did not predict any SRM, even when they included an equalization-cancellation-like process. The short-term monaural model based on the SNRenv metric predicted a small SRM only in the noise-masker condition. The results suggest that true binaural processing is not always crucial to account for speech intelligibility in spatial conditions and that an SNR metric in the envelope domain appears to be more appropriate in conditions of on-axis spatial speech segregation than the conventional SNR. Additionally, none of the models considered grouping cues, which seem to play an important role in the conditions studied.
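To make the distinction between the two decision metrics concrete, the following is a minimal sketch, not the implementation of any of the models evaluated here. It assumes one common formulation of the envelope-domain metric, in which the normalized envelope power of the masker alone is subtracted from that of the mixture and the difference is divided by the masker envelope power. All function names, the `floor` parameter, and the toy sinusoidally modulated envelopes are hypothetical and chosen for illustration only; a full model would first apply auditory and modulation filtering.

```python
import math


def audio_snr_db(target, masker):
    """Conventional SNR in dB: mean target power over mean masker power."""
    p_target = sum(x * x for x in target) / len(target)
    p_masker = sum(x * x for x in masker) / len(masker)
    return 10.0 * math.log10(p_target / p_masker)


def normalized_envelope_power(envelope):
    """AC power of a temporal envelope divided by its squared mean (DC)."""
    mean = sum(envelope) / len(envelope)
    ac_power = sum((e - mean) ** 2 for e in envelope) / len(envelope)
    return ac_power / (mean * mean)


def snr_env_db(mixture_envelope, masker_envelope, floor=1e-3):
    """Envelope-domain SNR in dB (assumed formulation, see lead-in).

    `floor` is a hypothetical lower limit that keeps the ratio finite
    when the mixture carries no more envelope power than the masker.
    """
    p_mix = normalized_envelope_power(mixture_envelope)
    p_noise = max(normalized_envelope_power(masker_envelope), floor)
    p_speech = max(p_mix - p_noise, floor)
    return 10.0 * math.log10(p_speech / p_noise)


def toy_envelope(depth, cycles=4, n=1000):
    """Hypothetical envelope: unit mean with sinusoidal modulation."""
    return [1.0 + depth * math.sin(2.0 * math.pi * cycles * i / n)
            for i in range(n)]


if __name__ == "__main__":
    # Strongly modulated mixture versus weakly modulated masker.
    print(snr_env_db(toy_envelope(0.8), toy_envelope(0.2)))
    # Waveform-domain SNR for a target with twice the masker amplitude.
    print(audio_snr_db([2.0, -2.0, 2.0, -2.0], [1.0, -1.0, 1.0, -1.0]))
```

Because the envelope power is normalized by the squared mean, scaling an envelope by a constant gain leaves this sketch's SNRenv unchanged, whereas the audio SNR depends directly on the level ratio. This is one way the two metrics can diverge for the same stimuli.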