1 Cognitive Systems, Department of Informatics and Mathematical Modeling, Technical University of Denmark2 Department of Informatics and Mathematical Modeling, Technical University of Denmark3 Copenhagen Center for Health Technology, Center, Technical University of Denmark
In this thesis, a number of possible solutions to source separation are suggested. Although they differ significantly in shape and intent, they share a heavy reliance on prior domain knowledge. Most of the developed algorithms are intended for speech applications, and hence, structural features of speech have been incorporated. Single-channel separation of speech is a particularly challenging signal processing task, where the purpose is to extract a number of speech signals from a single observed mixture. I present a few methods to obtain separation, which rely on the sparsity and structure of speech in a time-frequency representation. My own contributions are based on learning dictionaries for each speaker separately and subsequently applying a concatenation of these dictionaries to separate a mixture. Sparse decompositions required for the decomposition are computed using nonnegative matrix factorization as well as basis pursuit. In my work on the multi-channel problem, I have focused on convolutive mixtures, which is the appropriate model in acoustic setups. We have been successful in incorporating a harmonic speech model into a greater probabilistic formulation. Furthermore, we have presented several learning schemes for the parameters of such models, more specifically, the expectation-maximization (EM) algorithm and stochastic and Newton-type gradient optimization.