Audio Demonstration of Physiologically Inspired Signal Processing: Use of Noise-Robust Sound Localization for the Separation of Voices

Johannes Nix, Volker Hohmann

This demonstration shows the application of an algorithm inspired by physiological findings in the barn owl. The algorithm is described in chapter 3 of the PhD thesis "Localization and separation of concurrent talkers based on principles of auditory scene analysis and multi-dimensional statistical methods". The statistical sound localization algorithm on which it is based is described in detail in chapter 2 of the same thesis and has been published in Nix, J. and Hohmann, V. (2006), "Sound source localization in real sound fields based on empirical statistics of interaural parameters", J. Acoust. Soc. Am. 119 (1).
In addition to the a priori statistics of interaural parameters in noise, the algorithm uses the values of the interaural transfer function (ITF) to separate the voices. The sound source localization algorithm has a convergence time of about 0.2 s; for the estimation of the filter coefficients, a larger time constant is used to reduce artefacts in the form of noise bursts.
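The effect of a larger time constant can be illustrated with first-order recursive averaging, one common way to realise a time constant. This is a minimal sketch and not the authors' implementation: the 10 ms frame period, the 1 s time constant, the helper names and the alternating test sequence are all assumptions chosen for illustration, and the 0.2 s value is borrowed from the convergence time quoted above only as a reference point.

```python
import math

def smoothing_coefficient(frame_period_s, time_constant_s):
    """Forgetting factor of a first-order recursive average."""
    return math.exp(-frame_period_s / time_constant_s)

def smooth(values, frame_period_s, time_constant_s):
    """Recursively average a sequence of per-frame estimates."""
    a = smoothing_coefficient(frame_period_s, time_constant_s)
    state = values[0]
    out = []
    for v in values:
        state = a * state + (1.0 - a) * v
        out.append(state)
    return out

def max_jump(xs):
    """Largest change between two consecutive frames."""
    return max(abs(b - a) for a, b in zip(xs, xs[1:]))

# Assumed values, for illustration only.
FRAME_PERIOD_S = 0.01                      # hypothetical 10 ms frames
noisy = [1.0 if i % 2 == 0 else 0.0 for i in range(400)]  # jittery estimate

fast = smooth(noisy, FRAME_PERIOD_S, 0.2)  # short time constant
slow = smooth(noisy, FRAME_PERIOD_S, 1.0)  # hypothetical longer time constant

print("max frame-to-frame jump, short time constant:", max_jump(fast))
print("max frame-to-frame jump, long time constant: ", max_jump(slow))
```

With the longer time constant, the smoothed estimate changes less from frame to frame, at the cost of reacting more slowly to genuine changes in the scene.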
Both parts of the algorithm extend readily to situations with more microphones and more talkers; for the separation of sources, the number of sources should not exceed the number of microphones. For efficient sound localization, it is beneficial if the pairwise inter-microphone transfer functions are somewhat asymmetrical in space.
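The limit on the number of sources can be seen in a generic two-microphone, two-source model for a single frequency bin. The sketch below is not the filter estimation used in the thesis; it assumes a simple instantaneous model per bin, in which each voice reaches the right microphone as its left-microphone spectrum multiplied by that voice's ITF. The function name `separate_bin` and all numerical values are hypothetical.

```python
import cmath

def separate_bin(x_left, x_right, itf_a, itf_b):
    """Solve the 2x2 mixing system for one frequency bin.

    Model (an assumption of this sketch):
        x_left  = s_a + s_b
        x_right = itf_a * s_a + itf_b * s_b
    where s_a and s_b are the voices as received at the left microphone.
    """
    det = itf_b - itf_a
    if abs(det) < 1e-12:
        raise ValueError("ITFs coincide in this bin; sources are not separable")
    s_a = (itf_b * x_left - x_right) / det
    s_b = (x_right - itf_a * x_left) / det
    return s_a, s_b

# Hypothetical ITF values for one bin (level ratio and phase shift).
itf_a = 0.5 * cmath.exp(1j * 0.8)
itf_b = 1.6 * cmath.exp(-1j * 0.3)

# Hypothetical source spectra in that bin.
s_a_true, s_b_true = 1.0 + 2.0j, -0.5 + 0.25j
x_left = s_a_true + s_b_true
x_right = itf_a * s_a_true + itf_b * s_b_true

s_a_est, s_b_est = separate_bin(x_left, x_right, itf_a, itf_b)
print(abs(s_a_est - s_a_true), abs(s_b_est - s_b_true))
```

Two microphone signals give two equations per bin, which is enough to solve for two unknown sources. A third source would add an unknown without adding an equation, and two sources whose ITFs coincide in a bin make the system singular.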

Azimuth voice A (degrees)	Azimuth voice B (degrees)	SNR at ear canal (dB)	Mixed signal	Filtered channel A	Filtered channel B
-45	170	0	Mixture	Voice A	Voice B
-45	170	10	Mixture	Voice A	Voice B
-45	90	0	Mixture	Voice A	Voice B
-45	20	0	Mixture	Voice A	Voice B

(more publications about sound localization)

Johannes Nix
