PEMO: Speech Recognition
with Perceptive Feature Extraction


A computational model of the auditory periphery (PEMO) was developed by the Medical Physics Group at Oldenburg University. PEMO was originally developed to simulate psychoacoustical experiments such as temporal or spectral masking experiments. Recently, the model has been applied to several topics in speech processing, including speech intelligibility prediction, objective speech quality measurement and automatic speech recognition (ASR).
The motivation for our work in the field of ASR is that the human auditory system can be regarded as a very robust "speech recognition system" which allows us to understand speech in very noisy environments. Today's ASR systems, on the other hand, usually perform quite poorly even at low noise levels. Simulating the "internal representation" of speech with an auditory-based feature extraction like PEMO should allow a more robust automatic recognition of speech.

Processing Stages of PEMO

The representation of speech and sounds after PEMO-processing:

Recognition experiments

Experiments were performed with PEMO feature extraction. The task was speaker-independent, isolated digit recognition in quiet and in noise. The speech material was corrupted with different types of additive and convolutive noise before feature extraction. Both HMMs and neural networks were used as recognizers. Other front ends such as MFCC and RASTA were tested for comparison. The results show
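The corruption step described in the paragraph above (additive noise and convolutive distortion applied before feature extraction) can be illustrated with a minimal Python sketch. It is not the group's actual code. The function names, the 8 kHz sample rate, the 10 dB SNR, the sine-wave stand-in for a spoken digit and the three-tap channel are all assumptions chosen for illustration.

```python
import math
import random


def mean_power(signal):
    """Average power (mean of squared samples) of a signal."""
    return sum(x * x for x in signal) / len(signal)


def add_noise_at_snr(speech, noise, snr_db):
    """Additive corruption: scale the noise to a target SNR in dB, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    # Choose the gain so that speech_power / (gain**2 * noise_power) = 10**(snr_db / 10).
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def apply_channel(speech, impulse_response):
    """Convolutive corruption: filter the speech with a channel impulse response."""
    out = []
    for i in range(len(speech)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if i - k >= 0:
                acc += h * speech[i - k]
        out.append(acc)
    return out


def measure_snr_db(clean, corrupted):
    """SNR in dB, treating (corrupted - clean) as the noise component."""
    residual = [c - s for s, c in zip(clean, corrupted)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


# Hypothetical test signals: a 440 Hz tone stands in for a spoken digit.
random.seed(0)
sample_rate = 8000  # assumed sample rate in Hz
speech = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
noise = [random.gauss(0.0, 1.0) for _ in range(sample_rate)]

noisy = add_noise_at_snr(speech, noise, snr_db=10.0)
filtered = apply_channel(speech, [1.0, 0.5, 0.25])  # assumed three-tap channel
```

In a setup like the one described, the corrupted signals `noisy` and `filtered` would then be passed to the front end under test (PEMO, MFCC or RASTA) in place of the clean signal.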

Related Papers and Articles:

Tchorz, J., Kasper, K., Reininger, H. and Kollmeier, B.
On the Interplay between auditory-based features and locally recurrent neural networks for robust speech recognition in noise
Eurospeech '97, p. 2075-2078, ESCA, Patras, Greece, 1997.
Download (postscript, 392k)
Tchorz, J., Wesselkamp, M. and Kollmeier, B.
Gehörgerechte Merkmalsextraktion zur robusten Spracherkennung in Störgeräuschen
Fortschritte der Akustik - DAGA 96, p. 532-533, DEGA, Oldenburg, 1996.
Download (postscript, 81k)
Dau, T., Püschel, D., and Kohlrausch, A.
A quantitative model of the "effective" signal processing in the auditory system: I. Model structure
J. Acoust. Soc. Am., vol. 99, p. 3615-3622, 1996.
Kasper, K., Reininger, H., and Wolf, D.
Exploiting the Potential of Auditory Preprocessing for Robust Speech Recognition by Locally Recurrent Neural Networks
Proc. Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), vol. 2, p. 1223-1227, 1997.
A more detailed description of the auditory model in the ASR system and of the experimental setup: Download (postscript, 83k)
See also the publication list of our group.

Currently working on ASR with PEMO preprocessing: Michael Kleinschmidt

Last modified: Jan. 28, 1998 tch@medi.physik.uni-oldenburg.de