PEMO: Speech Recognition
with Perceptive Feature Extraction

A computational model of the auditory periphery (PEMO) was developed by the Medical Physics Group at Oldenburg University. PEMO was originally developed to simulate psychoacoustical experiments such as temporal or spectral masking experiments. More recently, the model has been applied to other topics in speech processing, such as speech intelligibility prediction, objective speech quality measurement and automatic speech recognition (ASR).
The motivation for our work in the field of ASR is that the human auditory system can be regarded as a very robust "speech recognition system" which allows us to understand speech in very noisy environments. Today's ASR systems, on the other hand, usually perform quite poorly even in low noise. Simulating the "internal representation" of speech with an auditory-based feature extraction like PEMO should allow a more robust automatic recognition of speech.

Processing Stages of PEMO

Preemphasis of the time signal
Basilar-membrane filtering with a gammatone filterbank
Envelope extraction (half-wave rectification and low pass filtering)
Adaptive amplitude compression to simulate short-term adaptation
Low pass filtering of the compressed envelope
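
The stages listed above can be sketched for a single frequency channel as follows. This is a minimal illustration under stated assumptions, not the group's implementation: the gammatone filterbank is left out (the input is treated as one already band-filtered channel), and all function names, the pre-emphasis coefficient (0.97), the envelope cutoff (1 kHz), the five adaptation time constants (5 to 500 ms) and the final low-pass cutoff (4 Hz) are values assumed here for illustration.

```python
import math


def one_pole_lowpass(x, cutoff_hz, fs):
    """First-order recursive low pass: y[n] = (1 - a) * x[n] + a * y[n - 1]."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, state = [], 0.0
    for v in x:
        state = (1.0 - a) * v + a * state
        y.append(state)
    return y


def preemphasis(x, coeff=0.97):
    """First-order pre-emphasis (the coefficient is an assumed value)."""
    out, prev = [], 0.0
    for v in x:
        out.append(v - coeff * prev)
        prev = v
    return out


def envelope(x, fs, cutoff_hz=1000.0):
    """Half-wave rectification followed by low-pass filtering."""
    rectified = [max(v, 0.0) for v in x]
    return one_pole_lowpass(rectified, cutoff_hz, fs)


def adaptation_loops(env, fs, taus=(0.005, 0.05, 0.129, 0.253, 0.5), floor=1e-5):
    """Chain of feedback loops; each divides its input by its own
    low-pass filtered output (time constants in seconds, assumed values)."""
    coeffs = [math.exp(-1.0 / (tau * fs)) for tau in taus]
    # Start every loop in the steady state it would reach for input == floor.
    states = [floor ** (0.5 ** (k + 1)) for k in range(len(taus))]
    out = []
    for v in env:
        v = max(v, floor)  # keep the divisors strictly positive
        for k, a in enumerate(coeffs):
            v = v / states[k]
            states[k] = a * states[k] + (1.0 - a) * v
        out.append(v)
    return out


def pemo_channel(x, fs):
    """Hypothetical single-channel chain; gammatone filtering is left out."""
    x = preemphasis(x)
    env = envelope(x, fs)
    adapted = adaptation_loops(env, fs)
    return one_pole_lowpass(adapted, 4.0, fs)


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2.0 * math.pi * 500.0 * n / fs) for n in range(fs // 2)]
    features = pemo_channel(tone, fs)
    print(len(features), features[-1])
```

The sketch returns one value per input sample. A full front end would run this chain on every gammatone channel and subsample the result to the frame rate of the recognizer.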

The representation of speech and sounds after PEMO-processing:

Stationary input signals are approximately log-compressed
Changes in the input signal, such as onsets and offsets, are transformed linearly and thus emphasized
Amplitude modulations between about 1 and 10 Hz are passed, others are suppressed
The coding of the input signal is sparse and distinct
See the visual demonstration
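
The first two properties can be made plausible with a short calculation. It assumes that the adaptation stage is built as described in Dau et al. (1996), listed under the related papers: a chain of five feedback loops, each dividing its input by a low-pass filtered copy of its own output. The symbols x_k (input of loop k), y_k (its output) and \bar{y}_k (the low-pass filtered output) are introduced here for illustration only.

```latex
% Steady state of loop k: the low-pass filtered output equals the output itself
y_k = \frac{x_k}{y_k} \quad\Longrightarrow\quad y_k = \sqrt{x_k}

% Five loops in series, with x_{k+1} = y_k
y_5 = x_1^{1/2^5} = x_1^{1/32}
    = \exp\!\left(\tfrac{1}{32}\ln x_1\right)
    \approx 1 + \tfrac{1}{32}\ln x_1
    \qquad \text{for } |\ln x_1| \ll 32

% Fast change: the divisor \bar{y}_k has not yet adapted and is nearly constant
y_k(t) \approx \frac{x_k(t)}{\bar{y}_k}
```

Under these assumptions a stationary input is mapped almost logarithmically, while a rapid change passes with an approximately constant gain, that is, linearly.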

Recognition experiments

Recognition experiments were performed with PEMO feature extraction in a range of different setups. The robustness of the auditory-based preprocessing was compared with other front ends, using both HMM and neural network recognizers. The effect of additional monaural and binaural noise suppression prior to feature extraction was also investigated. Current research focuses on subword-unit recognition in noise. If you are interested, have a look at the papers listed below, or contact Michael Kleinschmidt, Christine Hartmann, or Jürgen Tchorz.

Related Papers and Articles:

M. Kleinschmidt, J. Tchorz and B. Kollmeier:
'Combining Speech Enhancement and Auditory Feature Extraction for Robust Speech Recognition',
"Speech Communication - Special Issue on Robust ASR" (accepted, to be published April 2001)

Tchorz, J., Kleinschmidt, M., and Kollmeier, B.:
'Noise suppression based on neurophysiologically motivated SNR estimation for robust speech recognition',
Proceedings of NIPS 2000, in press.
Download as zipped ps(110 kbyte)

M. Kleinschmidt and V. Hohmann:
'Perzeptive Vorverarbeitung und automatische Selektion sekundärer Merkmale zur robusten Spracherkennung',
"Fortschritte der Akustik - DAGA 2000", Oldenburg, pp. 382-383, DEGA, Oldenburg.
HTML, Download as gzipped ps(83 kbyte)

J. Anemüller, M. Kleinschmidt and B. Kollmeier:
'Blinde Quellentrennung als Vorverarbeitung zur robusten Spracherkennung',
"Fortschritte der Akustik - DAGA 2000", Oldenburg, pp. 364-365, DEGA, Oldenburg.
Download as gzipped ps(102 kbyte)

C. Hartmann, M. Kleinschmidt, J. Tchorz and B. Kollmeier:
'Gehörgerechte Vorverarbeitung für die robuste Spracherkennung auf Basis von Wortuntereinheiten',
"Fortschritte der Akustik - DAGA 2000", Oldenburg, pp. 380-381, DEGA, Oldenburg.

M. Kleinschmidt, M. Marzinzik and B. Kollmeier:
Combining Monaural Noise Reduction Algorithms and Perceptive Preprocessing for Robust Speech Recognition.
in: "Psychophysics, Physiology, and Models of Hearing", edited by T. Dau, V. Hohmann, and B. Kollmeier. World Scientific, Singapore (1999).
Download (pdf, 180 kbyte)

J. Tchorz and B. Kollmeier:
A model of auditory perception as front end for automatic speech recognition.
J. Acoust. Soc. Am. (JASA) 106(4):2040-2050, 1999.

J. Tchorz and B. Kollmeier:
A psychoacoustical model of the auditory periphery as front end for ASR.
Proc. ASA/EAA/DEGA Joint Meeting on Acoustics, March 1999, Berlin, Germany (in press)
Download (pdf, 60 kbyte)

J. Tchorz, M. Kleinschmidt, K. Kasper and B. Kollmeier:
Auditory Feature Extraction and Recognizer Dependencies.
Paper presented at "Workshop on Robust Methods for Speech Recognition", May 25-26, 1999, Tampere, Finland, pp. 67-70
Download (pdf, 190 kbyte)

M. Kleinschmidt:
Störgeräuschunterdrückung und gehörgerechte Vorverarbeitung für die automatische Spracherkennung.
Master's Thesis (Diplomarbeit), 1998.

M. Kleinschmidt, J. Tchorz, T. Wittkop, V. Hohmann, and B. Kollmeier:
Robuste Spracherkennung durch binaurale Richtungsfilterung und gehörgerechte Vorverarbeitung
"Fortschritte der Akustik - DAGA 1998, Zürich"
Download (gzipped postscript, 70k)

J. Tchorz, K. Kasper, H. Reininger, and B. Kollmeier:
On the Interplay between auditory-based features and locally recurrent neural networks for robust speech recognition in noise
Eurospeech '97, pp. 2075-2078, ESCA, Patras, Greece, 1997.
Download (postscript, 392k)

J. Tchorz, M. Wesselkamp, and B. Kollmeier:
Gehörgerechte Merkmalsextraktion zur robusten Spracherkennung in Störgeräuschen
Fortschritte der Akustik - DAGA 96, pp. 532-533, DEGA, Oldenburg, 1996.
Download (postscript, 81k)

T. Dau, D. Püschel, and A. Kohlrausch:
A quantitative model of the "effective" signal processing in the auditory system: I. Model structure
J. Acoust. Soc. Am., vol. 99, pp. 3615-3622, 1996

K. Kasper, H. Reininger, and D. Wolf:
Exploiting the Potential of Auditory Preprocessing for Robust Speech Recognition by Locally Recurrent Neural Networks
Proc. Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), vol. 2, pp. 1223-1227, 1997

See also the publication list of our group.

Last modified: March 11, 2001 michael@medi.physik.uni-oldenburg.de
