PEMO: Speech Recognition with Perceptive Feature Extraction

A computational model of the auditory periphery (PEMO) was developed by the Medical Physics Group at Oldenburg University. PEMO was originally developed to simulate psychoacoustical experiments such as temporal or spectral masking. More recently, the model has been applied to several topics in speech processing, including speech intelligibility prediction, objective speech quality measurement, and automatic speech recognition (ASR).
The motivation for our work in the field of ASR is that the human auditory system can be regarded as a very robust "speech recognition system" that allows us to understand speech in very noisy environments. Today's ASR systems, on the other hand, usually perform quite poorly even at low noise levels. Simulating the "internal representation" of speech with an auditory-based feature extraction like PEMO should allow more robust automatic recognition of speech.

Processing Stages of PEMO

  • Preemphasis of the time signal
  • Basilar-membrane filtering with a gammatone filterbank
  • Envelope Extraction (half-wave rectification and low pass filtering)
  • Adaptive amplitude compression to simulate short-term adaptation
  • Low pass filtering of the compressed envelope
The representation of speech and sounds after PEMO processing:
  • Stationary input signals are approximately log-compressed
  • Changes in the input signal, such as onsets and offsets, are transformed linearly and are thus emphasized
  • Amplitude modulations between about 1 and 10 Hz are passed; others are suppressed
  • The coding of the input signal is sparse and distinct
  • See the visual demonstration
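
The envelope extraction and adaptive compression stages listed above can be illustrated with a minimal sketch. It is not the PEMO implementation: the function names (one_pole_lowpass, extract_envelope, adaptation_loops), the first-order filter, the 1 kHz envelope cutoff, the input floor, and the chain of five divisive feedback loops with the time constants shown are all assumptions, chosen in the style of commonly described auditory adaptation models.

```python
import math

def one_pole_lowpass(signal, fs, cutoff_hz):
    """First-order low-pass filter: y[n] = a*y[n-1] + (1-a)*x[n]."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    out, y = [], 0.0
    for x in signal:
        y = a * y + (1.0 - a) * x
        out.append(y)
    return out

def extract_envelope(band_signal, fs, cutoff_hz=1000.0):
    """Half-wave rectification followed by low-pass filtering.

    The 1 kHz cutoff is an assumed value, not taken from the text.
    """
    rectified = [max(x, 0.0) for x in band_signal]
    return one_pole_lowpass(rectified, fs, cutoff_hz)

def adaptation_loops(envelope, fs,
                     time_constants=(0.005, 0.050, 0.129, 0.253, 0.500),
                     floor=1e-5):
    """Chain of divisive feedback loops (assumed form of the compression).

    Each loop divides its input by a low-pass filtered copy of its own
    output. Time constants (seconds) and the floor are assumptions.
    """
    # Start every loop at its resting state for an input equal to the floor.
    states, s = [], floor
    for _ in time_constants:
        s = math.sqrt(s)
        states.append(s)
    coeffs = [math.exp(-1.0 / (fs * tau)) for tau in time_constants]

    out = []
    for x in envelope:
        y = max(x, floor)  # keep the divisors strictly positive
        for i, a in enumerate(coeffs):
            y = y / states[i]                         # divide by loop state
            states[i] = a * states[i] + (1.0 - a) * y  # low-pass the output
        out.append(y)
    return out

# Usage: a constant envelope sampled at 1 kHz, long enough to settle.
fs = 1000
settled = adaptation_loops([1.0] * 10000, fs)[-1]
```

In this sketch each loop settles at the square root of its input, so five loops map a stationary level x to x^(1/32), which is close to logarithmic over the usual dynamic range. The first sample after a jump is divided by the old loop states, so it scales linearly with the input, which is how onsets come to be emphasized.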


Recognition experiments

Experiments were performed with PEMO feature extraction in a range of different setups. The robustness of the auditory-based preprocessing was compared with that of other front ends, using both HMM and neural network recognizers. The effect of additional monaural and binaural noise suppression prior to feature extraction was also investigated. Current research focuses on sub-word unit recognition in noise. If you are interested, have a look at the papers listed below, or contact Michael Kleinschmidt, Christine Hartmann, or Jürgen Tchorz.


