PEMO: Speech Recognition with Perceptive Feature Extraction

A computational model of the auditory periphery (PEMO) was developed by the Medical Physics Group at Oldenburg University. PEMO was originally designed to simulate psychoacoustical experiments such as temporal or spectral masking. More recently, the model has been applied to several areas of speech processing: speech intelligibility prediction, objective speech quality measurement, and automatic speech recognition (ASR).
The motivation for our work on ASR is that the human auditory system can be regarded as a very robust "speech recognition system" that lets us understand speech even in very noisy environments. Today's ASR systems, by contrast, usually perform quite poorly even at low noise levels. Simulating the "internal representation" of speech with an auditory-based feature extraction such as PEMO should therefore allow more robust automatic speech recognition.

Processing Stages of PEMO

The representation of speech and sounds after PEMO-processing:

Recognition Experiments

Recognition experiments were performed with PEMO feature extraction. The task was speaker-independent isolated digit recognition in quiet and in noise. Before feature extraction, the speech material was corrupted with different types of additive and convolutive noise. Both hidden Markov models (HMMs) and neural networks were used as recognizers. Other front ends such as MFCC and RASTA were tested for comparison. The results show
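The corruption step described in the recognition experiments can be illustrated with a minimal Python sketch. It is not the group's actual setup: the SNR definition (ratio of mean signal powers), the 3-tap FIR channel, the 8 kHz sampling rate and the tone standing in for a digit utterance are all assumptions chosen for illustration, and the function names are hypothetical. Convolutive noise is modelled as filtering by a linear channel, and additive noise as a noise signal scaled to reach a target signal-to-noise ratio before it is mixed in.

```python
import math
import random
from typing import List, Sequence


def signal_power(x: Sequence[float]) -> float:
    """Mean power (mean of squared samples) of a signal."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Additive distortion: scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as the speech signal")
    noise = noise[: len(speech)]
    # Choose gain g so that 10*log10(P_speech / (g^2 * P_noise)) == snr_db.
    gain = math.sqrt(signal_power(speech) / (signal_power(noise) * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]


def convolve_channel(speech: Sequence[float], impulse_response: Sequence[float]) -> List[float]:
    """Convolutive distortion: filter the speech with an FIR channel, keeping the input length."""
    out = [0.0] * len(speech)
    for i in range(len(speech)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if i - k < 0:  # no samples before the start of the signal
                break
            acc += h * speech[i - k]
        out[i] = acc
    return out


def snr_db(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """Measured SNR of a noisy signal relative to its clean reference."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10.0 * math.log10(signal_power(clean) / signal_power(residual))


if __name__ == "__main__":
    fs = 8000  # hypothetical sampling rate in Hz
    rng = random.Random(0)
    # Placeholder for a digit utterance: a 1 s, 440 Hz tone (not real speech).
    speech = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]  # white Gaussian noise
    channel = [1.0, 0.5, 0.25]  # hypothetical 3-tap FIR channel
    filtered = convolve_channel(speech, channel)
    distorted = add_noise_at_snr(filtered, noise, snr_db=10.0)
    print(f"measured SNR: {snr_db(filtered, distorted):.2f} dB")
```

Generating the corrupted signals once, independently of any front end, lets the same material be passed to each feature extraction being compared (PEMO, MFCC, RASTA).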

See also the publication list of our group.

Currently working on ASR with PEMO preprocessing: Michael Kleinschmidt
