

PEMO: Speech Recognition with Perceptive Feature Extraction
A computational model of the auditory periphery (PEMO) was developed
by the Medical Physics Group at Oldenburg University. PEMO was
originally designed to simulate psychoacoustical experiments such as
temporal or spectral masking experiments. More recently, the model has
been applied to several topics in speech processing, including
speech intelligibility prediction, objective speech quality measurement,
and automatic speech recognition (ASR).
The motivation for our work in the field of ASR is that the human
auditory system can be regarded as a very robust "speech recognition
system" that allows us to understand speech in very noisy environments.
Today's ASR systems, on the other hand, usually perform quite poorly
even at low noise levels. Simulating the "internal representation" of
speech with an auditory-based feature extraction such as PEMO should
allow more robust automatic recognition of speech.
Processing Stages of PEMO
- Preemphasis of the time signal
- Basilar-membrane filtering with a gammatone
filterbank
- Envelope Extraction (half-wave rectification
and low pass filtering)
- Adaptive amplitude compression to simulate
short-term adaptation
- Low pass filtering of the compressed envelope
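The five stages listed above can be sketched for a single frequency channel as follows. This is a minimal illustration in plain Python, not the group's implementation: all function names are hypothetical, and every numeric parameter is an assumption rather than a value given on this page (pre-emphasis coefficient 0.97, 1 kHz envelope low pass, 8 Hz modulation low pass, input floor of 1e-5). The adaptive compression is assumed to be a chain of five divisive feedback loops with time constants between 5 and 500 ms, following the Dau et al. (1996) model cited in the paper list.

```python
import cmath
import math

def preemphasis(x, coeff=0.97):
    """Stage 1: y[n] = x[n] - coeff * x[n-1] (coeff is an assumed value)."""
    return [x[0]] + [x[n] - coeff * x[n - 1] for n in range(1, len(x))]

def lowpass(x, cutoff_hz, fs):
    """First-order recursive low pass, starting from a zero state."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, state = [], 0.0
    for v in x:
        state = a * state + (1.0 - a) * v
        y.append(state)
    return y

def gammatone_channel(x, fc, fs, order=4, dur=0.025):
    """Stage 2: one gammatone channel, applied as a truncated FIR filter."""
    bw = 1.019 * 24.7 * (4.37 * fc / 1000.0 + 1.0)  # bandwidth from the ERB scale
    h = [(k / fs) ** (order - 1) * math.exp(-2.0 * math.pi * bw * k / fs)
         * math.cos(2.0 * math.pi * fc * k / fs) for k in range(int(dur * fs))]
    # Normalise the impulse response to unit gain at the centre frequency.
    gain = abs(sum(v * cmath.exp(-2j * math.pi * fc * k / fs)
                   for k, v in enumerate(h)))
    h = [v / gain for v in h]
    # Direct-form convolution, truncated to the input length.
    return [sum(h[k] * x[n - k] for k in range(min(n + 1, len(h))))
            for n in range(len(x))]

def adaptation_loops(x, fs, taus=(0.005, 0.050, 0.129, 0.253, 0.500), floor=1e-5):
    """Stage 4: chain of divisive feedback loops (assumed structure)."""
    # Start every loop in its resting state for an input equal to the floor.
    states = [floor ** (0.5 ** (i + 1)) for i in range(len(taus))]
    coeffs = [math.exp(-1.0 / (tau * fs)) for tau in taus]
    y = []
    for v in x:
        v = max(v, floor)
        for i in range(len(taus)):
            v = v / states[i]  # divide by the low-pass-filtered loop output
            states[i] = coeffs[i] * states[i] + (1.0 - coeffs[i]) * v
        y.append(v)
    return y

def pemo_channel(x, fs, fc=1000.0, mod_cutoff_hz=8.0):
    """Run all five stages for one channel centred at fc."""
    x = preemphasis(x)
    x = gammatone_channel(x, fc, fs)
    x = [max(v, 0.0) for v in x]   # stage 3a: half-wave rectification
    x = lowpass(x, 1000.0, fs)     # stage 3b: envelope low pass
    x = adaptation_loops(x, fs)
    return lowpass(x, mod_cutoff_hz, fs)  # stage 5: modulation low pass

if __name__ == "__main__":
    fs = 8000
    # 100 ms of silence followed by a 200 ms tone at 1 kHz.
    signal = [0.0] * 800 + [0.1 * math.sin(2.0 * math.pi * 1000.0 * n / fs)
                            for n in range(1600)]
    features = pemo_channel(signal, fs)
    print("peak after onset:", max(features[800:]))
    print("value at end of tone:", features[-1])
```

A complete front end would run one such channel per gammatone filter and sample the outputs at a fixed frame rate. In this sketch silence maps to a nonzero resting level, because the input is clipped to the floor before it enters the loops.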
The representation of speech and sounds after
PEMO-processing:
- Stationary input signals are approximately
log-compressed
- Changes in the input signal, such as onsets
and offsets, are transformed linearly and are thus emphasized
- Amplitude modulations between about 1 and
10 Hz are passed; others are suppressed
- The coding of the input signal is sparse
and distinct
- See the visual
demonstration
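The first two properties above can be made plausible with a short derivation. It assumes that the adaptive compression stage is a series of divisive feedback loops, as in the Dau et al. (1996) model cited in the paper list; the number of loops (five) comes from that model and is not stated on this page. In each loop the input x is divided by a low-pass-filtered copy s of the loop's own output y.

```latex
\begin{align*}
  % One loop with a stationary input: the state s settles at the output y.
  y = \frac{x}{s}, \quad s \to y
    \;&\Rightarrow\; y = \frac{x}{y}
    \;\Rightarrow\; y = \sqrt{x} \\
  % n loops in series:
  y_n &= x^{2^{-n}}, \qquad
  y_5 = x^{1/32} = e^{(\ln x)/32} \approx 1 + \frac{\ln x}{32}
    \quad \text{for } |\ln x| \ll 32 \\
  % Rapid change: the states s_1, ..., s_n have not yet adapted.
  y_n &\approx \frac{x}{s_1 s_2 \cdots s_n}
\end{align*}
```

Under these assumptions a stationary input is mapped to a value that is nearly affine in ln x, which is the approximate log compression. A sudden change is divided by states that are still almost constant, so it passes through linearly and stands out against the compressed stationary parts.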
Recognition experiments
were performed with PEMO feature extraction in a range of different
setups. The robustness of the auditory-based preprocessing was compared
with other front ends, with both HMM and neural network recognizers.
The effect of additional monaural and binaural noise suppression
prior to feature extraction was investigated. Current research focuses
on subword-unit recognition in noise. If you are interested, have a
look at the papers listed below, or contact Michael Kleinschmidt,
Christine Hartmann, or Jürgen Tchorz.
Related Papers and Articles:
M. Kleinschmidt, J. Tchorz and B. Kollmeier:
'Combining Speech Enhancement and Auditory Feature Extraction for
Robust Speech Recognition',
"Speech Communication - Special Issue on Robust ASR" (accepted, to
be published April 2001)
Tchorz, J., Kleinschmidt, M., and Kollmeier, B.:
'Noise suppression based on neurophysiologically motivated SNR estimation
for robust speech recognition',
Proceedings of NIPS 2000, in press.
Download
as zipped ps(110 kbyte)
M. Kleinschmidt and V. Hohmann:
'Perzeptive Vorverarbeitung und automatische Selektion sekundärer
Merkmale zur robusten Spracherkennung',
"Fortschritte der Akustik - DAGA 2000", Oldenburg, pp. 382-383,
DEGA, Oldenburg.
HTML,
Download
as gzipped ps(83 kbyte)
J. Anemüller, M. Kleinschmidt and B. Kollmeier:
'Blinde Quellentrennung als Vorverarbeitung zur robusten Spracherkennung',
"Fortschritte der Akustik - DAGA 2000", Oldenburg, pp. 364-365,
DEGA, Oldenburg.
Download
as gzipped ps(102 kbyte)
C. Hartmann, M. Kleinschmidt, J. Tchorz and B. Kollmeier:
'Gehörgerechte Vorverarbeitung für die robuste Spracherkennung
auf Basis von Wortuntereinheiten',
"Fortschritte der Akustik - DAGA 2000", Oldenburg, pp. 380-381,
DEGA, Oldenburg.
M. Kleinschmidt, M. Marzinzik and B. Kollmeier:
Combining Monaural Noise Reduction Algorithms and Perceptive Preprocessing
for Robust Speech Recognition.
in: "Psychophysics, Physiology, and Models of Hearing", edited by
T. Dau, V. Hohmann, and B. Kollmeier. World Scientific, Singapore
(1999).
Download
(pdf, 180 kbyte)
J. Tchorz and B. Kollmeier:
A model of auditory perception as front end for automatic speech
recognition.
J. Acoust. Soc. Am. (JASA) 106(4):2040-2050, 1999.
J. Tchorz and B. Kollmeier:
A psychoacoustical model of the auditory periphery as front end
for ASR.
Proc. ASA/EAA/DEGA Joint Meeting on Acoustics, March 1999, Berlin,
Germany (in press)
Download
(pdf, 60 kbyte)
J. Tchorz, M. Kleinschmidt, K. Kasper and B. Kollmeier:
Auditory Feature Extraction and Recognizer Dependencies.
Paper presented at "Workshop on Robust Methods for Speech Recognition",
May 25-26, 1999, Tampere, Finland, pp. 67-70
Download
(pdf, 190 kbyte)
M. Kleinschmidt:
Störgeräuschunterdrückung und gehörgerechte Vorverarbeitung
für die automatische Spracherkennung.
Master's Thesis (Diplomarbeit), 1998.
M. Kleinschmidt, J. Tchorz, T. Wittkop, V. Hohmann, and B. Kollmeier:
Robuste
Spracherkennung durch binaurale Richtungsfilterung und gehörgerechte
Vorverarbeitung
"Fortschritte der Akustik - DAGA 1998, Zürich"
Download
(gzipped postscript, 70k)
J. Tchorz, K. Kasper, H. Reininger, and B. Kollmeier:
On
the Interplay between auditory-based features and locally recurrent
neural networks for robust speech recognition in noise
Eurospeech '97, pp. 2075-2078, ESCA, Patras, Greece, 1997.
Download
(postscript, 392k)
J. Tchorz, M. Wesselkamp, and B. Kollmeier:
Gehörgerechte
Merkmalsextraktion zur robusten Spracherkennung in Störgeräuschen
Fortschritte der Akustik - DAGA 96, p. 532-533, DEGA, Oldenburg,
1996.
Download
(postscript, 81k)
T. Dau, D. Püschel, and A. Kohlrausch:
A quantitative model of the ``effective'' signal processing in
the auditory system: I. Model
J. Acoust. Soc. Am., vol. 99, pp. 3615-3622, 1996
K. Kasper, H. Reininger, and D. Wolf:
Exploiting the Potential of Auditory Preprocessing for Robust
Speech Recognition by Locally Recurrent Neural Networks
Proc. Int. Conf. Acoustics, Speech and Signal Processing (ICASSP),
vol. 2, pp. 1223-1227, 1997
See also the publication
list of our group.