Prediction of Speech Quality based on Psychoacoustical Preprocessing Models

by Martin Hansen and Birger Kollmeier,
AG Medizinische Physik, Uni Oldenburg, 26111 Oldenburg

Presented at the ITG/EURASIP "Workshop on Quality Assessment in Speech, Audio and Image Communication", Darmstadt, March 1996.

Abstract

This study investigates the implementation of five different psychoacoustical preprocessing models for measuring the speech quality of low-bit-rate codecs.

The principal method for measuring speech quality is the same for all five preprocessing models: each model transforms the input and output signals of a speech coding device into a so-called "internal representation" of the sound. Differences between the internal representations of the input and output signals are expected to correspond to a decrease in the speech quality of the output signal.

At present, the most successful objective speech quality prediction for the ETSI Halfrate Selection test was obtained with a psychoacoustic preprocessing model that can also simulate psychoacoustical threshold data in various conditions.

Introduction

In the development of objective speech quality prediction, psychoacoustically motivated preprocessing models have gained increasing importance. The reason is that the "conventional" signal-to-noise ratio measures and their derivatives clearly fail to describe satisfactorily the transmission quality of nonlinear, time-variant systems such as low-bit-rate speech codecs.

The goal of objective speech quality measurement is to quantify the quality degradation of a speech sample relative to an undegraded reference. The application of psychoacoustical preprocessing models is motivated by the assumption that auditory preprocessing transforms the signal into an "internal representation" of the sound. This representation is accessible to higher neuronal stages of perception and should contain the perceptually relevant features of the incoming sound. Differences between the internal representations of the input and output signals are expected to correspond to perceivable differences between the two signals and thus to indicate a decreased speech quality of the output signal. Many alternatives have been proposed for incorporating elements of human perception. In most psychoacoustically motivated speech quality measures, adjustable parameters can be used to maximize the correlation between the objective and the subjective speech quality measure. However, such tuning may make it difficult to handle arbitrary kinds of signal degradation, or may restrict the applicability of the preprocessing model for psychoacoustical purposes.

The aim of this study is to investigate the necessary properties of a "functional auditory preprocessing model" capable of measuring the speech quality of arbitrary speech coding systems. To this end, five psychoacoustical preprocessing models of different complexity have been applied to the measurement of the speech quality of low-bit-rate coded speech. Here the term "preprocessing model" refers to an algorithm that transforms the input of the human auditory system into an internal representation, while the "method" used for calculating the objective speech quality measure is the same for each of the five preprocessing models. The method assumes that subjects are able to compare the quality of a test speech signal with that of an internally stored reference. From the frequency-weighted internal representations of the original and the distorted signal, a physical measure of similarity is calculated and serves as the objective speech quality measure.
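The method described above can be illustrated with a minimal sketch. It is not one of the five preprocessing models of this study: the function names, the log band-energy "internal representation", the band weights and the use of a correlation coefficient as the similarity measure are all assumptions chosen only to make the processing chain concrete (preprocess both signals, weight over frequency, compare).

```python
import numpy as np

def internal_representation(signal, frame_len=256, hop=128, n_bands=8):
    """Toy stand-in for an auditory preprocessing model (assumption):
    log energy in n_bands equal-width frequency bands for each frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bands = np.array_split(power, n_bands, axis=1)
    energies = np.stack([b.sum(axis=1) for b in bands], axis=1)
    return np.log10(energies + 1e-12)  # shape: (n_frames, n_bands)

def quality_measure(reference, test, band_weights=None):
    """Similarity of the frequency-weighted representations of two
    equal-length signals. The correlation coefficient is an assumed
    choice of 'physical measure of similarity'."""
    rep_ref = internal_representation(reference)
    rep_test = internal_representation(test)
    if band_weights is None:
        band_weights = np.ones(rep_ref.shape[1])  # hypothetical flat weighting
    w = np.asarray(band_weights, dtype=float)
    a = (rep_ref * w).ravel()
    b = (rep_test * w).ravel()
    return float(np.corrcoef(a, b)[0, 1])

if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    # Amplitude-modulated tone as a crude placeholder for a speech sample.
    original = np.sin(2 * np.pi * 440 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
    rng = np.random.default_rng(0)
    degraded = original + 0.5 * rng.standard_normal(len(original))
    print("original vs. itself:  ", quality_measure(original, original))
    print("original vs. degraded:", quality_measure(original, degraded))
```

In this sketch a value near 1 means that the two representations are nearly identical, and lower values indicate larger differences. Replacing `internal_representation` with a different model while keeping `quality_measure` unchanged mirrors the distinction drawn above between the "preprocessing model" and the "method".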


martin@medi.physik.uni-oldenburg.de

Last modified: Apr 18 1997