Date of Award
Master of Engineering (Research)
Dr. Joe Connell
Most speech coding algorithms in use today utilise the source-filter model of human speech production, in which speech is modelled as the response of a time-varying, linear, all-pole synthesis filter to an input excitation. In mobile communications, it is the coefficients of this filter and details of the excitation that are transmitted. The filter coefficients are first quantised using a limited number of bits. Because the filter is highly sensitive to coefficient errors, direct quantisation of the filter coefficients may lead to an unstable filter during speech synthesis at the receiver, and hence an alternative representation is needed. Line Spectral Frequencies (LSFs) have evolved as the most effective representation of synthesis filter coefficients for protecting against this degradation.
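As an illustration of the alternative representation described above, the following is a minimal sketch of the textbook conversion from all-pole filter coefficients to LSFs. It is not taken from the thesis or from the AMR reference code: the function name `lpc_to_lsf` and the use of a general polynomial root finder are assumptions made for clarity (production codecs typically use a faster Chebyshev-based root search).

```python
import numpy as np

def lpc_to_lsf(a):
    """Convert LPC coefficients [1, a1, ..., ap] to LSFs in radians.

    Hypothetical helper for illustration only.
    """
    a = np.asarray(a, dtype=float)
    # Extend A(z) by one zero coefficient so it can be added to its mirror image.
    a_ext = np.concatenate([a, [0.0]])
    p_poly = a_ext + a_ext[::-1]   # symmetric polynomial P(z)
    q_poly = a_ext - a_ext[::-1]   # antisymmetric polynomial Q(z)
    # For a stable filter, the roots of P and Q lie on the unit circle.
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    angles = np.angle(roots)
    # Keep one angle per conjugate pair and drop the trivial roots at z = +1 and z = -1.
    eps = 1e-6
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])

# Second-order example whose LSFs can be checked by hand.
lsf = lpc_to_lsf([1.0, -0.9, 0.5])
```

For this second-order example the two LSFs work out analytically to arccos(0.7) and arccos(0.2). The returned values are strictly increasing, and preserving that ordering after quantisation is what guarantees a stable synthesis filter.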
The quantisation process involves first generating the LSFs mathematically and then using weighting functions to prioritise them based on their spectral sensitivity. This thesis considers four weighting functions (WFs) for use in the quantisation of LSF values and evaluates their performance in both split vector and split matrix LSF quantisation. A spectral distortion measure computed from the quantised and unquantised LSFs is used to determine the best WF for each quantiser. The third generation Adaptive Multi Rate (3G AMR) codec is used as the testbed.
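The abstract does not give the exact form of the spectral distortion measure. The sketch below assumes the commonly used root-mean-square log spectral distortion, in dB, between the all-pole spectra of the unquantised and quantised filters. The function names, the FFT size and the example coefficients are all hypothetical.

```python
import numpy as np

def lpc_log_spectrum_db(a, n_fft=512):
    """Log power spectrum (dB) of the all-pole filter 1/A(z), a = [1, a1, ..., ap]."""
    A = np.fft.rfft(np.asarray(a, dtype=float), n_fft)  # A(e^{jw}) on a uniform grid
    return -20.0 * np.log10(np.abs(A))                   # 10*log10(1/|A|^2)

def spectral_distortion_db(a_ref, a_quant, n_fft=512):
    """RMS difference between two LPC log spectra, in dB (assumed definition)."""
    diff = lpc_log_spectrum_db(a_ref, n_fft) - lpc_log_spectrum_db(a_quant, n_fft)
    return float(np.sqrt(np.mean(diff ** 2)))

# Illustrative coefficients only; the second set mimics a small quantisation error.
a_unquantised = [1.0, -0.9, 0.5]
a_quantised = [1.0, -0.88, 0.52]
sd = spectral_distortion_db(a_unquantised, a_quantised)
```

In an evaluation of the kind described above, a figure such as this would be averaged over many frames for each weighting function, with a lower average indicating a better match between the quantised and unquantised spectra.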
Since LSFs exhibit correlation from frame to frame, the work progresses to examine prediction as a means of reducing the number of quantisation bits. The thesis examines the use of higher order, fixed factor predictors and adaptive predictors to make better estimates of current LSF values. Higher order predictors further exploit correlation between corresponding LSFs in successive frames, while adaptive predictors eliminate the need for memory storage of fixed LSF estimates. In addition, an adaptive predictor is less speaker dependent. The LMS algorithm is used in the predictor weight adaptation. Prediction effectiveness is measured using the 3G AMR and G.729 codecs.
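The adaptive prediction idea can be sketched as follows for a single LSF track, that is, the same LSF index followed across successive frames. This is a generic LMS predictor, not the thesis's implementation: the function name, predictor order and step size `mu` are assumptions, the track is assumed to have its mean removed, and the synthetic track merely stands in for real LSF data. A real codec would adapt on quantised values so that the decoder can follow the same weight updates.

```python
import numpy as np

def lms_predict(track, order=2, mu=0.05):
    """Predict each sample of a mean-removed LSF track from its `order` predecessors.

    Returns the prediction residual and the final predictor weights.
    """
    track = np.asarray(track, dtype=float)
    w = np.zeros(order)                      # predictor weights, adapted every frame
    residual = np.zeros_like(track)
    for n in range(len(track)):
        # Previous frames' values; frames before the start are treated as zero.
        x = np.array([track[n - k] if n - k >= 0 else 0.0
                      for k in range(1, order + 1)])
        e = track[n] - w @ x                 # prediction error, the value to quantise
        residual[n] = e
        w += mu * e * x                      # LMS weight update
    return residual, w

# Synthetic, slowly varying track used purely for illustration.
frames = np.arange(2000)
track = 0.5 * np.sin(2.0 * np.pi * frames / 50.0)
residual, weights = lms_predict(track)
```

Because the weights are derived from the signal itself, no table of fixed predictor coefficients needs to be stored, which is the memory saving mentioned above.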
This work also examines the possibility of reducing the LSF residual dynamic range while maintaining speech quality by low pass filtering the LSF values. The objective is to reduce the magnitude of the LSF residual and hence the codebook size. The results show that filtering one LSF subvector alone can halve the dynamic range while maintaining existing speech quality.
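The abstract does not state which low pass filter is used. As a rough illustration of why smoothing shrinks the residual range, the sketch below applies an assumed one-pole smoother along a synthetic LSF track and compares the range of the frame-to-frame difference before and after. All names, the coefficient `alpha` and the test signal are hypothetical.

```python
import numpy as np

def one_pole_lowpass(track, alpha=0.5):
    """Smooth a track with y[n] = alpha * y[n-1] + (1 - alpha) * x[n]."""
    track = np.asarray(track, dtype=float)
    y = np.empty_like(track)
    state = track[0]                         # start at the first value to avoid a transient
    for n, x in enumerate(track):
        state = alpha * state + (1.0 - alpha) * x
        y[n] = state
    return y

def dynamic_range(values):
    """Difference between the largest and smallest value."""
    return float(np.max(values) - np.min(values))

# Slow trend plus a fast frame-to-frame fluctuation, for illustration only.
frames = np.arange(400)
track = 0.3 * np.sin(2.0 * np.pi * frames / 100.0) + 0.05 * (-1.0) ** frames
raw_range = dynamic_range(np.diff(track))
smoothed_range = dynamic_range(np.diff(one_pole_lowpass(track)))
```

Here the frame-to-frame difference stands in for a first-order prediction residual. A smaller range means fewer codebook entries are needed to cover it at the same resolution.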
McCarthy, John Raymond, "Enhancement of Linear Prediction Coefficient Quantisation in the 3G Adaptive Multi Rate Speech Coder" (2004). Theses [online].
Available at: https://sword.cit.ie/allthe/677