

Performance of other Speech Quality Measures

To give an idea of how well the known objective speech quality measures (namely those introduced in Section 3.1) perform, we have collected data and figures from the literature. The reported results are presented in two parts. The first part shows how well these metrics evaluate speech quality when only encoding impairments are considered. The second part shows their performance when both encoding and network impairments are present. Depending on the available data, the compared metrics are SNR, SNRseg, BSD, MBSD, EMBSD, PSQM, PSQM+, MNB (1 and 2), the E-model and PAMS.

When the existing metrics are used to evaluate encoding impairments only, Table 5.4 shows that simple metrics such as SNR and SNRseg correlate poorly with subjective quality tests (from 0.226 to 0.523 for SNR and from 0.221 to 0.521 for SNRseg). BSD, MBSD and EMBSD give better results than SNR or SNRseg. PSQM and its enhanced version PSQM+ perform better still, comparably to PAMS and MNB. For certain metrics the correlation coefficient reaches 0.98. The reported coefficients vary because some of these metrics were evaluated over several levels of distortion, while results for others are available from only one kind of experiment.

When these metrics are instead used to evaluate quality under both network and encoding impairments, their performance degrades considerably. The only available data comparing the majority of the existing metrics is found in [155, p. 106]. However, because of a project agreement, the authors did not name the compared metrics except MBSD and EMBSD (Table 5.5). EMBSD gives a correlation coefficient of 0.54 in this case, against 0.87 for the encoding-only case. MBSD performs very poorly, at about 0.24 against 0.760 for the encoding-only case. The best metric has a correlation coefficient of 0.90 with MOS results.
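All the figures quoted in this section are correlation coefficients between an objective measure's outputs and MOS. As a minimal sketch, assuming the plain Pearson coefficient computed directly on the raw scores (the cited studies may have first mapped the objective scores onto the MOS scale, which this sketch does not do), the computation looks like this. The function name `pearson_r` and the sample scores are hypothetical and illustrative only.

```python
import math


def pearson_r(objective, mos):
    """Pearson correlation coefficient between objective scores and MOS values."""
    if len(objective) != len(mos) or len(objective) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    n = len(objective)
    mean_x = sum(objective) / n
    mean_y = sum(mos) / n
    # Covariance and variances; the common 1/n factors cancel, so they are omitted.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(objective, mos))
    var_x = sum((x - mean_x) ** 2 for x in objective)
    var_y = sum((y - mean_y) ** 2 for y in mos)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five speech samples (illustrative values only,
# not data from the studies cited in this section).
mos_scores = [1.5, 2.3, 3.1, 3.8, 4.4]
measure_scores = [0.10, 0.35, 0.48, 0.70, 0.85]
print(f"r = {pearson_r(measure_scores, mos_scores):.3f}")
```

A coefficient near 1 means the measure ranks and spaces the samples much as the listeners did. Values such as 0.24 or 0.54 in Table 5.5 mean that the measure's output says little about the perceived quality.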

Table 5.4: Correlation coefficients of some existing objective speech quality measures with MOS, taken from the literature. Only encoding impairments are considered. Sources: [155, p. 103], [139, p. 84], [113, p. 1517] and [112, p. 10/7].

Objective measure    Correlation with subjective measure (MOS)
SNR                  0.226-0.523
SNRseg               0.221-0.521
BSD                  0.367-0.919
PSQM                 0.830-0.980
PSQM+                0.874-0.980
MNB2                 0.740-0.980
PAMS                 0.640-0.895
MBSD                 0.760
EMBSD                0.870
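The SNR and SNRseg rows in Table 5.4 refer to the simplest waveform-comparison measures. Both need the original signal, which is relevant to the real-time discussion of the E-model below. The following is a minimal sketch of the usual definitions. The 160-sample frame (20 ms at 8 kHz), the [-10, 35] dB per-frame clamp and the skipping of silent frames are assumptions, not necessarily the choices made in Section 3.1. The function names are hypothetical.

```python
import math


def snr_db(original, processed):
    """Overall SNR in dB: total signal energy over total error energy."""
    signal = sum(s * s for s in original)
    noise = sum((s - p) ** 2 for s, p in zip(original, processed))
    return math.inf if noise == 0.0 else 10.0 * math.log10(signal / noise)


def snr_seg_db(original, processed, frame_len=160, lo=-10.0, hi=35.0):
    """Segmental SNR in dB: mean of per-frame SNRs, each clamped to [lo, hi].

    frame_len=160 and the [-10, 35] dB clamp are assumed common choices.
    """
    per_frame = []
    for start in range(0, len(original) - frame_len + 1, frame_len):
        o = original[start:start + frame_len]
        p = processed[start:start + frame_len]
        signal = sum(s * s for s in o)
        noise = sum((s - q) ** 2 for s, q in zip(o, p))
        if signal == 0.0:
            continue  # skip all-zero (silent) frames (assumption)
        snr = hi if noise == 0.0 else 10.0 * math.log10(signal / noise)
        per_frame.append(min(max(snr, lo), hi))
    return sum(per_frame) / len(per_frame) if per_frame else math.nan


# Hypothetical 8 kHz test tone and a copy degraded by a constant gain error.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
degraded = [0.9 * s for s in tone]
# Both values are close to 20 dB, since the error is 10% of the signal amplitude.
print(snr_db(tone, degraded), snr_seg_db(tone, degraded))
```

Both measures compare samples one by one and contain no model of perception. This is consistent with their poor correlation with MOS in Table 5.4.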


Table 5.5: Correlation coefficients of some existing objective speech quality measures with MOS for both network and encoding impairments, taken from the literature. Source: [155, p. 106]. The letters A to F represent objective quality measures introduced in Sec. 3.1. The name of each measure has not been released because of the project agreement.

Objective measure    Correlation with subjective measure (MOS)
A                    0.87
B                    0.85
C                    0.56
D                    0.86
E                    0.90
F                    0.86
MBSD                 0.24
EMBSD                0.54

Of the objective speech quality measures available in the literature, only the ITU E-model does not need access to the original signal to compute the quality. It is therefore the only available measure that is computationally simple [57] and usable in real-time applications. However, according to the results reported in [57], its correlation coefficient for VoIP quality evaluation is poor, about 0.70. The same work [57] also compared EMBSD and MNB (1 and 2) in evaluating the quality of speech distorted by network impairments (VoIP). The reported results show that these measures, like the E-model, correlate poorly with subjective MOS, as shown in Table 5.6 (cf. Table 5.3). From that work we show two scatter plots of the MNB2 and E-model quality evaluations against MOS in Figure 5.11(a) and Figure 5.11(b) (cf. Figures 5.8 and 5.9). For a perfect speech quality measure, we would expect a one-to-one relation between the MOS value and the measure's output. In other words, all the points should lie along a single line, with the measure's minimum output corresponding to a MOS of 1 and its maximum output to a MOS of 5. The two figures, however, show large variations. For the same MOS value, both measures produce many different outputs. For MOS = 2.5, the MNB2 output varies from 0.1 to 0.7 (on an output range of 0 to 1), and the E-model output varies from 20 to 80 (on an output range of 0 to 100). Conversely, a single output of either measure corresponds to many MOS values. For an MNB2 output of 0.4, the MOS varies from 1.2 to 4.5, and for an E-model output of 80, the MOS varies from 1.1 to 4.5 (on a MOS scale of 1 to 5).
This considerably reduces confidence in these measures for assessing speech quality when the signal is distorted by varying network parameters. It does so even for the E-model, which was designed, and later enhanced, to take the effect of network distortion into account. Moreover, these data are fairly recent (2001). Comparing Tables 5.5 and 5.6 also shows that the results for the measures common to both tables do not agree. For example, EMBSD scores 0.54 in Table 5.5 but 0.38 in Table 5.6. This is because the measures in Table 5.5 were not tested against all the network parameters considered in Table 5.6.
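The scatter-plot argument about Figure 5.11 can be made concrete. For a perfect measure, grouping the points by MOS would leave a single output value in each group, and grouping by output would leave a single MOS value. The following minimal sketch reports the observed range in each group. The function name `spread_by_bin`, the bin widths and the sample points are hypothetical and are not values read from Figure 5.11.

```python
from collections import defaultdict


def spread_by_bin(pairs, bin_width):
    """Group (x, y) pairs by x, rounded to the nearest bin, and return the
    (min y, max y) range observed in each bin, sorted by bin centre.

    For a perfect quality measure plotted against MOS, every bin would
    contain a single value, i.e. a zero-width range.
    """
    bins = defaultdict(list)
    for x, y in pairs:
        # Python's round() sends exact ties to the even bin index;
        # the outer round(..., 6) avoids keys such as 0.30000000000000004.
        centre = round(round(x / bin_width) * bin_width, 6)
        bins[centre].append(y)
    return {c: (min(ys), max(ys)) for c, ys in sorted(bins.items())}


# Hypothetical (MOS, MNB2 output) pairs, for illustration only.
points = [(2.5, 0.1), (2.4, 0.7), (2.6, 0.4), (1.2, 0.4), (4.5, 0.4)]

# Spread of the measure's output for (roughly) the same MOS ...
print(spread_by_bin(points, bin_width=0.5))
# ... and spread of MOS for (roughly) the same output: swap each pair.
print(spread_by_bin([(y, x) for x, y in points], bin_width=0.1))
```

Wide ranges in either direction are what the text describes for MNB2 and the E-model. No fixed mapping from the measure's output to MOS can then be accurate for all samples.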

Table 5.6: Correlation coefficients of EMBSD, MNB and the E-model with MOS for VoIP impairments. Source: [57].

Objective measure    Correlation with subjective measure (MOS)
EMBSD                0.38
MNB1                 0.51
MNB2                 0.54
E-model              0.70

Figure 5.11: MNB2 and E-model results against the subjective MOS values when evaluating a set of speech samples distorted by both encoding and network impairments. Source: [57].
(a) Performance of the MNB2; (b) Performance of the E-model.
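The E-model output plotted in Figure 5.11(b) is its rating factor R on a 0 to 100 scale, while the other axis is MOS. ITU-T G.107 defines a standard conversion from R to an estimated MOS. The following sketch implements that mapping so that the R values quoted in the text can be read on the MOS scale. It is not stated here whether [57] applied this conversion, and the function name `r_to_mos` is hypothetical.

```python
def r_to_mos(r):
    """Estimated MOS from the E-model rating factor R (ITU-T G.107 mapping)."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# The E-model spread quoted for MOS = 2.5 (R from 20 to 80) spans
# roughly 1.25 to 4.02 on the MOS scale after conversion.
for r in (20, 80):
    print(r, round(r_to_mos(r), 3))
```

Even after conversion, the range of R values observed at a single subjective MOS of 2.5 covers most of the MOS scale. This illustrates why the E-model's correlation in Table 5.6 is only about 0.70.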


Samir Mohamed 2003-01-08