
Assessment of Speech Quality by Neural Networks

Following the language-dependency result of Section 5.3, the MOS tests produced one set of samples per language, and we trained a separate neural network (NN) on each set. Each network is a three-layer feedforward network with 4 inputs (the quality-affecting parameters) in the input layer, 5 hidden neurons, and a single output unit that produces the quality measure. For Arabic and Spanish, we trained the network on the first 80 samples of the corresponding database in Table 5.2; for French, we used the first 50 samples. The remaining samples (the last 16 for Arabic and Spanish, and the last 15 for French) form the testing databases. None of them appear in the training databases, so the network never saw them during training.

Figures 5.4, 5.5, and 5.6 plot the predicted against the actual MOS values for the Arabic, Spanish, and French training databases, respectively, and Figures 5.8(a), 5.9(a), and 5.10(a) show the corresponding scatter plots. We then used the trained networks to estimate the quality scores of the testing databases: Figures 5.7(a), 5.7(b), and 5.7(c) plot the predicted against the actual MOS values for the Arabic, Spanish, and French testing databases, respectively, and Figures 5.8(b), 5.9(b), and 5.10(b) show the corresponding scatter plots. These figures show two important results. First, the NN learns to evaluate the MOS very accurately for a given set of input parameters (Section 5.2.1), as the figures for the training databases show. Second, it estimates the MOS precisely for new values of the input parameters, as the figures for the testing databases show (correlation coefficients between 0.957 and 0.967 on the testing databases; see Table 5.3).
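The training setup described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation used in this work: the text fixes only the 4-5-1 topology and the first-N/last-M split per language, so the sigmoid hidden activation, the linear output unit, plain stochastic-gradient backpropagation, the learning rate, and the names `MOSNet`, `split_first_last`, and `toy_mos` are all hypothetical. The toy target function is an invented stand-in for the subjective MOS data, which are not reproduced here.

```python
import math
import random

# Per-language (training, testing) split sizes, taken from the text above.
SPLITS = {"Arabic": (80, 16), "Spanish": (80, 16), "French": (50, 15)}


def split_first_last(samples, n_train):
    """Hypothetical helper: first n_train samples train, the remainder test."""
    return samples[:n_train], samples[n_train:]


class MOSNet:
    """Hypothetical 4-5-1 feedforward network: 4 inputs, 5 sigmoid hidden
    neurons, and 1 linear output unit (the predicted MOS)."""

    def __init__(self, n_in=4, n_hidden=5, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        # Sigmoid activation of each hidden neuron.
        return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def train(self, data, epochs=300, lr=0.05):
        # Assumption: plain stochastic gradient descent on the squared error.
        for _ in range(epochs):
            for x, y in data:
                h = self._hidden(x)
                err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - y
                for j, hj in enumerate(h):
                    # Hidden-layer gradient uses the output weight before its update.
                    grad_h = err * self.w2[j] * hj * (1.0 - hj)
                    self.w2[j] -= lr * err * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * grad_h * xi
                    self.b1[j] -= lr * grad_h
                self.b2 -= lr * err

    def mse(self, data):
        return sum((self.predict(x) - y) ** 2 for x, y in data) / len(data)


def toy_mos(x):
    # Hypothetical stand-in for subjective scores; values stay within [1.5, 5.0].
    return 4.5 - 2.0 * x[0] - 1.0 * x[1] * x[2] + 0.5 * x[3]


# One network per language; here only the Arabic-sized split is illustrated.
rng = random.Random(1)
n_train, n_test = SPLITS["Arabic"]
inputs = [[rng.random() for _ in range(4)] for _ in range(n_train + n_test)]
samples = [(x, toy_mos(x)) for x in inputs]
train_set, test_set = split_first_last(samples, n_train)

net = MOSNet(seed=0)
mse_before = net.mse(train_set)
net.train(train_set)
mse_train, mse_test = net.mse(train_set), net.mse(test_set)
print(f"train MSE {mse_before:.3f} -> {mse_train:.3f}, test MSE {mse_test:.3f}")
```

Repeating this with the other entries of `SPLITS` gives one network per language, as in the setup above. The toy target is far smoother than real subjective scores, so the printed numbers say nothing about the results in Table 5.3.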
These figures show that the speech quality scores generated by the NN model closely fit the nonlinear model "built" by the subjects who took part in the MOS experiment. The results suggest that learning algorithms give neural networks a high degree of adaptability, which allows them to optimize their own performance when operating in a dynamic environment. Table 5.3 summarizes the statistics. The results are very encouraging: the neural network approach yielded a good model of the nonlinear mapping that resembles the way human subjects assess speech quality. The French language was the first experiment we carried out to validate our methodology, and only 7 subjects took part in its subjective test. The subjective data for French are therefore less precise than those for Arabic and Spanish. Nevertheless, the NN learned the problem well in this case too, as the results obtained for French show.

**Table 5.3:** Performance of the NN in learning the problem (training databases) and in generalizing to unseen examples (testing databases).
| Language | Training DB: correlation coefficient | Training DB: MSE | Testing DB: correlation coefficient | Testing DB: MSE |
|---|---|---|---|---|
| Arabic | 0.966 | 0.035 | 0.967 | 0.035 |
| Spanish | 0.969 | 0.035 | 0.961 | 0.045 |
| French | 0.965 | 0.048 | 0.957 | 0.055 |
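Table 5.3 reports two statistics for each database. As a sketch, assuming the correlation coefficient is Pearson's r (the text does not name the estimator), both can be computed from paired actual and predicted MOS values as follows. The function names and the four sample values are hypothetical illustrations, not data from the experiments.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between actual and predicted MOS values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def mse(actual, predicted):
    """Mean squared error between actual and predicted MOS values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical illustration values (not taken from Table 5.2).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(f"r = {pearson_r(actual, predicted):.3f}, MSE = {mse(actual, predicted):.3f}")
```

The two statistics are complementary: r measures how well the predictions follow a linear trend with the actual scores, whereas the MSE measures how far they are from them in absolute terms. A prediction offset by a constant would still have r = 1 but a nonzero MSE, which is why the table reports both.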

**Figure 5.4:** Actual vs. Predicted MOS values on the Arabic training database.

**Figure 5.5:** Actual vs. Predicted MOS values on the Spanish training database.

**Figure 5.6:** Actual vs. Predicted MOS values on the French training database.

**Figure 5.7:** Actual vs. Predicted MOS values on the testing databases: (a) Arabic language, (b) Spanish language, (c) French language.

**Figure 5.8:** Scatter plots showing the correlation between actual and predicted MOS values (Arabic language): (a) training database, (b) testing database.

**Figure 5.9:** Scatter plots showing the correlation between actual and predicted MOS values (Spanish language): (a) training database, (b) testing database.

**Figure 5.10:** Scatter plots showing the correlation between actual and predicted MOS values (French language): (a) training database, (b) testing database.


Samir Mohamed 2003-01-08