Based on the language dependency result obtained in Section 5.3, when finishing the MOS tests we ended up with one set of samples for each language. A three-layer feedforward neural network for each set of samples (language) consisting of 4 inputs (the quality-affecting parameters) in the input layer, 5 hidden neurons, and an output unit to generate a single output (the quality measure) was used.
We trained the neural network with the first 80 samples of the databases
shown in Table 5.2 for each language. For
French language, we trained the NN by the first 50 samples. The
remaining samples were used to test the trained neural network. By
comparing the actual quality scores against the NN's predictions, we
show in
Figures 5.4, 5.5,
and 5.6 the predicted against the actual
MOS values for the Arabic, Spanish and French training databases
respectively. We show also scatter plots for these cases in
Figures 5.8(a), 5.9(a)
and 5.10(a). Similarly, we used the
trained NN to evaluate the quality score for the testing databases (the
last 16 samples in Table 5.2 for the Arabic
and the Spanish languages and the last 15 samples for the French
language). The samples are not within those in the training databases
(thus non-seen by the neural network). We plot in
Figures 5.7(a), 5.7(b),
5.7(c) the predicted against the actual MOS
values for the Arabic, Spanish and French testing databases
respectively. We show also scatter plots for these cases in
Figures 5.8(b), 5.9(b),
and 5.10(b). From these figures, we can see two important results. First, the NN has the ability to learn very accurately the evaluation of the MOS for a given set of input parameters (Section 5.2.1). This is clearly shown in the Figures using the training databases. Second, it is able to have a very precise estimation of the MOS for any new values of the input parameters. This is shown in the Figures using the testing databases.
It can be observed from these Figures that speech quality scores
generated by the NN model fits quite nicely with the nonlinear model
``built'' by the subjects participating in the MOS experiment. It can
be established from these results that learning algorithms give neural
networks the advantage of high adaptability, which allows them to self
optimize their performance when functioning under a dynamical
environment. The statistics are as shown in
Table 5.3. As it can be observed, the
results are very encouraging, the neural networks approach allowed us to
get a very good model of a nonlinear mapping that resembles the way
human subjects assess speech quality. The first experiment we did for
the validation of our methodology was with
the French langauge. We only used 7 subjects to do the subjective test. Therefore, the precision of the
obtained subjective data is not as well as those of Arabic and Spanish
cases. However, the NN learned quite well the problem as we can see from
the obtained results for this case.
Table 5.3:
Performance of the NN to learn the problem and react to non-seen examples.
Language
Training Database
Testing Database
Correlation coefficient
MSE
Correlation coefficient
MSE
Arabic
0.966
0.035
0.967
0.035
Spanish
0.969
0.035
0.961
0.045
French
0.965
0.048
0.957
0.055
Figure 5.4:
Actual vs. Predicted MOS values on the Arabic training database.
Figure 5.5:
Actual vs. Predicted MOS values on the Spanish training database.
Figure 5.6:
Actual vs. Predicted MOS values on the French training database.
Figure 5.7:
Actual vs. Predicted MOS values on the testing databases.