Next: Results Up: Measuring Video Quality Previous: The Quality-Affecting Parameters   Contents   Index


Subjective Quality Test and MOS Experiment

We used the "Degradation Category Rating (DCR)" test method. For details about subjective quality tests and other test procedures, see Section 4.3; we summarize the method here. A pair of video sequences is presented to each observer, one after the other: the subject first sees the reference sequence, which is free of any impairment, and then the second one, which is the same signal distorted by some configuration of the chosen quality-affecting parameters. Figure 4.6 shows the sequence and timing of presentations for this test. The time values come from the ITU-R recommendation [67]. Having seen both sequences, the observer is asked to assess the overall quality of the distorted sequence with respect to the non-distorted (reference) one. Figure 4.2 depicts the ITU-R nine-grade scale. Observers give each test presentation a grade from one to nine corresponding to their mental measure of its quality. Several other quality scales exist (see Section 4.3.5.1); we chose the nine-grade one as a tradeoff between precision and dispersion of the subjective evaluations.

Following the ITU-R recommendations, a subjective test should be divided into multiple sessions, each lasting no more than 30 minutes. Every session should also include several dummy sequences (about four or five). They serve as training samples, so that observers learn to give meaningful ratings, and they are excluded from the calculation. We divided the test into two sessions and added 5 distorted sequences to the first session and 4 to the second. These nine sequences were not considered in the MOS calculation. At the same time, because they were replicated from the 94 real samples, they were used to verify how reliable each subject was. The total number of video sequences to be evaluated was therefore 103.

We invited 20 people to perform the subjective tests. Each subject was given a scoring sheet (Table 6.1 depicts a portion of it) and the quality scale, and the test procedure was explained to each of them. A computer program was implemented to control the evaluation process, respecting the timing of the DCR test given in Figure 4.6. The subject typed the sample number; the original sample was then played back on the screen against an entirely gray background (see Figure 6.1), as specified in the ITU recommendation. A completely gray screen was then shown for 3 seconds, after which the distorted version was played back for 10 seconds. Finally, a prompt asked the subject to rate the quality of the distorted sequence with respect to the original. If in doubt, the subject could review any sequence before deciding on a score.
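
The reliability check based on the replicated training sequences can be illustrated with a minimal sketch. It assumes that a subject's consistency is summarized as the mean absolute gap between the score given to each replicated sample and the score the same subject gave to the real sample it duplicates; the function name, the sample numbers and the mapping below are hypothetical and are not taken from the test program described above.

```python
def replicate_deviation(scores, replicate_of):
    """Mean absolute gap between a subject's score for each replicated
    (training) sample and the score given to the real sample it copies."""
    gaps = [abs(scores[dup] - scores[orig]) for dup, orig in replicate_of.items()]
    return sum(gaps) / len(gaps)

# Hypothetical scores of one subject: sample number -> grade (1 to 9).
scores = {1: 7, 2: 3, 10: 6, 40: 4}
# Hypothetical mapping: training sample number -> real sample it duplicates.
replicate_of = {1: 10, 2: 40}

deviation = replicate_deviation(scores, replicate_of)
print(deviation)
```

A subject whose deviation is large relative to the nine-grade scale would be a candidate for closer inspection; the threshold to apply is a design choice that this sketch leaves open.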

Figure 6.1: A screen dump showing an instance during the subjective quality test evaluation.


Table 6.1: A portion of the scoring sheet distributed to each subject.

      Session number 1       |      Session number 2
Sample Number  Sample Score  |  Sample Number  Sample Score
      1                      |       53
      2                      |       54
      3                      |       55
     ...                     |      ...
     50                      |      102
     51                      |      103
     52                      |

The results were then prescreened (see Section 4.4 for details); as a consequence, we discarded the ratings of two subjects. The 95% confidence intervals before and after removing these two subjects are shown in Figure 6.2. As the figure shows, removing the ratings of these two subjects significantly increased the precision of the overall MOS scores.
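
As an illustration of how a MOS value and its 95% confidence interval can be computed for one sample, here is a minimal sketch. It assumes the usual normal approximation, MOS ± 1.96·s/√N with s the sample standard deviation; the ratings are hypothetical and the function name `mos_with_ci` is introduced only for this example.

```python
import math
import statistics

def mos_with_ci(scores, z=1.96):
    """Return (MOS, half-width of the approximate 95% confidence interval)."""
    n = len(scores)
    mos = statistics.mean(scores)
    # Sample standard deviation (n - 1 in the denominator).
    s = statistics.stdev(scores)
    return mos, z * s / math.sqrt(n)

# Hypothetical ratings of one sample by 20 subjects on the nine-grade scale;
# the last two values stand in for the two discarded subjects.
all_scores = [6, 7, 6, 5, 6, 7, 6, 6, 5, 7, 6, 6, 7, 5, 6, 6, 7, 6, 1, 9]
kept_scores = all_scores[:18]

mos_before, ci_before = mos_with_ci(all_scores)
mos_after, ci_after = mos_with_ci(kept_scores)
print(f"before: {mos_before:.2f} +/- {ci_before:.2f}")
print(f"after:  {mos_after:.2f} +/- {ci_after:.2f}")
```

With these made-up ratings the interval narrows once the two outlying subjects are dropped, which is the kind of effect Figure 6.2 reports for the real data.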

Table 6.2: The quality database used to train and test the NN (before normalization).
Columns: Bit Rate (BR), Frame Rate (FR), Consecutively Lost Packets (CLP), Loss Rate (LR), Ratio (RA), actual MOS value, predicted MOS value.
BR FR CLP LR RA MOS(actual) MOS(predicted)
768 6 1 0.00 0.12 5.60 5.73
768 15 1 4.30 0.12 4.60 4.27
768 15 4 8.10 0.30 2.70 2.68
768 30 1 0.00 0.14 7.30 7.30
1024 30 1 0.00 0.44 8.60 8.25
256 6 1 0.00 0.14 2.80 3.34
768 15 1 7.50 0.09 4.00 3.53
768 6 1 2.30 0.24 3.60 4.19
768 30 1 0.00 0.40 7.80 7.34
768 15 1 8.60 0.30 2.40 2.27
768 30 1 1.10 0.40 6.30 7.00
768 30 1 1.80 0.40 7.30 6.75
768 30 1 4.10 0.40 5.30 5.06
512 15 1 0.00 0.09 5.40 4.86
256 15 1 0.00 0.18 4.00 3.72
768 15 5 7.60 0.30 3.20 2.97
1024 15 1 3.80 0.36 3.90 3.68
1024 15 1 0.00 0.11 7.60 7.59
768 30 1 8.30 0.40 1.40 2.05
1024 15 1 0.00 0.36 7.40 7.21
1024 15 1 1.20 0.36 6.10 6.02
1024 15 1 2.00 0.36 4.80 5.08
768 30 1 0.00 0.06 6.80 6.81
256 15 1 7.60 0.18 1.30 0.93
768 15 3 1.60 0.30 5.10 5.12
512 10 1 0.00 0.23 5.20 5.11
1024 15 1 8.10 0.36 1.90 2.21
768 15 1 8.60 0.16 2.70 3.09
1024 15 1 0.00 0.14 7.90 7.95
768 15 1 1.20 0.09 5.00 5.24
512 30 1 0.00 0.33 6.30 5.97
768 15 1 2.10 0.09 4.30 4.87
256 10 1 0.00 0.17 3.70 3.35
768 15 1 0.00 0.16 6.90 6.55
768 10 1 0.00 0.19 6.00 6.07
1024 10 1 0.00 0.32 6.30 6.05
768 10 1 2.10 0.29 4.10 4.71
768 15 1 0.00 0.30 7.30 7.74
768 15 2 1.10 0.30 5.80 5.97
768 15 5 0.90 0.30 6.00 6.10
768 15 1 1.20 0.30 5.70 6.41
768 15 1 1.00 0.12 5.20 5.42
256 15 1 0.00 0.10 3.00 3.29
768 15 4 1.30 0.30 5.80 5.72
768 15 1 2.00 0.16 5.10 4.93
256 15 1 1.90 0.18 2.40 2.07
768 10 1 0.00 0.14 6.10 5.93
768 15 3 1.60 0.30 5.00 5.12
1024 15 1 0.00 0.20 8.10 8.24
768 15 1 1.20 0.16 5.40 5.40
256 15 1 0.00 0.07 2.40 3.18
768 15 3 4.00 0.30 3.60 3.33
512 15 1 0.00 0.07 5.30 4.74
768 15 2 1.90 0.30 5.60 5.05
768 15 1 2.00 0.30 5.40 5.56
256 15 1 1.90 0.18 2.30 2.07
768 15 5 2.50 0.30 4.80 5.28
768 15 2 3.70 0.30 3.00 3.63
768 15 5 3.90 0.30 4.30 4.06
768 15 1 2.00 0.12 4.80 4.91
768 15 1 0.00 0.09 6.20 6.14
768 15 1 4.20 0.30 4.80 4.25
768 10 1 4.50 0.29 3.80 3.63
768 30 1 0.00 0.09 6.70 6.83
768 15 4 4.30 0.30 4.00 3.50
768 15 1 7.50 0.12 3.00 3.48
512 15 1 3.90 0.24 2.20 2.65
768 15 5 7.60 0.30 3.00 2.97
768 15 2 7.90 0.30 2.70 2.33
768 15 1 4.40 0.09 4.40 4.28
768 10 1 7.50 0.29 1.70 1.63
768 10 1 0.00 0.11 6.00 5.85
768 15 3 8.10 0.30 2.10 2.58
768 10 1 0.00 0.29 6.90 6.64
768 10 1 1.30 0.29 5.00 5.26
768 6 1 7.80 0.24 2.30 2.15
768 15 1 0.00 0.12 6.30 6.30
768 15 1 4.40 0.16 4.70 4.18
1024 6 1 0.00 0.25 5.70 5.68
256 15 1 0.00 0.06 2.80 3.12
512 15 1 0.00 0.24 5.80 5.61
256 30 1 0.00 0.19 4.50 4.14
768 15 4 1.80 0.30 5.60 5.39
512 15 1 1.30 0.24 4.10 4.14
768 6 1 0.00 0.15 5.80 5.72
512 15 1 2.40 0.24 3.40 3.28
512 6 1 0.00 0.21 4.80 4.80
512 15 1 7.90 0.24 1.50 1.61
768 6 1 1.20 0.24 4.30 4.75
768 6 1 3.70 0.24 3.20 3.71
768 6 1 0.00 0.24 6.00 6.11
768 6 1 0.00 0.19 5.50 5.66
512 15 1 0.00 0.13 5.60 5.06
256 15 1 4.60 0.18 1.50 1.34
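
One simple way to read the last two columns of Table 6.2 is to compute the mean absolute error and the correlation between actual and predicted MOS values. The sketch below does this for the first five rows of the table only; it is an illustration, not necessarily the evaluation reported in the Results section, and the function names are introduced here for the example.

```python
import math

# (actual MOS, predicted MOS) from the first five rows of Table 6.2.
pairs = [(5.60, 5.73), (4.60, 4.27), (2.70, 2.68), (7.30, 7.30), (8.60, 8.25)]

def mean_abs_error(pairs):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in pairs) / len(pairs)

def pearson(pairs):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_p = sum(p for _, p in pairs) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_p = sum((p - mean_p) ** 2 for _, p in pairs)
    return cov / math.sqrt(var_a * var_p)

print(f"MAE = {mean_abs_error(pairs):.3f}")
print(f"r   = {pearson(pairs):.3f}")
```

Extending `pairs` to all 94 rows would give figures for the whole database.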

Figure 6.2: The 95% confidence intervals before and after removing the ratings of two unreliable subjects. The confidence intervals shrink once the ratings of these two subjects are removed.


Samir Mohamed 2003-01-08