

Objective Video Quality Techniques

As with speech, there are several objective video quality measures. They range from a simple difference between the original and distorted sequences to very complicated measures that are based on Human Vision System (HVS) models and involve highly complex mathematical calculations.

All the video quality measures we found in the literature, and which are presented in what follows, operate on both the original and the distorted video sequences. This limitation makes it impossible to use them in real time or to include them in the design of new mechanisms (rate control or video codecs that take into account the user's perception and the network factors). A second disadvantage is that the results do not always correlate with subjective data, so they cannot correctly measure the user's perception. A third drawback is that they are computationally intensive, especially those built on HVS models (some of them cannot be used to evaluate the quality of video sequences longer than 1 second). Some of these metrics are designed and optimized mainly for encoding impairments under restricted conditions, and they do not work well when used in other conditions (e.g. distortion due to transmission over the network).

Another observation is that not all the quality-affecting parameters can be considered. For example, the effect of the frame rate cannot be captured, because these metrics compare the original and distorted sequences frame by frame. Both sequences must therefore have the same frame rate, and each decoded picture of the processed sequence must correspond to the encoded picture in the same frame of the original sequence; otherwise the results degrade (cf. signal synchronization in the case of speech).

The most commonly used objective quality metric is the Peak Signal to Noise Ratio (PSNR). For a video sequence of $K$ frames, each having $N \times M$ pixels with $m$-bit depth, the Mean Square Error (MSE) is first calculated as follows:

\begin{displaymath}
\mbox{MSE}=\frac{1}{N \cdot M \cdot K}\displaystyle \sum_{k=1}^{K} \displaystyle \sum_{i=1}^{N} \displaystyle \sum_{j=1}^{M} [x(i,j,k)-\bar{x}(i,j,k)]^2, \end{displaymath}

where $x(i,j,k)$ and $\bar{x}(i,j,k)$ are the luminance values of the pixel at location $(i,j)$ in frame $k$ of the original and distorted sequences, respectively. The Root MSE (RMSE) is calculated as $\mbox{RMSE}=\sqrt{\mbox{MSE}}$. The PSNR can then be calculated as follows:

\begin{displaymath}
\mbox{PSNR}=10\log_{10}\frac{(2^m-1)^2}{\mbox{RMSE}^2}. \end{displaymath}
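
As an illustration, the MSE, RMSE and PSNR definitions can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a reference implementation: the function names `mse`, `rmse` and `psnr` are hypothetical, a sequence is assumed to be a list of $K$ frames, each frame a list of rows of luminance values, and the peak value is assumed to be $2^m-1$ for $m$-bit samples (the conventional choice). The two sequences are assumed to be already aligned frame by frame.

```python
import math


def mse(original, distorted):
    """Mean square error over all frames, rows and pixels (luminance only).

    Assumption: both arguments are lists of frames, each frame a list of
    rows, each row a list of numeric luminance values, with equal shapes.
    """
    if len(original) != len(distorted):
        raise ValueError("sequences must have the same number of frames")
    total = 0.0
    count = 0
    for frame_x, frame_y in zip(original, distorted):
        if len(frame_x) != len(frame_y):
            raise ValueError("frames must have the same number of rows")
        for row_x, row_y in zip(frame_x, frame_y):
            if len(row_x) != len(row_y):
                raise ValueError("rows must have the same number of pixels")
            for x, y in zip(row_x, row_y):
                total += (x - y) ** 2
                count += 1
    if count == 0:
        raise ValueError("sequences must contain at least one pixel")
    return total / count


def rmse(original, distorted):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(original, distorted))


def psnr(original, distorted, bit_depth=8):
    """PSNR in dB, assuming a peak sample value of 2**bit_depth - 1."""
    err = mse(original, distorted)
    if err == 0:
        return math.inf  # identical sequences: no noise at all
    peak = 2 ** bit_depth - 1
    return 10.0 * math.log10(peak ** 2 / err)


# Two frames of 2 x 2 pixels, 8-bit luminance (made-up values).
original = [[[100, 100], [100, 100]], [[50, 50], [50, 50]]]
distorted = [[[102, 100], [100, 98]], [[50, 54], [50, 50]]]

print("MSE  =", mse(original, distorted))
print("RMSE =", rmse(original, distorted))
print("PSNR =", psnr(original, distorted), "dB")
```

Note that the sketch needs both sequences in full, which reflects the full-reference nature of these measures discussed above.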

MSE and RMSE measure the difference between the original and distorted sequences, whereas PSNR measures fidelity (how close a sequence is to the original one). Compared to other objective measures, PSNR is easy to compute and well understood by most researchers. However, it correlates poorly with subjective measures. In addition, the above measures consider only the luminance component and neglect the chrominance component, which is also important for human perception.

Among the objective video quality measures available in the literature are the ITS and EPFL metrics. The first was developed at the ``Institute for Telecommunication Sciences'', hence the acronym ITS. The others were developed at the ``École Polytechnique Fédérale de Lausanne (EPFL)''. We describe these metrics in this section. For the performance evaluation of these methods, see Section 6.6.

Samir Mohamed 2003-01-08