Introduction
Many real-time video transmission applications have appeared on the Internet in the last few years: video phones, video conferencing, video streaming, tele-medical applications, distance learning, telepresence, video on demand, etc., each with different requirements in terms of bandwidth and perceived quality. This gives rise to a need for assessing the quality of the transmitted video in real time. Traditionally, quality is measured as a function of the encoding algorithms alone, without taking into account the effect of the packet network's parameters.
It is now easy to transfer audio in real time over the Internet with reasonable quality. The same cannot yet be said for video. Real-time video demands greater bandwidth and, in general, dedicated hardware encoders due to the high complexity of video compression algorithms. In addition, in networks with differentiated services, when audio and video streams are transmitted over the same network, audio streams are privileged over video streams during congestion. Thus, to guarantee reasonable video quality, a dedicated network with sufficiently high bandwidth and QoS levels must be used. This solution is not always feasible due to the high cost of such equipment and the per-service prices. Another factor that makes video harder to transfer than audio over public networks is sensitivity to errors. Public networks are usually error-prone, and encoded video streams do not tolerate errors as well as audio: video can tolerate only up to 4 % loss without error concealment, whereas speech remains reasonably intelligible with losses of up to 20 %. Due to these problems, users tend to trade off quality, delivering real-time video at small resolutions (SQCIF or QCIF) over the low-cost public Internet.
The high complexity of video encoding and decoding with respect to that of speech and audio is mirrored in the complexity of purely objective video quality measures. Given the diversity of the encoding algorithms and their complexity, it is difficult to devise objective quality measures that take into account all the possible coding and network impairments. The objective quality measures found in the literature work well in some restricted situations and fail in many others. In addition, as they operate pixel by pixel and involve heavy computation, the resulting latency and CPU usage limit their use in real-time applications, especially since video encoding itself already demands very high computational power. Another weak point is that most of the available objective video quality measures require access to both the original signal and the impaired one, which is impractical in the case of real-time transmission over packet networks.
Consider now the parameters that affect the quality of video transmission over packet networks (the quality-affecting parameters); they can be classified as follows:
- Encoding and compression parameters: these control the amount of quality loss incurred during the encoding process; they depend on the type of encoding algorithm (MPEG, H26x, etc.), the output bit rate, the frame rate (the number of frames per second), etc. [53,82,153]. It should be noted that certain codec-specific parameters may also affect video quality. For example, for MPEG codecs such parameters include the temporal relation among frame types (i.e. I, B and P frames) and the length of the group of pictures; for the H263 codec (see Section 6.2), they include the bit rate control strategy (variable or constant) and hence the associated quantization parameter, as well as the intra-MB refresh rate (or any other bit-error resiliency method used; see Section 3.3.2 for details). Refer to Section 3.4 for details about video compression and the characteristics of the best-known video codecs.
- Network parameters: these result from the packetization of the encoded video stream [111] and its real-time transmission from the source to the destination. They include the packet loss rate, resulting from packets being dropped when the network is congested; the loss distribution (the burstiness of packet drops, i.e. the number of consecutively lost packets); the end-to-end delay, resulting from the queuing of packets in the buffers of the traversed routers, the propagation delay and other factors; the delay variation of the packets at the receiver (jitter); and the type of FEC or any other method (if used) to protect against packet loss (see Section 3.3.2 for details) [53,147,146], etc. [58,23,29].
- Other parameters: such as the nature of the scene (e.g. the amount of motion, color, contrast, image size, etc.). In addition, the encoding and decoding delays in a video application may degrade the quality when real-time transmission is the aim.
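The interplay between the loss rate and the loss distribution is often captured by a two-state loss process; the following sketch of a Gilbert-style model is purely illustrative (the function name and parameters are ours, not the model used later in this Chapter):

```python
import random

def gilbert_losses(n_packets, p, q, seed=0):
    """Simulate packet losses with a two-state Gilbert model.

    p: probability of moving from the good state to the bad (loss) state;
    q: probability of leaving the bad state.
    The long-run loss rate is p / (p + q), and the mean burst of
    consecutively lost packets has length 1 / q.
    """
    rng = random.Random(seed)
    lost, bad = [], False
    for _ in range(n_packets):
        bad = (rng.random() >= q) if bad else (rng.random() < p)
        lost.append(bad)
    return lost

# Roughly the same loss rate (about 9 %), but different burstiness:
sparse = gilbert_losses(30, p=0.1, q=1.0)   # isolated losses only
bursty = gilbert_losses(30, p=0.05, q=0.5)  # bursts of mean length 2
print("".join("x" if l else "." for l in sparse))
print("".join("x" if l else "." for l in bursty))
```

Two streams with identical loss rates can thus have very different loss distributions, which is why both parameters appear separately in the list above.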
Since we concentrate here only on pure video applications, we do not take into account parameters such as lip synchronization or other audio aspects. Clearly, quality is not linearly proportional to the variation of these parameters. Determining the quality is a very complex problem, and there is no mathematical model that can take into account the effects of all these parameters. Hence, it is not surprising that, so far, no purely objective measure exists that can quantify the combined effects of all these parameters and operate in all situations.
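As a way of fixing ideas, the quality-affecting parameters discussed above could be collected into one input record per transmitted sequence; the field names below are hypothetical, chosen only to mirror the parameters listed in this Section:

```python
from dataclasses import dataclass

@dataclass
class QualityParams:
    # Encoding and compression parameters
    codec: str             # e.g. "MPEG-2" or "H263"
    bit_rate_kbps: float
    frame_rate_fps: float
    # Network parameters
    loss_rate: float       # fraction of packets lost
    mean_burst_len: float  # consecutively lost packets (loss distribution)
    # Other parameters
    motion_amount: float   # rough characterization of the scene

sample = QualityParams("H263", 256.0, 15.0, 0.02, 1.5, 0.7)
```

A record like this is the kind of input on which a quality-evaluation function would operate, in place of the pixel data required by full-reference measures.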
In Chapter 4, we outlined a new methodology to evaluate in real time the quality of any media type transmitted over packet networks. We validated it for speech in Chapter 5. In this Chapter, we apply our method to evaluate its applicability to the case of video. Thus, we show how Neural Networks (NN) can be used to mimic the way a group of human subjects assesses video quality when the video is distorted by certain quality-affecting parameters (e.g. packet loss rate, loss distribution, bit rate, frame rate, encoded frame type, etc.). To illustrate its applicability, we chose to assess the quality of video flows transmitted over IP networks, and we carried out subjective quality tests on video distorted by variations of those parameters.
In this Section, we focus on how our approach can be applied to the case of pure video. (For the case when video is combined with other media types, speech or audio, see the end of Section 4.2.) We give here a summary of the general description for the case of video.
The overall procedure can be summarized as follows. First, we identify the quality-affecting parameters, along with their ranges and values. Then we choose the original video sequences and produce the distorted video database. After that, the subjective quality test is carried out on the distorted video database, and the MOS scores (together with their statistical analysis) are calculated to form the video quality database. Next, we select the NN architecture and learning algorithm, and the NN is trained and tested using the video quality database. Finally, we implement the final tool, which consists of two modules (parameter collection and MOS evaluation), in order to obtain a real-time quality evaluator.
In most cases, real-time video applications can be considered one-way sessions (i.e. they consist of a sender that produces the video and a receiver that consumes it). This behavior differs from that of audio applications: in audio, interactivity may introduce other parameters (e.g. echo, the crosstalk effect, the number of participating sources, etc.) that affect the overall quality [92].
The rest of this Chapter is organized as follows. In
Section 6.2, we describe the simulator that we
used to generate the video sequences required to identify the relation
between the subjective quality measures and the parameters' values. A
description of the quality-affecting parameters we considered in our
experiment is given in Section 6.3. We outline
the subjective quality test and the MOS calculation we performed in
Section 6.4. The obtained results and the
performance of the NN are given in Section 6.5. In
Section 6.6, we analyze the performance of some of the existing objective video quality measures. In Section 6.7, we present an application we implemented, based on the obtained results, to evaluate in real time the quality of any video sequence. Finally, we conclude the work of this Chapter in Section 6.8.
Samir Mohamed
2003-01-08