MPEG Analysis and Measurement

Broadcast engineering requires a unique set of skills and talents. Some audio engineers claim the ability to hear the difference between tiny nuisances such as different kinds of speaker wire. They are known as those with golden ears. Their video engineering counterparts can spot and obsess over a single deviate pixel during a Super Bowl touchdown pass or a “Leave it to Beaver” rerun in real time. They are known as eagle eyes or video experts.

Not all audio and video engineers are blessed with super-senses. Nor do we all have the talent to focus our brain’s undivided processing power to discover and discern vague, cryptic and sometimes immeasurable sound or image anomalies with our bare eyes or ears on the fly, me included. Sometimes, the message can overpower the media. Fortunately for us and thanks to the Internet and digital video, more objective quality and measurement standards and tools have developed.

One of those standards is Perceptual Evaluation of Video Quality (PEVQ). It is an End-to-End (E2E) measurement algorithm standard that grades picture quality of a video presentation by a five-point Mean Opinion Score (MOS), one being bad and five being excellent.

PEVQ can be used to analyze visible artifacts caused by digital video encoding/decoding or transcoding processes, RF- or IP-based transmission systems and viewer devices like set-top boxes. PEVQ is suited for next-generation networking and mobile services and include SD and HD IPTV, streaming video, mobile TV, video conferencing and video messaging.

The development for PEVQ began with still images. Evaluation models were later expanded to include motion video. PEVQ can be used to assess degradations of a decoded video stream from the network, such as that received by a TV set-top box, in comparison to the original reference picture as broadcast from the studio. This evaluation model is referred to as End-to-End (E2E) quality testing.

E2E exactly replicates how so-called average viewers would evaluate the video quality based on subjective comparison, so it addresses Quality-of-Experience (QoE) testing. PEVQ is based on modeling human visual behaviors. It is a full-reference algorithm that analyzes the picture pixel-by-pixel after a temporal alignment of corresponding frames of reference and test signal.

Besides an overall quality Mean Opinion Score figure of merit, abnormalities in the video signal are quantified by several Key Performance Indicators (KPI), such as Peak Signal-to-Noise Ratios (PSNR), distortion indicators and lip-sync delay.

PVEQ References
Depending on the data made available to the algorithm, video quality test algorithms can be divided into three categories based on available reference data.

A Full Reference (FR) algorithm has access to and makes use of the original reference sequence for a comparative difference analysis. It compares each pixel of the reference sequence to each corresponding pixel of the received sequence. FR measurements deliver the highest accuracy and repeatability but are processing intensive.

A Reduced Reference (RR) algorithm uses a reduced bandwidth side channel between the sender and the receiver, which is not capable of transmitting the full reference signal. Instead, parameters are extracted at the sending side, which help predict the quality at the receiving end. RR measurements are less accurate than FR and represent a working compromise if bandwidth for the reference signal is limited.

A No Reference (NR) algorithm only uses the degraded signal for the quality estimation and has no information of the original reference sequence. NR algorithms are low accuracy estimates only, because the original quality of the source reference is unknown. A common variant at the upper end of NR algorithms analyzes the stream at the packet level, but not the decoded video at the pixel level. The measurement is consequently limited to a transport stream analysis.

Another widely used MOS algorithm is VQmon. This algorithm was recently updated to VQmon for Streaming Video. It performs real-time analysis of video streamed using the key Adobe, Apple and Microsoft streaming protocols, analyzes video quality and buffering performance and reports detailed performance and QoE metrics. It uses packet/frame-based zero reference, with fast performance that enables real-time analysis on the impact that loss of I, B and P frames has on the content, both encrypted and unencrypted.

The 411 on MDI
The Media Delivery Index (MDI) measurement is specifically designed to monitor networks that are sensitive to arrival time and packet loss such as MPEG-2 video streams, and is described by the Internet Engineering Task Force document RFC 4445. It measures key video network performance metrics, including jitter, nominal flow rate deviations and instant data loss events for a particular stream.

MDI provides information to detect virtually all network-related impairments for streaming video, and it enables the measurement of jitter on fixed and variable bit-rate IP streams. MDI is typically shown as the ratio of the Delay Factor (DF) to the Media Loss Rate (MLR), i.e. DF:MLR.

DF is the number of milliseconds of streaming data that buffers must handle to eliminate jitter, something like a time-base corrector once did for baseband video. It is determined by first calculating the MDI virtual buffer depth of each packet as it arrives. In video streams, this value is sometimes called the Instantaneous Flow Rate (IFR). When calculating DF, it is known as DELTA.

To determine DF, DELTA is monitored to identify maximum and minimum virtual depths over time. Usually one or two seconds is enough time. The difference between maximum and minimum DELTA divided by the stream rate reveals the DF. In video streams, the difference is sometimes called the Instantaneous Flow Rate Deviation (IFRD). DF values less than 50ms are usually considered acceptable. An excellent white paper with much more detail on MDI is available from Agilent.


Figure 1 - The Delay Factor (DF) dictates buffer size needed to eliminate jitter


Using the formula in Figure 1, let’s say a 3.Mb/s MPEG video stream observed over a one-second interval feeds a maximum data rate into a virtual buffer of 3.005Mb and a low of 2.995Mb. The difference is the DF, which in this case is 10Kb. DF divided by the stream rate reveals the buffer requirements. In this case, 10K divided by 3.Mb/s is 3.333 milliseconds. Thus, to avoid packet loss in the presence of the known jitter, the receiver’s buffer must be 15kb, which at a 3Mb rate injects 4 milliseconds of delay. A device with an MDI rating of 4:0.003, for example, would indicate that the device has a 4 millisecond DF and a MLR of 0.003 media packets per second.

The MLR formula in Figure 2 is computed by dividing the number of lost or out-of-order media packets by observed time in seconds. Out-of-order packets are crucial because many devices don’t reorder packets before handing them to the decoder. The best-case MLR is zero. The minimum acceptable MLR for HDTV is generally considered to be less than 0.0005. An MLR greater than zero adds time for viewing devices to lock into the higher MLR, which slows channel surfing an can introduce various ongoing anomalies when locked in.


Figure 2 - The Media Loss Rate (MLR) is used in the Media Delivery Index (MDI)


Watch That Jitter
Just as too much coffee can make you jittery, heavy traffic can make a network jittery, and jitter is a major source of video-related IP problems. Pro-actively monitoring jitter can alert you to help avert impending QoE issues before they occur.

One way to overload a MPEG-2 stream is with excessive bursts. Packet bursts can cause a network-level or a set-top box buffer to overflow or under-run, resulting in lost packets or empty buffers, which cause macro blocking or black/freeze frame conditions, respectively. An overload of metadata such as video content PIDs can contribute to this problem.

Probing a streaming media network at various nodes and under different load conditions makes it possible to isolate and identify devices or bottlenecks that introduce significant jitter or packet loss to the transport stream. Deviations from nominal jitter or data loss benchmarks are indicative of an imminent or ongoing fault condition.

QoE is one of many subjective measurements used to determine how well a broadcaster’s signal, whether on-air, online or on-demand, satisfies the viewer’s perception of the sights and sounds as they are reproduced at his or her location. I can’t help but find some humor in the idea that the ones-and-zeros of a digital video stream can be rated on a gray scale of 1-5 for quality.

Experienced broadcast engineers know the so-called quality of a digital image begins well before the light enters lens, and with apologies to our friends in the broadcast camera lens business, the image is pre-distorted to some degree within the optical system before the photons hit the image sensors.

QoE or RST?
A scale of 1-5 is what ham radio operators have used for 100 years in the readability part of the Readability, Strength and Tone (RST) code system. While signal strength (S) could be objectively measured with an S-meter such as shown in Figure 3, readability (R) was purely subjective, and tone (T) could be subjective, objective or both.


Figure 3 - The S-meter was the first commonly used metric to objectively
read and report signal strength at an RF receive site


Engineers and hams know that as S and or T diminish, R follows, but that minimum acceptable RST values depend almost entirely on the minimum R figure the viewer or listener is willing to accept. In analog times, the minimum acceptable R figure often varied with the value of the message.

Digital technology and transport removes the viewer or listener’s subjective reception opinion from the loop. Digital video and audio is either as perfect as the originator intended or practically useless. We don’t need a committee to tell us that. It seems to me the digital cliff falls just south of a 4x5x8 RST. Your opinion may vary.

By Ned Soseman, Broadcast Engineering