Survey of European Broadcasters on MPEG-DASH

At the EBU’s BroadThinking 2013 event in March 2013, the DASH Industry Forum conducted a survey of European broadcasters on MPEG-DASH. Thirteen major broadcasters responded to the survey.

This report summarizes the results and examines whether the DASH IF’s activities and focus are aligned with the needs expressed in the survey.

ButtleOFX: Open Source Compositing Software

The aim of ButtleOFX is to create open source compositing software based on the TuttleOFX library. TuttleOFX is an image and video processing framework based on the OpenFX open standard for visual effects plug-ins.

Color Grading

Two things are at the core of good color grading for video:

  • Ensuring that the image on the screen looks great and is graded to best tell the story.
  • Making sure that the image can be properly reproduced and delivered to a variety of media and screens.
This article will focus on the latter. Understanding the importance of legal and valid gamut and determining the color balance are critical to maintaining proper color reproduction across a variety of media and broadcast methods. Before we examine the concepts of color balance, let’s quickly review the concepts of color space.


The HSL Color Space
Video is composed of three color components: red, green and blue (RGB). Various combinations of these components make up the colors we see. One way to understand the Hue, Saturation and Luma (HSL) representation of the RGB color space is to imagine it as two cones joined at their widest point.
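As a quick illustration, here is a small Python sketch that derives hue, saturation and luma from a normalized R'G'B' triple. The function name and the use of Rec. 709 luma weights are choices made for this example, not part of any broadcast specification.

    def hue_sat_luma(r, g, b):
        """Convert normalized R'G'B' values (0..1) to hue, saturation and luma.

        Luma uses the Rec. 709 weights; hue follows the usual hexcone model and
        saturation is the simple HSV-style ratio, kept basic for illustration."""
        mx, mn = max(r, g, b), min(r, g, b)
        luma = 0.2126 * r + 0.7152 * g + 0.0722 * b
        if mx == mn:
            hue = 0.0                                  # neutral color: no hue
        elif mx == r:
            hue = (60.0 * (g - b) / (mx - mn)) % 360.0
        elif mx == g:
            hue = 60.0 * (b - r) / (mx - mn) + 120.0
        else:
            hue = 60.0 * (r - g) / (mx - mn) + 240.0
        sat = 0.0 if mx == 0 else (mx - mn) / mx
        return hue, sat, luma

    # Pure gray has zero saturation; saturated red sits at hue 0 with high saturation.
    print(hue_sat_luma(0.5, 0.5, 0.5))   # (0.0, 0.0, 0.5)
    print(hue_sat_luma(1.0, 0.0, 0.0))   # (0.0, 1.0, 0.2126)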




The Waveform Monitor
The waveform monitor or rasterizer (scope) is key to providing a legal output of your creative product. Being a brilliant editor and colorist doesn’t mean much if no one will air your product. Even if your product isn’t being broadcast, legal levels affect the proper duplication of your project and the way it will look on a regular TV monitor.

One of the most basic uses of the waveform monitor is determining whether your luma and setup levels are legal. This means that the brightest part of the luma signal does not extend beyond 100 percent and that the darkest part of the picture does not drop below 0 percent.
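As a simple illustration, a legality check on luma samples expressed in percent might look like the following Python sketch. The function name and return fields are hypothetical, and real delivery specifications may tolerate small excursions.

    def check_luma_legal(luma_percent, low=0.0, high=100.0):
        """Report whether a set of luma samples (in percent) stays within legal limits."""
        lo, hi = min(luma_percent), max(luma_percent)
        return {
            "legal": lo >= low and hi <= high,
            "undershoot": max(0.0, low - lo),    # how far the darkest sample dips below 0%
            "overshoot": max(0.0, hi - high),    # how far the brightest sample exceeds 100%
        }

    print(check_luma_legal([2.5, 48.0, 99.1]))    # legal
    print(check_luma_legal([-1.0, 50.0, 104.0]))  # undershoot 1.0, overshoot 4.0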


Determining Color Balance
Color balance is indicated by the relative strength of each color channel. With neutral (pure black, white and gray) colors, the strength of each color channel should, technically, be equal.

The usual goal of color balancing is to achieve an image in which neutral colors are represented with all channels equal. The most common cause of unbalanced colors is how the camera was white balanced on location. For example, if a camera is set up to record in tungsten light but is actually capturing a scene lit by daylight, the blue channel will be stronger than the red and green channels.

Some camera sensors have a natural tendency to be more sensitive to certain colors in certain tonal ranges. These errors in sensitivity or white balance can be corrected by monitoring the image with a waveform monitor and adjusting the signal until the strength of all three channels is equal when a neutral color is displayed.
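A minimal sketch of that correction, assuming we can sample a patch that should be neutral and that normalizing to the green channel is acceptable (a common convention); the sample values here are hypothetical and assumed to be linear:

    def neutral_gains(r, g, b):
        """Per-channel gains that make a sampled neutral patch read equal in R, G and B."""
        return g / r, 1.0, g / b

    def apply_gains(pixel, gains):
        return tuple(c * k for c, k in zip(pixel, gains))

    # A daylight scene shot with a tungsten white balance reads blue-heavy on a gray card:
    gray_card = (0.42, 0.50, 0.64)            # hypothetical gray-card sample
    gains = neutral_gains(*gray_card)
    print(apply_gains(gray_card, gains))      # -> (0.5, 0.5, 0.5): channels now equal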

Colorists consult two types of waveform displays known as “parade” displays because they show channels of information in a “parade” from left to right. The most common of these is the RGB Parade shown in the following figure, which presents the red, green and blue channels of color information horizontally across the display.



The reference marks are referred to as the graticule. On a waveform monitor, these are the horizontal lines describing the millivolts, IRE or percentages from black to full power (white).

Component video levels are represented in terms of millivolts, with black being set at 0mV and white at 700mV. This range of video levels is also represented in terms of a percentage scale with 0 percent equal to 0mV, and 100 percent equal to 700mV.
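Because the mapping between the two scales is linear, converting is simple arithmetic. The helper names below are just for illustration:

    def mv_to_percent(mv):
        """0 mV -> 0 percent, 700 mV -> 100 percent."""
        return mv / 700.0 * 100.0

    def percent_to_mv(percent):
        return percent / 100.0 * 700.0

    print(mv_to_percent(350.0))   # 50.0 percent
    print(percent_to_mv(100.0))   # 700.0 mV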


The Vectorscope
Whereas a waveform monitor normally displays a plot of signal vs. time, a vectorscope, shown in the following figure, is an XY plot of color (hue) as an angular component of a polar display, with the signal amplitude represented by the distance from the center (black). On a vectorscope graticule, there are color targets and other markings that provide a reference as to which vector, or position, a specific color is in.
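The following Python sketch shows roughly where a color lands on that polar plot. It uses Rec. 601 color-difference weights; real vectorscopes apply standard-specific scaling and graticule calibration on top of this, so treat it as an illustration only.

    import math

    def vector_position(r, g, b):
        """Return (hue angle in degrees, chroma magnitude) for a normalized R'G'B' value."""
        y = 0.299 * r + 0.587 * g + 0.114 * b        # Rec. 601 luma
        cb = 0.564 * (b - y)
        cr = 0.713 * (r - y)
        chroma = math.hypot(cb, cr)                  # distance from center = color vividness
        angle = math.degrees(math.atan2(cr, cb)) % 360.0
        return angle, chroma

    print(vector_position(0.5, 0.5, 0.5))   # neutral gray: chroma ~ 0, at the center
    print(vector_position(0.8, 0.2, 0.2))   # a red: large chroma, red-ish angle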



In color grading applications, the vectorscope helps analyze hue and chroma levels, keeping colors legal and helping to eliminate unwanted color casts. With the gain, setup and gamma corrections done while monitoring primarily the waveform monitor, the colorist’s attention focuses more on the vectorscope for the hue and chroma work.

The chroma strength of the signal is indicated by its distance from the center of the vectorscope. The closer the trace is to the outer edge of the vectorscope, the greater the chrominance, or the more vivid the color. The hue of the image is indicated by its rotational position around the circle. An important relationship to understand is the position of the various colors around the periphery of the vectorscope. The targets for red, blue and green form a triangle. In between each of these primary colors are the colors formed by mixing those primaries.

The chroma information presented on the vectorscope is instrumental in trying to eliminate color casts in images. As stated earlier, the chroma strength of a signal is represented by its distance from the center of the vectorscope. Because white, black and pure grays are devoid of chroma information, they all should sit neatly in the center of the vectorscope. Although most video images will have a range of colors, they also usually have some amount of whites, blacks and neutral grays. The key is to be able to see where these parts of the picture sit on the vectorscope and then use the color correction tools at your disposal to move them toward the center of the vectorscope.

For nearly all professional colorists, the various waveform displays — Flat, Low Pass, Luma only, RGB Parade and YCbCr Parade — plus the vectorscope are the main methods for analyzing their image. Although experienced colorists often rely on their eyes, they use these scopes to provide an unchanging reference to guide them as they spend hours color correcting. Without them, their eyes and grades would eventually drift off course. Spend time becoming comfortable with these scopes, and what part of the video image corresponds to the images on the scopes.

By Richard Duvall, Broadcast Engineering

HEVC Walkthrough

A walkthrough of some of the features of the new HEVC video compression standard. Video playback using the Osmo4 player from GPAC and stream analysis using Elecard's HEVC Analyzer software.



By Iain Richardson, Vcodex

Ethernet and IP

Ethernet and IP are terms broadcast engineers use many times every day. But what, exactly, is the difference between the two?

Layers
Many years ago, manufacturers had to build into their application programs the software needed to interface with a specific network interface card. This meant that an application would only work with specific networking hardware; change the hardware, and you had to rewrite the application. Vendors quickly faced an escalating number of possible network devices and a variety of underlying physical network implementations (such as RG-11, RG-59 and UTP cable).

Manufacturers wanted to separate the development of their application from all of the chaos going on at the networking level. They also wanted to be able to sell their applications to different users who might have different networking technologies in their facilities. The seven-layer Open System Interconnection (OSI) data model was developed to address this issue in a standardized way. While it is not too important to understand all layers of the OSI model, it is good to understand the first four layers.


 
This shows the first four of the seven layers of the OSI data model.


Layer 1 is the physical layer, sometimes referred to as PHY. This is the physical network hardware, and it operates at the bit level.

Layer 2 is called the data link layer and consists of a mixture of specifications describing the electrical signals on a cable and the way that data is encapsulated into frames. Ethernet is a Layer 2 protocol.

Layer 3 is referred to as the network layer, and it is here that data is encapsulated into packets. IP packets are referred to as datagrams.

Layer 4 is the transport layer. Here, we speak in terms of sessions. Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) are Layer 4 protocols.

Each of these layers plays a role in networking. Importantly, Layer 2 is responsible for physical addressing, meaning that the network architecture can rely on the fact that a Layer 2 address is unique and permanently assigned to a specific piece of hardware.

Layer 3 is responsible for logical addressing, meaning that addresses are assigned at this level by a network engineer in a way that organizes network clients into logical groups. But neither of these two layers is responsible for anything more than “best effort” delivery of packets from one place to another. If packets get lost, duplicated, rearranged or corrupted, neither Layer 2 nor Layer 3 will do anything about it.

Layer 4 protocols are responsible for establishing end-to-end connections called sessions, and these protocols may or may not recognize the loss of packets and do something about it. TCP, for example, will request that lost packets be resent. UDP will not. (Remember that Ethernet operates at Layer 2, and IP operates at Layer 3.)
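A quick way to see the difference in practice is Python's socket API: with UDP, each datagram is launched and forgotten, while with TCP a session is established and the protocol itself retransmits lost segments. The address and port below are placeholders.

    import socket

    # UDP: connectionless, best effort; nothing at Layer 4 notices a lost datagram.
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp.sendto(b"status update", ("192.0.2.10", 5004))

    # TCP: a session is set up first, and the stack retransmits anything that is lost.
    # tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # tcp.connect(("192.0.2.10", 5004))
    # tcp.sendall(b"status update")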

A Hierarchy
Ethernet and IP are part of a hierarchy of interchangeable technologies in the seven-layer OSI stack. While most of the networks that broadcast engineers are likely to encounter today are Ethernet and IP, it is important to understand that, in the early days, Ethernet was just one of a large number of competing Layer 2 technologies. Other Data Link Layer technologies include ATM, Token Ring and ARCNET.

While Layer 2 does a great job of organizing data into frames and passing them on to a physical network, it is not capable of allowing network architects to group computers into logical networks and allowing messages from those computers to be sent (or routed) to a much larger campus or even worldwide network. This is where Layer 3 comes in.

IP operating at Layer 3 organizes data into datagrams, which can be sent across any Layer 2 networking technology. IP datagrams have the same format and follow the same rules regardless of whether they are sent across Ethernet, Token Ring or some other network. In fact, you might be surprised to learn that IP packets from your computer may first travel over Ethernet, then over a SONET link for long-distance transmission, and then be put back into Ethernet for delivery on a local network at the destination.

IP is not the only Layer 3 protocol in common use today. There are a number of other critical protocols that operate at this layer, many of which have to do with configuring and maintaining networks. Internet Control Message Protocol (ICMP) and Open Shortest Path First (OSPF) are two examples of this.

In summary, Ethernet and IP are part of a hierarchy. IP packets are carried in Ethernet frames. IP packets that travel over long distances are likely to be carried in the payload portion of SONET frames. Furthermore, when you see the notation TCP/IP, remember that TCP is also part of this hierarchy: whenever you see it used in reference to traffic on a LAN, you are almost certainly talking about TCP over IP over Ethernet.

Other Differences
There are many other differences between Ethernet and IP that are derived from the fact that they are on different layers of the OSI model.

Ethernet frames contain source and destination addresses. The frames also contain a payload area. IP packets are transported in the payload area of the Ethernet frames. The IP packets also contain source and destination addresses and a payload section. If both Ethernet and IP contain similar structures, why use Ethernet at all?
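To make the parallel structure concrete, here is a toy Python sketch that wraps a minimal 20-byte IPv4 header inside a 14-byte Ethernet II header. The addresses are example values, the IP checksum is left at zero, and a real protocol stack obviously does much more.

    import socket
    import struct

    def ipv4_packet(src_ip, dst_ip, payload):
        # Minimal IPv4 header: version/IHL, DSCP/ECN, total length, ID, flags/fragment
        # offset, TTL, protocol (17 = UDP), header checksum (0 here; real stacks compute it),
        # then the source and destination addresses, followed by the payload.
        header = struct.pack("!BBHHHBBH4s4s",
                             (4 << 4) | 5, 0, 20 + len(payload), 0,
                             0, 64, 17, 0,
                             socket.inet_aton(src_ip), socket.inet_aton(dst_ip))
        return header + payload

    def ethernet_frame(dst_mac, src_mac, ip_packet):
        # Ethernet II header: destination MAC, source MAC, EtherType 0x0800 (IPv4),
        # followed by the IP packet carried as the frame's payload.
        return struct.pack("!6s6sH", dst_mac, src_mac, 0x0800) + ip_packet

    frame = ethernet_frame(b"\xaa\xbb\xcc\x00\x00\x02", b"\xaa\xbb\xcc\x00\x00\x01",
                           ipv4_packet("192.0.2.1", "192.0.2.2", b"application data"))
    print(len(frame))   # 14 + 20 + 16 = 50 bytes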

Remember that the point of adding IP as a layer on top of Ethernet is to allow the IP packet layout (and the application software above that layer) to remain constant while providing different underlying Layer 2 structures. Basically, if you change the network, then you change Layer 2 drivers, and perhaps Layer 1 hardware, but everything above that layer remains the same.

From a practical standpoint, there are significant differences between Ethernet addresses and IP addresses. Ethernet addresses are assigned to a network interface card or chip set at the factory. They are globally unique, cannot be changed (not entirely true, actually, but this was the original assumption), and getting an Ethernet configuration up and running is essentially a plug-and-play process.

IP addresses, on the other hand, are not assigned from the factory. These addresses need to be configured (sometimes dynamically), and while Dynamic Host Configuration Protocol (DHCP) works very well most of the time to automatically configure IP addresses, there are many cases where broadcast engineers must manually configure the IP address of a device before it can be used on the network.

Another difference is that the first three octets of an Ethernet address convey meaning; they are an Organizationally Unique Identifier (OUI). It is possible to look up an OUI and determine who assigned the Ethernet address to the hardware. IP addresses, however, have no meaning assigned to them. That is not to say there are no special IP address ranges; there are. But the numbers themselves do not convey any specific information.
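Extracting the OUI is just a matter of taking the first three octets of the address; the MAC address in this small sketch is a made-up example.

    def oui(mac):
        """Return the Organizationally Unique Identifier portion of a MAC address."""
        return ":".join(mac.split(":")[:3]).upper()

    print(oui("00:1a:2b:3c:4d:5e"))   # "00:1A:2B" -- look this prefix up to find the vendor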

Perhaps the most important difference between Ethernet and IP is that Ethernet frames are not routable, while IP packets are. In practical terms, what this means is that an Ethernet network is limited in terms of the number of devices that can be connected to a single network segment and the distance Ethernet frames can travel. Limits vary, but, as an example, Gigabit Ethernet on Cat 7 cable is limited to about 330ft. Gigabit Ethernet on single mode fiber (expensive) is limited to 43mi.

In order to extend the network beyond this length, you need to be able to route signals from one physical Ethernet network to another. This was one of the original design goals of IP. Network architects assign IP addresses to computers based upon physical location and corporate function (news vs. production, for example), and then IP routers automatically recognize whether packets should remain within the local network or whether they need to be routed to the Internet.

By Brad Gilmer, Broadcast Engineering

Next Generation Video Compression

A whitepaper about HEVC by Ericsson.

Video Compression: More Bang with Fewer Bits

Video compression was one of the more ubiquitous topics at last month’s NAB Show. Perhaps equally ubiquitous were 4K and 8K. Seldom was either topic discussed without reference to the other.

Fortunately, in the world of digital television and digital video, a video-specific variation of Moore’s Law seems to be at work. After umpteen years of incremental NTSC picture-quality progress that many viewers could barely see, if at all, the industry was ripe for the change to HDTV and self-contained big-screen displays.

Until Moore’s Law made it possible for common computers to handle digitized SD video, few analog engineers could envision what was in the future of digital television. Some might remember the days of 1125/60 in the early 1990s. It was, more or less, the original analog HDTV format. Many design engineers were trying to compress it into the standard 6MHz broadcast television channel bandwidth using a variety of hardware-intensive systems. The industry even formed the 1125/60 Consortium before DTV was invented.

DTV, which was originally conceived to shoehorn HDTV into a standard 6MHz TV channel, has flooded the industry and viewing public with myriad unforeseen changes. Among them were progressive scan, streaming video, file-based video, the 1920 x 1080 raster and, more recently, 4K, 8K and beyond. While current DTV and ATSC transmission standards have capped the resolution TV stations can broadcast, computers, DVDs, the Internet and a host of video compression schemes and standards have eliminated barriers to delivering higher definition images to viewers from non-over-the-air sources. Current state-of-the-art video compression standards can reduce the bandwidth of baseband video by a factor of approximately 100.

The Codec
The first digital video codec standard was H.120. It was published in 1984 and revised in 1988, but the quality was so poor that there were very few users. H.120 was followed later in 1988 by H.261, an ITU-T video coding standard. The primary use for H.261 was for video transmission over ISDN lines at a resolution of 352 x 288 or 176 x 144.

MPEG-2, aka H.262, was first published in 1994. It paved the way for DTV, OTT and ATSC transmission, and it continues to be the standard used to create DVDs.

MPEG-4 Part 10 (AVC), aka H.264, was published in 2003 and is currently the most commonly used video codec and the standard for Blu-ray discs and HDTV.

Today, numerous compression standards are being used in a variety of ways to create and deliver content. The latest standards with the highest quality are JPEG 2000 (J2K) and more recently HEVC (High Efficiency Video Coding).

J2K
J2K is designed for compressing individual images, not video sequences. It is primarily used in production and video feeds with high bit rates up to approximately 120Mb/s. The higher bit rates make artifacts and blocking virtually invisible. It is hardware-intensive, and it can be accomplished in real time.

In J2K, each frame is compressed individually and, therefore, stands alone. In video compression terms, each individual complete frame is an I-Frame. While this feature is advantageous in maintaining video quality, its high bandwidth doesn’t lend itself well to distribution.

On April 15, 2013, the Video Services Forum issued a Technical Recommendation that defines profiles for streaming of the JPEG 2000 Broadcast Profile in an MPEG-2 Transport Stream over IP with optional Forward Error Correction (FEC). The recommendation is for unidirectional transport of SD-SDI, HD-SDI and 3G-SDI signals, encapsulated in an RTP stream and transmitted via IP to a receiving device that will decode the output to an SDI signal.

HEVC
On the other hand, HEVC, aka H.265, and its modern predecessors, MPEG-2 and H.264/MPEG-4 Advanced Video Coding (AVC), use I-frames, P-frames and B-frames to encode moving images. To quickly review, the I in I-frame stands for intra-coded; it is a fully specified still image. The P in P-frame stands for predicted picture; it contains only the changes from the previous frame, saving unchanged data from having to be repeated. The B in B-frame stands for bi-predictive; it saves more space than a P-frame because it specifies the differences between itself and both the frame before it and the frame after it. Images are usually segmented into macroblocks, and prediction types can be determined based on the movement within each macroblock.

HEVC provides 33 directional modes for intra block prediction; H.264 uses only eight. These modes use information from previously decoded neighboring prediction blocks. H.265 motion vectors have a 16-bit range for both horizontal and vertical components, with quarter-pixel precision, giving HEVC a motion vector prediction range 16 times greater than H.264’s.

Most pre-H.265 codecs independently encode 16 x 16-pixel macroblocks. In HEVC, the image is split into Coding-Tree Units (CTUs), each up to 64 x 64 pixels. Each CTU is the root of a quadtree data structure, which partitions a two-dimensional space by recursively subdividing it into four quadrants. The quadtree can then be subdivided into leaf-level Coding Units (CUs), as illustrated in the following figure.

  The HEVC picture is split into Coding-Tree Units (CTUs) up to 64 x 64 pixels.
A CTU is the root of a quadtree data structure, which can then be sub-divided into
leaf-level Coding Units (CUs).
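The recursive nature of that split is easy to sketch in Python. This is only an illustration: needs_split stands in for the encoder's real rate-distortion decision, and the 8 x 8 minimum CU size is an assumption chosen for the example.

    def split_ctu(x, y, size, needs_split, min_size=8):
        """Recursively partition one CTU into leaf coding units (CUs).

        Returns a list of (x, y, size) tuples. needs_split(x, y, size) is a stand-in
        for the encoder's real splitting decision."""
        if size <= min_size or not needs_split(x, y, size):
            return [(x, y, size)]
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(split_ctu(x + dx, y + dy, half, needs_split, min_size))
        return leaves

    # Toy criterion: keep splitting the top-left quadrant of a 64 x 64 CTU down to 16 x 16.
    cus = split_ctu(0, 0, 64, lambda x, y, s: x < 32 and y < 32 and s > 16)
    print(cus)   # four 16x16 CUs in the top-left corner plus three 32x32 CUs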


HEVC (H.265) represents the natural progression of Moore’s Law in video compression. In essence, the H.265 codec is twice as efficient as the H.264 codec. It increases the use of parallel processing, improves compressed video picture quality and supports higher resolutions such as 8K Ultra High Definition Television (UHDTV).

As early as 2004, the ITU-T Video Coding Experts Group began work toward a new video compression standard to take H.264 to the next level. The Group was considering two approaches. One was to create extensions to H.264. The other was to create a new standard.

Three years later, the ISO/IEC Moving Picture Experts Group (MPEG) began work on a similar project called High-performance Video Coding. Its goal was to achieve a 50-percent bit-rate reduction without affecting subjective picture quality. The work of both groups evolved into the joint HEVC project in 2010.

In February 2012, a Committee Draft of the HEVC standard was written. In July, an International Standard was drafted. In January 2013, the Final Draft International Standard was introduced. One month before the Committee Draft was completed, Qualcomm demonstrated an HEVC decoder operating on a dual-core 1.5GHz Android tablet at the Mobile World Congress. In August, Ericsson showed the SVP 5500, the world’s first HEVC encoder, at IBC. From that point until the opening of last month’s NAB, approximately 20 manufacturers introduced new encoders. Several more new HEVC systems were introduced last month at the NAB Show.

HEVC is the Future
As of now, HEVC is simply a bit-stream structure and syntax standard; its carriage in an MPEG Transport Stream has not yet been defined, although that is expected to happen in the next few months. It will take about a year for HEVC to be reduced to a silicon chip set. Some are predicting HEVC will be incorporated into some new set-top boxes (STBs) by 2014 or 2015. This will likely be the first opportunity for proud owners of new 4K TVs to receive 4K programming.

H.265 may also be the successor to H.264 for the next generation of 4K DVDs. A standard dual-layer Blu-ray disc holds up to 50GB of data, enough for a typical feature-length movie. A typical feature-length movie encoded with HEVC in 4K will contain up to 100GB.

Some experts seem to agree that 4K with the HEVC codec most likely won’t significantly penetrate the market until at least 2017. Given the huge installed base and long life cycle of STBs, the logistics of a wholesale replacement of existing STBs are staggering.

A similar situation exists for the population of ATSC HDTV sets in the United States. With the apparent failure of 3-D TV to catch on, many are questioning the eagerness of the market to replace their existing HDTV display devices as they did with the shutdown of NTSC. Thus, the future of HEVC in broadcast transmission is unknown.

On the other hand, if it turns out 4K doesn’t catch on with consumers, HEVC can double the number of existing HDTV channels using the same amount of bandwidth. Either way, over-the-air broadcast adoption of HEVC is dependent on upcoming ATSC 3.0 specifications.

By Ned Soseman, Broadcast Engineering

JPEG 2000 over IP

JPEG 2000 (J2K) is one of a number of compression formats that are used by professional media companies every day all over the world. J2K is generally used when high quality is required, such as for backhaul of national sporting events or for transfer of content between production facilities.

J2K can be configured to provide lossless compression, meaning that it is possible to prove that the video, after a compression/decompression cycle, is mathematically identical in every way to the video prior to compression.

In the 1950s, AT&T built a nationwide terrestrial video network for the big three networks. This was an RF-based analog system that remained in place for many years. In the 1960s, AT&T launched communications satellites, and AT&T and other satellite operators added video capability to these platforms over time. As a result, in the 1980s, satellite became the dominant long-haul technology.

During the dot-com boom, tens of thousands of miles of fiber optic cable were installed all over the country. The boom was followed by a bust, but the fiber was already in the ground. Thanks to this, megabit and now gigabit networking has become available on long-haul networks — at surprisingly reasonable prices in some cases.

One of the keys to networking is layering and encapsulation. Packetized networks use packets composed of a header and a payload section. The header contains information that is used to perform functions associated with that layer of the network functionality, and the payload section contains the information we want to transport across the network. Each layer performs a specific function. Let’s look at a specific example — the transport of J2K with audio over IP — to see how a layered approach is applied in a working scenario.



 
This shows the transport of J2K with audio over IP,
illustrating how a layered approach is applied in a working scenario.


We start with live professional video and audio — perhaps the output of a sports production truck. The video out of the truck is compressed using J2K, resulting in something called a JPEG 2000 Elementary Stream (ES). The audio coming out of the truck is already an AES stream.

Using MPEG-2
The JPEG standard says nothing about audio. Fortunately, we can use a portion of the MPEG-2 specification to multiplex the JPEG 2000 ES and AES audio into a single MPEG-2 Transport Stream (TS) in a standardized way. This is an important point: The MPEG-2 specification covers all sorts of things besides compression. So, even though this J2K video passes through equipment that follows the MPEG-2 specification, it is important to realize we are still using J2K compression; the compressed video is simply fed into an MPEG-2 multiplexer, where it is combined with the AES audio. The result is a single MPEG-2 TS.

The MPEG-2 TS contains information that helps receivers reconstruct timing between video and audio streams. While this is vital to reproducing video and audio, these timestamps do not provide everything we need in order to deal with what happens in the real world on long-haul IP networks. Let’s look at some of these networks’ characteristics.

As IP packets travel over a network, they can take different paths from a sender to a receiver. Obviously, the inter-packet arrival time is going to change. In some cases, packets can arrive out of order or even be duplicated within the network. Having information about what has happened to packets as they transit the network allows smart receiver manufacturers to do all sorts of things in order to ensure that video and audio at the receive end are presented in a smooth stream. What we need is a way to embed information in the packets when they are transmitted, so that we can adjust for network behavior at the receiver.

RTP
Real-time Transport Protocol (RTP) allows manufacturers to insert precision time stamps and sequence numbers into packets at the transmitter. If we use these time stamps to indicate the precise time when the packets were launched, then at the receiver we can see trends across the network.

Is network delay increasing? What are the implications on buffer management at the receiver? This information allows receivers to adjust in order to produce the continuous stream at the output.

RTP sequence numbers are simply numbers that are inserted in the RTP header. The numbers increase sequentially. At a receiver, if you receive a packet stream in the order [1], [2], [4], [3], you know immediately that you need to reorder packets 3 and 4 in order to present the information to the MPEG-2 TS de-multiplexer in the order in which it was transmitted.
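In code, that reordering is simply a sort on the sequence number. The sketch below ignores the 16-bit wraparound of real RTP sequence numbers and assumes every packet eventually arrives; production receivers use a jitter buffer and handle wraparound.

    # Packets as they arrived off the network, keyed by their RTP sequence numbers.
    arrived = [
        {"seq": 1, "payload": b"A"},
        {"seq": 2, "payload": b"B"},
        {"seq": 4, "payload": b"D"},
        {"seq": 3, "payload": b"C"},
    ]

    # Re-order before handing the payloads to the MPEG-2 TS de-multiplexer.
    in_order = sorted(arrived, key=lambda p: p["seq"])
    print([p["seq"] for p in in_order])   # [1, 2, 3, 4]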

The next layer is User Datagram Protocol (UDP) encapsulation. MPEG-2 TS packets are 188 bytes long, and this data needs to be mapped into packets for transmission. The newly created SMPTE 2022-6 standard describes how to do this. UDP is designed to provide a simple scheme for building packets for network transmission. Transmission Control Protocol (TCP) is another alternative at this layer, but TCP is a much heavier implementation that, for a variety of reasons, is not well suited to professional live video transmission.
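A common arrangement, shown here only as an illustration rather than a restatement of the standard, is to carry seven 188-byte TS packets per datagram, since 7 x 188 = 1,316 bytes fits comfortably inside a 1,500-byte Ethernet payload once the RTP, UDP and IP headers are added.

    TS_PACKET_SIZE = 188
    TS_PER_DATAGRAM = 7   # 7 x 188 = 1,316 bytes of payload per datagram

    def ts_to_datagrams(ts_packets):
        """Group 188-byte TS packets into UDP payloads (RTP header omitted for brevity)."""
        for i in range(0, len(ts_packets), TS_PER_DATAGRAM):
            yield b"".join(ts_packets[i:i + TS_PER_DATAGRAM])

    # Example: 21 dummy TS packets (0x47 is the TS sync byte) become three UDP payloads.
    dummy = [bytes([0x47]) + bytes(TS_PACKET_SIZE - 1) for _ in range(21)]
    payloads = list(ts_to_datagrams(dummy))
    print([len(p) for p in payloads])   # [1316, 1316, 1316]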

UDP packets are then encapsulated in IP datagrams, and at the IP layer, network source and destination addresses are added. This allows the network to route data from one location to another without the use of external routing control logic.

Finally, the IP datagrams are encapsulated in Ethernet frames. The Ethernet layer adds the specification of electrical and physical interfaces, in addition to Ethernet addressing that ties a specific physical device to an address, something IP addressing does not do.

Hopefully, this real-world example helps you to understand that layered systems are critical to the success of modern networked professional video, and that each layer adds something unique to the system.

By Brad Gilmer, Broadcast Engineering

Choosing JPEG 2000: The Growing Choice for Master File Format

Broadcasters, film studios and post-production houses are currently facing a major challenge in that the volume of generated video material is increasing dramatically. The result is a significant increase in the need for storage and archive capability.

Broadcasters and video archivists are also looking for long-term digital preservation. In most cases, the source material is not digital. Instead, it is on film that needs to be scanned or on high-quality analog videotape.

A production and digital-archive compression format that makes no concessions in video quality or in the production process is the obvious choice: one that reduces storage costs compared to uncompressed video, while still maintaining indefinite protection from loss or damage. Such a format should preserve original quality, while also making it easy to generate most of the commonly used delivery formats.

Several questions come up frequently when selecting a format. What is the best physical long-term storage medium for video content? What is a good candidate format for digital preservation? Can the digital content still be interpreted in the future? Various options are possible, and organizations have to decide carefully.

Today’s broadcasters understand the industry’s keywords: highest image quality, flexible delivery formats, interoperability and standardized profiles for optimal preservation. They also have a vested interest in a common high-end format to store, preserve and commercialize the avalanche of video footage generated globally. JPEG 2000 is the growing choice for master file format.


Digital Storage Keys
There are three keys to digital storage preservation:

  • Ensure continuous access to content over time. Archiving and storage cover all activities necessary to ensure continued access to digital materials for as long as necessary. This includes strategies to ensure access to reformatted and digitally born content, regardless of the risks of media failure and technological change. Quality preservation is crucial, and a system for converting archived items is important for dissemination or distribution.

  • Everything that belongs together fits in one package. Archiving is an enduring process concerned with the impacts of changing technologies, whether it is the support of new media and data formats or a changing user community. “Long term” may extend indefinitely.

  • Use open, well-documented industry standards — no proprietary formats. Ideally, focus on standards recognized and used for archiving applications. Open file formats are published specifications, usually maintained by standards organizations, which can therefore be used and implemented by anyone. For example, an open format can be implemented by both proprietary and free/open-source software, under both types of software licenses. Open formats are also called free file formats if they are not burdened by any copyrights, patents, trademarks or other restrictions; anyone may use them at no cost for any desired purpose.

To standardize digital preservation practices and provide a set of recommendations for preservation programs, the Reference Model for an Open Archival Information System (OAIS) was developed. OAIS is concerned with all technical aspects of a digital object’s life cycle: ingest into and storage in a preservation infrastructure, data management, accessibility and distribution. Continued interoperability is strategic; one needs easy and fast format conversion, as well as playback compatibility between manufacturers. For instance, a master file format must not be linked to any specific application, production format or major user.


JPEG 2000 in OP1a MXF
JPEG 2000 is based on Discrete Wavelet Transformation (DWT), scalar quantization, context modeling, arithmetic coding and post-compression rate allocation. JPEG 2000 provides random access (i.e., involving minimal decoding) to the block level in each sub-band, thus making it possible to decode a region, a low resolution or a low-quality image version without decoding the whole picture.



 
JPEG 2000 is based on discrete wavelet transformation, scalar quantization,
context modeling, arithmetic coding and post-compression rate allocation.


Functionally, JPEG 2000 is a true improvement that provides lossy and lossless compression, progressive and parseable code streams, error resilience, region of interest, proxies, random access and other features in one integrated algorithm.

In the video domain, JPEG 2000 is conceived as an intra-frame codec, so it closely matches the production workflow in which each video frame is treated as a single unit. Its ability to compress frame-by-frame has made it popular in the digital intermediate space in Hollywood. If the purpose of compression is the distribution of essence, and no further editing is expected, long-GOP MPEG will typically be preferred.

JPEG 2000 brings a storehouse of features to the broadcast process, whether ingest, transcoding, captioning, quality control or audio-track management is required. Its inherent properties fully qualify it for high-quality intermediate creation and master archives. JPEG 2000 supports any resolution, color depth, number of components and frame rate; in short, the codec is future-proof.

The intra-frame quality of JPEG 2000 prevents error propagation over multiple frames and allows the video signal to be edited at any point. Two wavelet filters are included: the irreversible 9/7 and the fully reversible 5/3. The 5/3 wavelet filter offers pure, mathematically lossless compression, allowing an average 60-percent reduction in storage while still allowing the exact original image information to be recovered. The 9/7 wavelet filter still delivers visually lossless encoding. JPEG 2000 offers uncompressed quality, with no concession in video content quality and a significant reduction in bandwidth and storage consumption.
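To show what “fully reversible” means in practice, here is a one-dimensional, one-level Python sketch of the 5/3 integer lifting steps. The boundary handling is simplified relative to the standard’s symmetric extension, but the round trip is still exact.

    def dwt53_forward(x):
        """One level of the reversible 5/3 lifting transform on an even-length list."""
        n = len(x)
        d = []   # high-pass (detail) coefficients
        for i in range(n // 2):
            right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]   # simplified boundary
            d.append(x[2 * i + 1] - (x[2 * i] + right) // 2)      # predict step
        s = []   # low-pass (smooth) coefficients
        for i in range(n // 2):
            left = d[i - 1] if i > 0 else d[0]                    # simplified boundary
            s.append(x[2 * i] + (left + d[i] + 2) // 4)           # update step
        return s, d

    def dwt53_inverse(s, d):
        n = 2 * len(s)
        x = [0] * n
        for i in range(n // 2):                                   # undo the update step
            left = d[i - 1] if i > 0 else d[0]
            x[2 * i] = s[i] - (left + d[i] + 2) // 4
        for i in range(n // 2):                                   # undo the predict step
            right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
            x[2 * i + 1] = d[i] + (x[2 * i] + right) // 2
        return x

    samples = [112, 110, 115, 140, 205, 210, 208, 206]
    low, high = dwt53_forward(samples)
    assert dwt53_inverse(low, high) == samples   # bit-for-bit identical after the round trip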



 
JPEG 2000 can be fully lossless or bit-to-bit reversible. Also, the intra-frame quality
of the future-proof codec prevents error propagation over multiple frames.


Additionally, its scalability features enable a “create once, use many times” approach for a wide range of platforms. Easy transcoding appeals to high-end applications whose workflows vastly benefit from transcoding to an intermediate version. JPEG 2000 ensures a clean, quick operation when bit rate is at a premium.



 
Shown here are typical profiles in use for the JPEG 2000 MXF OP1a master for preservation.


Correctly transcoded HD1080p JPEG 2000 files compressed at 100Mb/s have been labeled “visually identical” to the 2K original footage by professional viewers. Furthermore, the wavelet-based JPEG 2000 compression does not interfere with the final — usually DCT-based — broadcast formats.

Post-production workflows consist of several encoding/decoding cycles. JPEG 2000 preserves the highest quality throughout this process without creating any blocking artifacts. Moreover, all common bit depths, whether 8-bit, 10-bit, 12-bit or 16-bit, are supported.

Matching current industry needs, standardized broadcast profiles were adopted in 2010 (JPEG 2000 Part 1 Amd 3 – Profiles for Broadcast Application, ISO/IEC 15444-1:2004/Amd3), securing this wavelet-based codec’s benchmark position in contribution while fulfilling the industry-wide request for compression standards for archiving and for creating mezzanine formats, from which content can be transcoded for a variety of media distribution channels.

The ongoing standardization of the Interoperable Master Format (IMF) by SMPTE, focused on JPEG 2000 profiles, brings the adoption full circle. The SMPTE standards also specify, in detail, how JPEG 2000 video data should be encapsulated in the widely adopted MXF container.

Finally, a non-technical feature makes the JPEG 2000 open standard even more attractive for long-term projects: it is license- and royalty-free.


Other Codecs
Most other codecs are proprietary. Some have compliance issues and limitations in the video formats and resolutions they support. The MPEG family is ideal for last-mile content delivery to viewers, but not for production and storage, since pictures have to be post-processed.



 
Looking at different parameters, JPEG 2000 appears to be ideal as a mezzanine file format.


Conclusion
JPEG 2000 has gained significant traction as a mezzanine format. Open and well-documented, the codec is future-proof and extendable. It is therefore not surprising that the Library of Congress, France’s Institut National de l’Audiovisuel and several Hollywood studios, such as 20th Century Fox, have selected the codec for storage and preservation.

JPEG 2000 is a codec like no other. It gives users superior quality, control and unique flexibility in the image-processing chain. The growing use of JPEG 2000 to archive and create mezzanine files, and the ongoing standardization of the IMF based on JPEG 2000, are just a few of its advantages.

By Jean-Baptiste Lorent, Broadcast Engineering