At the EBU’s BroadThinking 2013 event in March 2013, the DASH Industry Forum conducted a survey of European broadcasters on MPEG-DASH. Thirteen major broadcasters responded to the survey.
This report summarizes the results and examines whether the DASH IF’s activities and focus are aligned with the needs expressed in the survey.
The aim of ButtleOFX is to create an open source compositing application based on the TuttleOFX library. TuttleOFX is an image and video processing framework based on the OpenFX open standard for visual effects plug-ins.
Monday, May 20, 2013
There are two things that are at the core of doing good color grading for video:
- Ensuring that the image on the screen looks great and is graded to best tell the story.
- Making sure that the image can be properly reproduced and delivered to a variety of media and screens.
The HSL Color Space
Video is composed of three color components: red, green and blue (RGB). Various combinations of these colors make up the colors we see. One way to understand the Hue, Saturation and Luma (HSL) representation of this color space is to imagine it as two cones joined at their widest points.
The Waveform Monitor
The waveform monitor or rasterizer (scope) is key to providing a legal output of your creative product. Being a brilliant editor and colorist doesn’t mean much if no one will air your product. Even if your product isn’t being broadcast, legal levels affect the proper duplication of your project and the way it will look on a regular TV monitor.
One of the most basic uses of the waveform monitor is determining whether your luma and setup levels are legal. This means that the brightest part of the luma signal does not extend beyond 100 percent and that the darkest part of the picture does not drop below 0 percent.
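To make the idea concrete, here is a minimal Python sketch, assuming normalized floating-point RGB frames (NumPy arrays) and Rec. 709 luma weights, that reports what fraction of a frame’s luma falls outside the legal 0 to 100 percent range:

import numpy as np

# Rec. 709 luma weights; the frame is assumed to be float RGB,
# normalized so 0.0 = 0 percent (black) and 1.0 = 100 percent (white).
LUMA_WEIGHTS = np.array([0.2126, 0.7152, 0.0722])

def illegal_luma(frame_rgb):
    """Return the fraction of pixels above 100% and below 0% luma."""
    luma = frame_rgb @ LUMA_WEIGHTS      # per-pixel weighted sum
    too_bright = np.mean(luma > 1.0)     # beyond 100 percent
    too_dark = np.mean(luma < 0.0)       # below 0 percent
    return float(too_bright), float(too_dark)

# Example: a graded frame whose values have drifted out of range
frame = np.random.uniform(-0.05, 1.1, size=(1080, 1920, 3))
print(illegal_luma(frame))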
Determining Color Balance
Color balance is indicated by the relative strength of each color channel. With neutral (pure black, white and gray) colors, the strength of each color channel should, technically, be equal.
The usual goal of color balancing is to achieve an image where neutral colors are represented with all channels being equal. The most common reason for unbalanced colors is how the camera is white balanced on location. For example, if a camera is set up to record in tungsten light, when it is actually capturing a scene lit with daylight, the blue channel will be stronger than the red and green channels.
Some camera sensors have a natural tendency to be more sensitive to certain colors in certain tonal ranges. These errors, in sensitivity or white balance, can be corrected by monitoring the image with a waveform monitor and making adjustments to the signal until the signal strength of all three channels is equal when displaying a neutral color.
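A rough sketch of that correction logic, assuming you can sample a patch of the image known to be neutral (a gray card, for example), might look like this in Python; using green as the reference channel is one common convention:

import numpy as np

def neutral_balance_gains(neutral_patch):
    """Given an RGB patch that should be neutral gray, return
    per-channel gains that equalize the red, green and blue
    channel strengths, with green as the reference (gain 1.0)."""
    means = neutral_patch.reshape(-1, 3).mean(axis=0)  # mean R, G, B
    return means[1] / means  # gain per channel relative to green

# Example: a daylight scene shot with a tungsten white balance,
# so the blue channel reads stronger than red and green.
patch = np.array([[[0.40, 0.42, 0.55]]])  # a "gray" card sample
gains = neutral_balance_gains(patch)
balanced = patch * gains  # all three channels now equal 0.42
print(gains, balanced)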
Colorists consult two types of waveform displays that are known as “parade” displays because they show channels of information in a “parade,” from left to right. The most common of these is the RGB Parade shown in the following figure, which shows the red, green and blue channels of color information horizontally across the display.
The reference marks are referred to as the graticule. On a waveform monitor, these are the horizontal lines describing the millivolts, IRE or percentages from black to full power (white).
Component video levels are represented in terms of millivolts, with black being set at 0mV and white at 700mV. This range of video levels is also represented in terms of a percentage scale with 0 percent equal to 0mV, and 100 percent equal to 700mV.
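Converting between the two scales is simple linear arithmetic, as this small Python sketch shows:

def mv_to_percent(mv):
    """Map a component video level in millivolts (0-700mV) to percent."""
    return mv / 700.0 * 100.0

def percent_to_mv(pct):
    return pct / 100.0 * 700.0

print(mv_to_percent(700))  # 100.0 -- full power (white)
print(percent_to_mv(50))   # 350.0 -- mid gray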
Whereas a waveform monitor normally displays a plot of signal vs. time, a vectorscope, shown in the following figure, is an XY plot of color (hue) as an angular component of a polar display, with the signal amplitude represented by the distance from the center (black). On a vectorscope graticule, there are color targets and other markings that provide a reference as to which vector, or position, a specific color is in.
In color grading applications, the vectorscope helps analyze hue and chroma levels, keeping colors legal and helping to eliminate unwanted color casts. With the gain, setup and gamma corrections done while monitoring primarily the waveform monitor, the colorist’s attention focuses more on the vectorscope for the hue and chroma work.
The chroma strength of the signal is indicated by its distance from the center of the vectorscope. The closer the trace is to the outer edge of the vectorscope, the greater the chrominance, or the more vivid the color. The hue of the image is indicated by its rotational position around the circle. An important relationship to understand is the position of the various colors around the periphery of the vectorscope. The targets for red, blue and green form a triangle. In between each of these primary colors are the colors formed by mixing those primaries.
The chroma information presented on the vectorscope is instrumental in trying to eliminate color casts in images. As stated earlier, the chroma strength of a signal is represented by its distance from the center of the vectorscope. Because white, black and pure grays are devoid of chroma information, they all should sit neatly in the center of the vectorscope. Although most video images will have a range of colors, they also usually have some amount of whites, blacks and neutral grays. The key is to be able to see where these parts of the picture sit on the vectorscope and then use the color correction tools at your disposal to move them toward the center of the vectorscope.
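As an illustration, this Python sketch (assuming normalized R'G'B' values and Rec. 709 coefficients) computes where a color lands on the vectorscope; a pure neutral lands dead center, while a color cast pushes it away from the center:

import math

def vectorscope_position(r, g, b):
    """Map normalized R'G'B' (0.0-1.0) to a vectorscope position:
    chroma (distance from center) and hue (angle), per Rec. 709."""
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    cb = (b - y) / 1.8556
    cr = (r - y) / 1.5748
    chroma = math.hypot(cb, cr)             # distance from center
    hue = math.degrees(math.atan2(cr, cb))  # rotational position
    return chroma, hue

# Pure gray sits at the center (chroma 0); a color cast does not.
print(vectorscope_position(0.5, 0.5, 0.5))    # (0.0, ...)
print(vectorscope_position(0.55, 0.5, 0.45))  # warm cast: nonzero chroma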
For nearly all professional colorists, the various waveform displays — Flat, Low Pass, Luma only, RGB Parade and YCbCr Parade — plus the vectorscope are the main methods for analyzing their image. Although experienced colorists often rely on their eyes, they use these scopes to provide an unchanging reference to guide them as they spend hours color correcting. Without them, their eyes and grades would eventually drift off course. Spend time becoming comfortable with these scopes, and what part of the video image corresponds to the images on the scopes.
By Richard Duvall, Broadcast Engineering
Wednesday, May 15, 2013
A walkthrough of some of the features of the new HEVC video compression standard. Video playback using the Osmo4 player from GPAC and stream analysis using Elecard's HEVC Analyzer software.
By Iain Richardson, Vcodex
Monday, May 13, 2013
Ethernet and IP are terms broadcast engineers use many times every day. But what, exactly, is the difference between the two?
Many years ago, manufacturers needed to include, inside their application programs, software that was written to allow the program to interface to a specific network interface card. This meant that an application would only work with specific networking hardware; change the hardware, and you had to rewrite the application. Very quickly, vendors faced an escalating number of possible network devices and a number of different underlying physical network implementations (such as RG-11, RG-59 and UTP cable).
Manufacturers wanted to separate the development of their application from all of the chaos going on at the networking level. They also wanted to be able to sell their applications to different users who might have different networking technologies in their facilities. The seven-layer Open System Interconnection (OSI) data model was developed to address this issue in a standardized way. While it is not too important to understand all layers of the OSI model, it is good to understand the first four layers.
Layer 1 is the physical layer, sometimes referred to as PHY. This is the physical network hardware, and it operates at the bit level.
Layer 2 is called the data link layer and consists of a mixture of specifications describing the electrical signals on a cable and the way that data is encapsulated into frames. Ethernet is a Layer 2 protocol.
Layer 3 is referred to as the network layer, and it is here that data is encapsulated into packets. IP packets are referred to as datagrams.
Layer 4 is the transport layer. Here, we speak in terms of sessions. Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) are Layer 4 protocols.
Each of these layers plays a role in networking. Importantly, Layer 2 is responsible for physical addressing, meaning that the network architecture can rely on the fact that a Layer 2 address is unique and permanently assigned to a specific piece of hardware.
Layer 3 is responsible for logical addressing, meaning that addresses are assigned at this level by a network engineer in a way that organizes network clients into logical groups. But, neither of these two layers is responsible for anything more than “best effort” delivery of packets from one place to another. If the packets get lost, duplicated, rearranged or corrupted, neither Layer 2 nor Layer 3 will do anything about it.
Layer 4 protocols are responsible for establishing end-to-end connections called sessions, and these protocols may or may not recognize the loss of packets and do something about it. TCP, for example, will request that lost packets be resent. UDP will not. (Remember that Ethernet operates at Layer 2, and IP operates at Layer 3.)
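A minimal Python sketch makes the contrast concrete; the addresses and ports here are placeholders, not real endpoints:

import socket

# UDP (Layer 4, connectionless): each datagram is sent best-effort.
# If it is lost, duplicated or reordered in transit, UDP does nothing.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.sendto(b"video payload", ("192.0.2.10", 5004))

# TCP (Layer 4, session-oriented): connect() establishes a session,
# and the protocol itself retransmits any lost segments.
# (Illustrative only; a listener must exist for connect() to succeed.)
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.connect(("192.0.2.10", 8080))
tcp.sendall(b"file payload")
tcp.close()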
Ethernet and IP are part of a hierarchy of interchangeable technologies in the seven-layer OSI stack. While most of the networks that broadcast engineers are likely to encounter today are Ethernet and IP, it is important to understand that, in the early days, Ethernet was just one of a large number of competing Layer 2 technologies. Other Data Link Layer technologies include ATM, Token Ring and ARCNET.
While Layer 2 does a great job of organizing data into frames and passing them on to a physical network, it does not allow network architects to group computers into logical networks, nor does it allow messages from those computers to be sent (or routed) across a much larger campus or even worldwide network. This is where Layer 3 comes in.
IP, operating at Layer 3, organizes data into datagrams, which can be sent across any Layer 2 networking technology. IP datagrams are the same size and the same format regardless of whether these packets are sent across Ethernet, Token Ring or some other network. In fact, you might be surprised to learn that IP packets from your computer may first travel over Ethernet, then over a SONET link for long-distance transmission, and then be put back into Ethernet for delivery on a local network at the destination.
IP is not the only Layer 3 protocol in common use today. There are a number of other critical protocols that operate at this layer, many of which have to do with configuring and maintaining networks. Internet Control Message Protocol (ICMP) and Open Shortest Path First (OSPF) are two examples of this.
In summary, Ethernet and IP are part of a hierarchy. IP packets are carried in Ethernet frames, and IP packets that travel over long distances are likely to be carried in the payload portion of SONET frames. Furthermore, when you see the notation TCP/IP, remember that TCP is also part of this hierarchy: Any time you see it used in reference to traffic on a LAN, you are almost certainly talking about TCP over IP over Ethernet.
There are many other differences between Ethernet and IP that are derived from the fact that they are on different layers of the OSI model.
Ethernet frames contain source and destination addresses. The frames also contain a payload area. IP packets are transported in the payload area of the Ethernet frames. The IP packets also contain source and destination addresses and a payload section. If both Ethernet and IP contain similar structures, why use Ethernet at all?
Remember that the point of adding IP as a layer on top of Ethernet is to allow the IP packet layout (and the application software above that layer) to remain constant while providing different underlying Layer 2 structures. Basically, if you change the network, then you change Layer 2 drivers, and perhaps Layer 1 hardware, but everything above that layer remains the same.
From a practical standpoint, there are significant differences between Ethernet addresses and IP addresses. Ethernet addresses are assigned to a network interface card or chip set at the factory. They are globally unique, cannot be changed (not true, actually, but this was the original assumption), and getting an Ethernet configuration up and running essentially is a plug-and-play process.
IP addresses, on the other hand, are not assigned from the factory. These addresses need to be configured (sometimes dynamically), and while Dynamic Host Configuration Protocol (DHCP) works very well most of the time to automatically configure IP addresses, there are many cases where broadcast engineers must manually configure the IP address of a device before it can be used on the network.
Another difference is that the first three octets of an Ethernet address convey meaning; they are an Organizationally Unique Identifier (OUI). It is possible to look up an OUI and determine who assigned the Ethernet address to the hardware. IP addresses, however, have absolutely no meaning assigned to them. That is not to say that there are not special IP address ranges because there are. But, the numbers themselves do not convey any specific information.
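Extracting the OUI is straightforward; the helper below is a hypothetical illustration, and the sample address is just an example:

def oui(mac):
    """Extract the Organizationally Unique Identifier (the first
    three octets) from an Ethernet (MAC) address."""
    octets = mac.replace("-", ":").split(":")
    return "-".join(o.upper() for o in octets[:3])

# The OUI can then be looked up in the public IEEE registry to find
# which manufacturer the address was assigned to.
print(oui("00:1b:21:3a:4f:9e"))  # "00-1B-21"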
Perhaps the most important difference between Ethernet and IP is that Ethernet frames are not routable, while IP packets are. In practical terms, what this means is that an Ethernet network is limited in terms of the number of devices that can be connected to a single network segment and the distance Ethernet frames can travel. Limits vary, but, as an example, Gigabit Ethernet on Cat 7 cable is limited to about 330ft. Gigabit Ethernet on single mode fiber (expensive) is limited to 43mi.
In order to extend the network beyond this length, you need to be able to route signals from one physical Ethernet network to another. This was one of the original design goals of IP. Network architects assign IP addresses to computers based upon physical location and corporate function (news vs. production, for example), and then IP routers automatically recognize whether packets should remain within the local network or whether they need to be routed to the Internet.
By Brad Gilmer, Broadcast Engineering
Monday, May 06, 2013
A whitepaper about HEVC by Ericsson.
Video compression was one of the more ubiquitous topics at last month’s NAB Show. Perhaps equally ubiquitous were 4K and 8K. Seldom was either topic discussed without reference to the other.
Fortunately, in the world of digital television and digital video, a video-specific variation of Moore’s Law seems to be at work. With umpteen years of incremental NTSC picture quality progress that many viewers could barely see — if at all — the industry was ripe for the change to HDTV and self-contained big-screen displays.
Until Moore’s Law made it possible for common computers to handle digitized SD video, few analog engineers could envision what was in the future of digital television. Some might remember the days of 1125/60 in the early 1990s. It was, more or less, the original analog HDTV format. Many design engineers were trying to compress it into the standard 6MHz broadcast television channel bandwidth using a variety of hardware-intensive systems. The industry even formed the 1125/60 Consortium before DTV was invented.
DTV, which was originally conceived to shoehorn HDTV into a standard 6MHz TV channel, has flooded the industry and viewing public with myriad unforeseen changes. Among them were progressive scan, streaming video, file-based video, the 1920 x 1080 raster and, more recently, 4K, 8K and beyond. While current DTV and ATSC transmission standards have capped the resolution TV stations can broadcast, computers, DVDs, the Internet and a host of video compression schemes and standards have eliminated barriers to delivering higher definition images to viewers from non-over-the-air sources. Current state-of-the-art video compression standards can reduce the bandwidth of baseband video by a factor of approximately 100.
The first digital video codec standard was H.120. It was published in 1984 and revised in 1988, but the quality was so poor that there were very few users. H.120 was followed later in 1988 by H.261, an ITU-T video coding standard. The primary use for H.261 was for video transmission over ISDN lines at a resolution of 352 x 288 or 176 x 144.
MPEG-2, aka H.262, was first published in 1994. It paved the road to DTV, OTT and ATSC transmission, and it continues to be the standard used to create DVDs.
MPEG-4 Part 10 Advanced Video Coding (AVC), aka H.264, was published in 2003 and is currently the most commonly used video codec and the standard for Blu-ray Discs and HDTV.
Today, numerous compression standards are being used in a variety of ways to create and deliver content. The latest standards with the highest quality are JPEG 2000 (J2K) and more recently HEVC (High Efficiency Video Coding).
J2K is designed for compressing individual images, not video sequences. It is primarily used in production and video feeds with high bit rates up to approximately 120Mb/s. The higher bit rates make artifacts and blocking virtually invisible. It is hardware-intensive, and it can be accomplished in real time.
In J2K, each frame is compressed individually and, therefore, stands alone. In video compression terms, each individual complete frame is an I-Frame. While this feature is advantageous in maintaining video quality, its high bandwidth doesn’t lend itself well to distribution.
On April 15, 2013, the Video Services Forum issued a Technical Recommendation that defines profiles for streaming of JPEG 2000 Broadcast Profile in a MPEG-2 Transport Stream over IP with optional Forward Error Correction (FEC). The recommendation is for unidirectional transport of SD-SDI, HD-SDI and 3G-SDI signals, encapsulated in an RTP stream and transmitted via IP to a receiving device that will decode the output to an SDI signal.
On the other hand, HEVC, aka H.265, and its modern predecessors, MPEG-2 and H.264 (MPEG-4 AVC), use I-frames, P-frames and B-frames to encode moving images. To quickly review, the I in I-frame stands for Intra-coded; an I-frame is a fully specified still image. The P in P-frame stands for Predicted picture; it contains only the changes from the previous frame, saving unchanged data from having to be repeated. The B in B-frame stands for Bi-predictive; it saves more space than a P-frame because it specifies the differences between itself and the frames both before and after it. Images are usually segmented into macroblocks, and prediction types can be determined based on the movement within each macroblock.
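This toy Python sketch, purely illustrative, spells out what each frame type in a display-order GOP pattern references:

def describe_gop(pattern):
    """For a display-order GOP pattern such as 'IBBPBBP', state
    which neighboring frames each frame type predicts from."""
    for i, t in enumerate(pattern):
        if t == "I":
            print(f"frame {i}: I, fully specified, no references")
        elif t == "P":
            print(f"frame {i}: P, changes from the previous I/P frame")
        elif t == "B":
            print(f"frame {i}: B, differences from frames before AND after")

describe_gop("IBBPBBP")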
HEVC provides 33 directional modes for intra block prediction, where H.264 uses only eight. These modes use information from previously decoded neighboring prediction blocks. H.265 motion vectors have a 16-bit range for both horizontal and vertical components, with quarter-pixel precision, giving HEVC a motion vector range 16X greater than that of H.264.
Most pre-H.265 codecs independently encoded 16 x 16 pixel macroblocks. In HEVC, the image is split into Coding-Tree Units (CTUs), each up to 64 × 64 pixels and each the root of a quadtree data structure. A quadtree partitions a two-dimensional space by recursively subdividing it into four quadrants, each of which can be subdivided again. The quadtree can thus be subdivided into leaf-level coding units (CUs), as illustrated in the following figure.
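The recursive splitting is easy to sketch in Python; the needs_split callback below is a stand-in for the encoder’s actual rate-distortion decision, which the standard leaves to implementations:

def split_ctu(x, y, size, needs_split, min_size=8):
    """Recursively partition a CTU (up to 64x64) into leaf coding
    units. `needs_split(x, y, size)` is a stand-in for the encoder's
    mode decision; here it is just a callback."""
    if size <= min_size or not needs_split(x, y, size):
        return [(x, y, size)]  # a leaf-level coding unit (CU)
    half = size // 2
    leaves = []
    for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
        leaves += split_ctu(x + dx, y + dy, half, needs_split, min_size)
    return leaves

# Example: split any block larger than 32x32 that covers "detail"
detail = lambda x, y, size: size > 32 and (x, y) == (0, 0)
print(split_ctu(0, 0, 64, detail))  # four 32x32 CUs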
HEVC (H.265) represents the natural progression of Moore’s Law in video compression. In essence, the H.265 codec is twice as efficient as the H.264 codec. It increases the use of parallel processing, improves compressed video picture quality and supports higher resolutions such as 8K Ultra high definition television (UHDTV).
As early as 2004, the ITU-T Video Coding Experts Group began work toward a new video compression standard to take H.264 to the next level. The Group was considering two approaches. One was to create extensions to H.264. The other was to create a new standard.
Three years later, the ISO/IEC Moving Picture Experts Group (MPEG) began work on a similar project called High-performance Video Coding. Its goal was to achieve a 50-percent bit-rate reduction without affecting subjective picture quality. The work of both groups evolved into the joint HEVC project in 2010.
In February 2012, a Committee Draft of the HEVC standard was written. In July, an International Standard was drafted. In January 2013, the Final Draft International Standard was introduced. One month before the Committee Draft was completed, Qualcomm demonstrated an HEVC decoder operating on a dual-core 1.5GHz Android tablet at the Mobile World Congress. In August, Ericsson showed the SVP 5500, the world’s first HEVC encoder at IBC. From that point until the opening of last month’s NAB, approximately 20 manufacturers introduced new encoders. Several more new HEVC systems were introduced last month at the NAB Show.
HEVC is the Future
As of now, HEVC is simply a bit-stream structure and syntax standard; its carriage in an MPEG transport stream has not yet been standardized, although that is expected to happen in the next few months. It will take about a year for HEVC to be reduced to a silicon chip set. Some are predicting HEVC will be incorporated into some new set-top boxes (STBs) by 2014 or 2015. This will likely be the first opportunity for proud owners of new 4K TVs to receive 4K programming.
H.265 may also be the successor to H.264 for the next generation of 4K optical discs. A standard dual-layer Blu-ray Disc holds up to 50GB of data, enough for typical feature-length movies. Typical feature-length movies encoded with HEVC in 4K will require up to 100GB.
Some experts seem to agree that 4K with the HEVC codec most likely won’t significantly penetrate the market until at least 2017. Given the enormous installed base and long life cycle of STBs, the logistics of a wholesale replacement of existing STBs are staggering.
A similar situation exists for the population of ATSC HDTV sets in the United States. With the apparent failure of 3-D TV to catch on, many are questioning the eagerness of the market to replace their existing HDTV display devices as they did with the shutdown of NTSC. Thus, the future of HEVC in broadcast transmission is unknown.
On the other hand, if it turns out 4K doesn’t catch on with consumers, HEVC can double the number of existing HDTV channels using the same amount of bandwidth. Either way, over-the-air broadcast adoption of HEVC is dependent on upcoming ATSC 3.0 specifications.
By Ned Soseman, Broadcast Engineering
JPEG 2000 (J2K) is one of a number of compression formats that are used by professional media companies every day all over the world. J2K is generally used when high quality is required, such as for backhaul of national sporting events or for transfer of content between production facilities.
J2K can be configured to provide lossless compression, meaning that it is possible to prove that the video, after a compression/decompression cycle, is mathematically identical in every way to the video prior to compression.
In the 1950s, AT&T built a nation-wide, terrestrial, video network for the big three networks. This was an RF-based analog system that remained in place for many years. In the 1960s, AT&T launched communications satellites, and AT&T and other satellite operators added video capability to these platforms over time. As a result, in the 1980s, satellite became the dominant long-haul technology.
During the dot-com boom, tens of thousands of miles of fiber optic cable were installed all over the country. The boom was followed by a bust, but the fiber was already in the ground. Thanks to this, megabit and now gigabit networking has become available on long-haul networks — at surprisingly reasonable prices in some cases.
One of the keys to networking is layering and encapsulation. Packetized networks use packets composed of a header and a payload section. The header contains information that is used to perform functions associated with that layer of the network functionality, and the payload section contains the information we want to transport across the network. Each layer performs a specific function. Let’s look at a specific example — the transport of J2K with audio over IP — to see how a layered approach is applied in a working scenario.
We start with live professional video and audio — perhaps the output of a sports production truck. The video out of the truck is compressed using J2K, resulting in something called a JPEG 2000 Elementary Stream (ES). The audio at the side of the truck is already an AES stream.
The JPEG standard says nothing about audio. Fortunately, we can use a portion of the MPEG-2 specification to multiplex JPEG-2000 ES and AES audio into a single MPEG-2 Transport Stream (TS) in a standardized way. This is an important point: The MPEG-2 specification covers all sorts of things besides compression. So, even though we feed this J2K video through equipment that is following the MPEG-2 specification, it is important to realize we are using J2K compression that is then fed into an MPEG-2 multiplexer, where it is combined with the AES audio. The result is a single MPEG-2 TS.
The MPEG-2 TS contains information that helps receivers reconstruct timing between video and audio streams. While this is vital to reproducing video and audio, these timestamps do not provide everything we need in order to deal with what happens in the real world on long-haul IP networks. Let’s look at some of these networks’ characteristics.
As IP packets travel over a network, they can take different paths from a sender to a receiver. Obviously, the inter-packet arrival time is going to change. In some cases, packets can arrive out of order or even be duplicated within the network. Having information about what has happened to packets as they transit the network allows smart receiver manufacturers to do all sorts of things in order to ensure that video and audio at the receive end are presented in a smooth stream. What we need is a way to embed information in the packets when they are transmitted, so that we can adjust for network behavior at the receiver.
Real-time Transport Protocol (RTP) allows manufacturers to insert precision time stamps and sequence numbers into packets at the transmitter. If we use these time stamps to indicate the precise time when the packets were launched, then at the receiver we can see trends across the network.
Is network delay increasing? What are the implications on buffer management at the receiver? This information allows receivers to adjust in order to produce the continuous stream at the output.
RTP sequence numbers are simply numbers that are inserted in the RTP header. The numbers increase sequentially. At a receiver, if you receive a packet stream in the order 1, 2, 4, 3, you know immediately that you need to reorder packets 3 and 4 in order to present the information to the MPEG-2 TS de-multiplexer in the order in which it was transmitted.
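A minimal Python sketch of that reordering, including the 16-bit sequence-number wraparound that real receivers must handle, might look like this:

def reorder(packets, seq_bits=16):
    """Reorder a burst of RTP packets by sequence number, handling
    the 16-bit wraparound (...65534, 65535, 0, 1...). `packets` is a
    list of (seq, payload) tuples as received from the network."""
    base = packets[0][0]
    mod = 1 << seq_bits
    # Distance from the first-received packet, modulo 2^16, gives a
    # sortable ordering even across the wrap point.
    return sorted(packets, key=lambda p: (p[0] - base) % mod)

# Packets arrived as 1, 2, 4, 3; packets 3 and 4 must be swapped
# before handing the stream to the MPEG-2 TS de-multiplexer.
print(reorder([(1, b"a"), (2, b"b"), (4, b"d"), (3, b"c")]))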
The next layer is User Datagram Protocol (UDP) encapsulation. MPEG-2 packets are 188 bytes. This data needs to be mapped into packets for transmission. The newly created SMPTE 2022-6 standard describes how to do this. UDP is designed to provide a simple scheme for building packets for network transmission. Transmission Control Protocol (TCP) is another alternative at this layer, but TCP is a much heavier implementation that, for a variety of reasons, is not well suited to professional live video transmission.
UDP packets are then encapsulated in IP datagrams, and at the IP layer, network source and destination addresses are added. This allows the network to route data from one location to another without the use of external routing control logic.
Finally, the IP datagrams are encapsulated in Ethernet frames. The Ethernet layer adds the specification of electrical and physical interfaces, along with Ethernet addressing that ties a specific physical device to an address, something IP addressing does not do.
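Putting the layers together, here is a simplified Python sketch, not a compliant SMPTE 2022 implementation, that packs seven 188-byte TS packets (a common mapping) behind a minimal RTP header and hands the result to a UDP socket; the operating system then adds the UDP and IP headers, and the network driver wraps everything in an Ethernet frame. The destination address is a placeholder:

import socket, struct, time

TS_PACKET = 188
TS_PER_DATAGRAM = 7  # a common mapping of TS packets per datagram

def rtp_header(seq, timestamp, payload_type=33):
    """Minimal 12-byte RTP header (RFC 3550); PT 33 = MPEG-2 TS."""
    return struct.pack("!BBHII",
                       0x80,          # V=2, no padding/extension/CSRC
                       payload_type,
                       seq & 0xFFFF,  # 16-bit sequence number
                       timestamp & 0xFFFFFFFF,
                       0x12345678)    # SSRC (arbitrary stream ID)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
ts_packets = [b"\x47" + b"\x00" * (TS_PACKET - 1)] * TS_PER_DATAGRAM
payload = b"".join(ts_packets)                              # MPEG-2 TS layer
packet = rtp_header(seq=1, timestamp=int(time.time())) + payload  # RTP layer
# sendto() hands the datagram to the OS, which adds the UDP and IP
# headers; the network driver wraps the result in an Ethernet frame.
sock.sendto(packet, ("192.0.2.20", 5004))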
Hopefully, this real-world example helps you to understand that layered systems are critical to the success of modern networked professional video, and that each layer adds something unique to the system.
By Brad Gilmer, Broadcast Engineering
Friday, May 03, 2013
Labels: JPEG 2000
Broadcasters, film studios and post-production houses are currently facing a major challenge in that the volume of generated video material is increasing dramatically. The result is a significant increase in the need for storage and archive capability.
Broadcasters and video archivists are also looking for long-term digital preservation. In most cases, the source material is not digital. Instead, it is on film that needs to be scanned or high-quality analog video tape.
A compression format for production and digital archiving that makes no concessions in video quality is the obvious choice — one that reduces storage costs compared to uncompressed video, while still protecting content from loss or damage over the long term. Such a format should preserve original quality, while also easily enabling the generation of the most commonly used formats.
Several questions come up frequently when selecting a format. What is the best physical long-term storage medium for video content? What is a good candidate format for digital preservation? Will digital content still be interpretable in the future? Various options are possible, and organizations have to choose carefully.
Today’s broadcasters understand the industry’s keywords: highest image quality, flexible delivery formats, interoperability and standardized profiles for optimal preservation. They also have a vested interest in a common high-end format to store, preserve and commercialize the avalanche of video footage generated globally. JPEG 2000 is the growing choice for master file format.
Digital Storage Keys
There are three keys to digital storage preservation:
- Ensure continuous access to content over time. Archive and storage covers all activities necessary to ensure continued access to digital materials for as long as necessary. This includes strategies to ensure access to reformatted and digitally-born content, regardless of the risks of media failure and technological change. Quality preservation is crucial, and a conversion system for archived items is important for dissemination or distribution.
- Everything that belongs together fits in one package. Archiving is an enduring process concerned with the impacts of changing technologies, whether it is the support of new media and data formats or a changing user community. “Long term” may extend indefinitely.
- Use open, well-documented industry standards — no proprietary formats. Ideally, focus on standards recognized and used for archiving applications. Open file formats are published specifications, usually maintained by standards organizations, and can therefore be used and implemented by anyone. For example, an open format can be implemented by both proprietary and free/open-source software, under both types of software licenses. Open formats are also called free file formats if they are not burdened by any copyrights, patents, trademarks or other restrictions; anyone may use them at no cost for any desired purpose.
To standardize digital preservation practices and provide a set of recommendations for preservation programs, the Reference Model for an Open Archival Information System (OAIS) was developed. OAIS is concerned with all technical aspects of a digital object’s life cycle: ingest into and storage in a preservation infrastructure, data management, accessibility and distribution. Continued interoperability is strategic; one needs easy and fast format conversion, as well as playback compatibility between manufacturers. For instance, a master file format must not be linked to any specific application, production format or major user.
JPEG 2000 in OP1a MXF
JPEG 2000 is based on Discrete Wavelet Transformation (DWT), scalar quantization, context modeling, arithmetic coding and post-compression rate allocation. JPEG 2000 provides random access (i.e., involving minimal decoding) to the block level in each sub-band, thus making it possible to decode a region, a low resolution or a low-quality image version without decoding the whole picture.
Functionally, JPEG 2000 is a true improvement that provides lossy and lossless compression, progressive and parseable code streams, error resilience, region of interest, proxies, random access and other features in one integrated algorithm.
In the video domain, JPEG 2000 is conceived as an intra-frame codec, so it closely matches the production workflow in which each video frame is treated as a single unit. Its ability to compress frame-by-frame has made it popular in the digital intermediate space in Hollywood. If the purpose of compression is the distribution of essence, and no further editing is expected, long-GOP MPEG will typically be preferred.
JPEG 2000 brings a storehouse of features to the broadcast process, whether ingest, transcoding, captioning, quality control or audio-track management is requested. Its inherent properties fully qualify it for high-quality intermediate creation and master archives. JPEG 2000 supports every resolution, color depth, number of components and frame rate; in short, the codec is future-proof.
The intra-frame quality of JPEG 2000 prevents error propagation over multiple frames and allows the video signal to be edited at any point. Two wavelet filters are included: the irreversible 9/7 and the fully reversible 5/3. The 5/3 wavelet filter offers pure, mathematically lossless compression, allowing an average 60-percent reduction in storage while still allowing the exact original image information to be recovered. The 9/7 wavelet filter performs visually lossless encoding. JPEG 2000 thus offers uncompressed-grade quality, with no concession in video content quality and a significant reduction in bandwidth and storage consumption.
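To see why the 5/3 filter is exactly reversible, consider this minimal one-dimensional Python sketch of its lifting steps (even-length input and simplified mirrored boundaries assumed); the forward and inverse transforms round-trip bit-exactly because every step is invertible integer arithmetic:

def cdf53_forward(x):
    """One level of the reversible 5/3 lifting transform (1-D,
    even-length input, simple mirrored boundaries)."""
    n = len(x)
    d = []  # detail (high-pass) coefficients
    for i in range(n // 2):
        right = x[2*i + 2] if 2*i + 2 < n else x[2*i]  # mirror edge
        d.append(x[2*i + 1] - (x[2*i] + right) // 2)   # predict step
    s = []  # smooth (low-pass) coefficients
    for i in range(n // 2):
        left = d[i - 1] if i > 0 else d[0]             # mirror edge
        s.append(x[2*i] + (left + d[i] + 2) // 4)      # update step
    return s, d

def cdf53_inverse(s, d):
    n = 2 * len(s)
    x = [0] * n
    for i in range(len(s)):                            # undo update
        left = d[i - 1] if i > 0 else d[0]
        x[2*i] = s[i] - (left + d[i] + 2) // 4
    for i in range(len(d)):                            # undo predict
        right = x[2*i + 2] if 2*i + 2 < n else x[2*i]
        x[2*i + 1] = d[i] + (x[2*i] + right) // 2
    return x

samples = [10, 12, 14, 13, 9, 8, 8, 11]
s, d = cdf53_forward(samples)
assert cdf53_inverse(s, d) == samples  # bit-exact: mathematically lossless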
Additionally, its scalability features a “create once, use many times” approach for a wide range of platforms. Easy transcoding of the codec appeals to high-end applications where workflows vastly benefit from transcoding to an intermediate version. JPEG 2000 ensures a clean, quick operation when bit-rate is at a premium.
Correctly transcoded HD1080p JPEG 2000 files compressed at 100Mb/s have been labeled “visually identical” to the 2K original footage by professional viewers. Furthermore, the wavelet-based JPEG 2000 compression does not interfere with the final — usually DCT-based — broadcast formats.
Post-production workflows consist of several encoding/decoding cycles. JPEG 2000 preserves the highest quality throughout this process, without creating any blocking artifacts. Moreover, all common bit depths, whether 8-bit, 10-bit, 12-bit or 16-bit, are supported.
Uniquely matching current industry needs, standardized broadcast profiles were adopted in 2010 (JPEG 2000 Part 1 Amd 3, Profiles for Broadcast Application, ISO/IEC 15444-1:2004/Amd3), securing this wavelet-based codec’s benchmark position in contribution while fulfilling the industry-wide request for compression standards for archiving and mezzanine formats. Content can then be transcoded for a variety of media distribution channels.
The ongoing standardization process of the Interoperable Master Format (IMF) by SMPTE, focused on JPEG 2000 profiles, brings the adoption full circle. The SMPTE standards also specify, in detail, how JPEG 2000 video data should be encapsulated in the widely adopted MXF container.
Finally, a non-technical feature makes the JPEG 2000 open standard even more attractive for long-term projects; it is license- and royalty-free.
Most other codecs are proprietary. Some have compliance issues and limitations in the video formats or resolutions they can support. The MPEG family is ideal for last-mile content delivery to viewers, but not for production and storage, since pictures have to be post-processed.
JPEG 2000 has gained significant traction as a mezzanine format. Open and well-documented, the codec is future-proof and extendable. It is therefore not surprising that the Library of Congress, France’s Institut National de l’Audiovisuel and several Hollywood studios, such as 20th Century Fox, have selected the codec for storage and preservation.
JPEG 2000 is a codec like no other. It gives users superior quality, control and a unique flexibility in the image-processing chain. The growing use of JPEG 2000 to archive and create mezzanine files, and the ongoing standardization process of the IMF based on JPEG 2000, are just a few of its advantages.
By Jean-Baptiste Lorent, Broadcast Engineering
Until recently, simulcast streaming to connected devices was performed using protocols like RTMP, RTSP and MMS. In 2009, when the iPhone 3GS was launched, iOS 3.0 included a new streaming protocol called HTTP Live Streaming (HLS), part of a new class of video delivery protocols.
HLS differed from its predecessors by relying only upon HTTP to carry video and flow control data to the device. It made the protocol far more firewall-friendly and easier to scale, as it required no specialist streaming server technology distributed throughout the Internet to deliver streams to end users. The regular HTTP caching proxies that serve as the backbone of all Content Delivery Networks (CDNs) would suffice.
Apple was not alone in making this paradigm switch. Microsoft and Adobe also introduced their own protocols — SmoothStreaming and HDS, respectively. Today, work is ongoing to standardize these approaches into a single unified protocol, under a framework known as MPEG-DASH.
What is significant about all of these protocols is that they separate the control aspects of the protocol from the video data. They share the general concept that video data is encoded into chunks and placed onto an origin server or a CDN. To start a streaming session, client devices load a manifest file from that server that tells them which chunks to load and in what order. The infrastructure that serves the manifest can be completely separate from the infrastructure that serves the chunks.
The separation of these concerns provides a basis for dynamic content replacement, as it is possible to dynamically manipulate the manifest file to point the client device at an alternative sequence of video chunks that have been pre-encoded and placed on the CDN. The ability to swap chunks out in this way relies on the encoding workflow generating video chunks whose boundaries match possible replacement events.
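A toy Python sketch illustrates the principle; the HLS-style playlist, segment names and ad decision below are all hypothetical:

# A (hypothetical) slice of a live HLS-style manifest, with a marker
# identifying a replaceable ad break inserted by the packager.
manifest = """#EXTINF:6.0,
live_segment_101.ts
#EXT-X-DISCONTINUITY
#EXTINF:6.0,
ad_break_slot_1.ts
#EXT-X-DISCONTINUITY
#EXTINF:6.0,
live_segment_103.ts"""

def replace_ad(manifest, slot, replacement):
    """Point this viewer's manifest at a pre-encoded replacement
    chunk. The chunks themselves stay untouched on the CDN; only the
    playlist the client downloads is rewritten, per user, at the edge."""
    return manifest.replace(slot, replacement)

# A different ad for this particular user, chosen by the ad server:
print(replace_ad(manifest, "ad_break_slot_1.ts", "targeted_ad_42.ts"))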
Stream Conditioning and ESAM
Multi-screen encoding workflows must deal with encoding the video, as well as packaging it for delivery into the protocols required by devices. Stream conditioning for dynamic content replacement is about ensuring that the encoding workflow knows about the events at which replacement could occur, and ensuring that the video is processed accordingly. It is important to emphasize that the replacement does not happen at this point: It is done closer to the end user.
When the encoder is informed about a splice point, it starts a new group of pictures (GOP), and when this GOP is encountered downstream by the packager, a new video chunk is created, as shown in Figure 1. Broadcasters should be wary of how their encoder and packager handle edge cases, such as when a splice point comes just before or after where a natural GOP and video chunk boundary would have been, so that extremely small video chunks and GOPs are avoided.
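A simplified model of that boundary decision, with made-up numbers, might look like this in Python:

def chunk_boundaries(duration, nominal, splices, min_chunk=1.0):
    """Place video chunk boundaries every `nominal` seconds, forcing
    a boundary at each splice point and dropping any natural boundary
    that would create a chunk shorter than `min_chunk` seconds.
    A simplified model of what the encoder/packager must decide."""
    natural = [t * nominal for t in range(1, int(duration / nominal) + 1)]
    forced = sorted(set(splices))
    keep = [t for t in natural
            if all(abs(t - s) >= min_chunk for s in forced)]
    return sorted(keep + forced)

# A splice at 11.5s sits just before the natural 12s boundary; the
# 12s boundary is dropped so no 0.5-second chunk/GOP is created.
print(chunk_boundaries(24.0, 6.0, [11.5]))  # [6.0, 11.5, 18.0, 24.0]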
Splice points can be signaled to the encoding workflow in-band or out-of-band. More and more multi-screen encoders are capable of handling SCTE-35 messages within an input MPEG TS to determine splice points. Most multi-screen encoders that support SCTE-35 handling also have either a proprietary HTTP-based API or support SCTE-104 for out-of-band splice-point signaling.
There has been a clear need to standardize stream conditioning workflows to allow interoperability between systems deployed for that purpose.
The Event Signaling and Management (ESAM) API specification — a new specification that emerged from the CableLabs Alternate Content working group — describes the relationship and messages that can be passed between a Placement Opportunity Information Server (POIS) and an encoder or packager. The POIS is responsible for identifying and managing the splice points, using the ESAM API supported by the encoder and packager to direct their operations.
The specification defines how both encoder and packager should converse with the POIS, but not how the POIS operates or how it decides on when the splice points should appear. This is considered implementation-specific, but could, for example, use live data from the broadcast automation system to instruct the encoder and packager. ESAM also permits hybrid workflows where the splice points are signaled in-band with SCTE-35 and then decorated with additional properties (including their duration) by the POIS server from out-of-band data sources.
The ESAM specification is, relatively speaking, brand new, but it is gathering support from encoder/packager vendors. Broadcasters building multi-screen encoding workflows today, even if dynamic content replacement is not required initially, should ensure that an upgrade path to ESAM is available in their chosen vendor’s roadmap for future-proofing.
User-centric Ad Insertion at the Edge
On the output side of the encoding workflow, video chunks are placed onto the CDN, and a cloud-based service responsible for performing dynamic content replacement receives the manifest file. This means that the actual replacement of content — whether it is ad insertion or content occlusion for rights purposes — is performed, in network topology terms, close to the client device.
The mechanism, as described earlier, relies on the relatively lightweight manipulation of the part of the video delivery protocol that tells the client where to fetch the video chunks. This can be performed efficiently on such a scale as to permit decisions to be made for each individual user accessing the live stream.
By performing the content replacement in the network, the client simply follows the video segments laid out to it by the content replacement service, and the transitions between live and replacement content are completely seamless. That makes it a broadcast experience — a seamless succession of content, advertising, promos and so on, with no freezes, blacks or buffering — but with the potential for user-centric addressability.
Of course, the content replacement policy and user tracking lie with the broadcaster’s choice of ad servers and rights management servers integrated with the content replacement service. SCTE-130 defines a series of interfaces that include the necessary interface between a content replacement service and an Ad Decision Service (ADS). In the Web world, the Video Ad Serving Template (VAST) and Video Player Ad Interface Definition (VPAID) have emerged as generally analogous specifications.
The ability to tailor content down to the individual, by replacing material in stream, in simulcast, while retaining the broadcast-quality experience of seamless content, is a totally new concept. The commercial ecosystem that generates the need for focused ad targeting must now catch up with the technology that supports it.
By David Springall, Broadcast Engineering
Wednesday, April 24, 2013
Labels: Adaptive Streaming
Unfortunately, the term “media services” contains two overloaded words: “media” and “services.” In this case, when we talk about media services, we are talking about small (some would say atomic), network-based applications, or “services,” that perform simple, focused tasks.
These tasks are somehow related to either essence or metadata used by professional broadcasters, post-production facilities and the film industry.
An example of a “media service” might be a service sitting out on a network that is available to transcode content from one popular video format to another. One can imagine a host of services, including tape ingest, QC and file movement. Each of these services is available out on the network, and can be used to perform a discrete unit of work. Services perform higher functions by grouping a number of atomic services together in a logical way. But, at their core, media services are small, discrete pieces of software that can be combined in different ways to perform work.
This is a significant departure from traditional media infrastructures, where an ingest station consists of tape machines, routers, monitors and other hardware — all hard-wired together to perform a specific function. In fact, entire broadcast chains are built this way. They are highly optimized and efficient, but they can be very difficult to change. And, if one thing is certain these days, it is that change is a permanent part of our business.
A media services architecture allows discrete blocks of functionality to be combined to build complex workflows. As workflows change, blocks can be recombined into modified workflows. If new functionality is added, new services can be deployed. Additionally, discrete services may be used in multiple workflows. So, a transcoder may be deployed in a post-production scenario for one job and then redeployed in a conversion for web applications next.
When using media services, it is not enough that the services are available out on the network; something must consume those services in order to perform valuable work for the organization. There are several approaches to using services, but, for this article, we are going to focus on two of them — orchestration and event-driven architecture.
Orchestration systems sit on top of media services and use media services to move work through a defined pipeline from start to finish. For example, an orchestration system might have a workflow that ingests a tape, transcodes the content and then saves the file on a large central server.
The orchestration system tracks the progress of the workflow, calling on various services to work on the job as it moves through the pipeline. The orchestration system is responsible for not only dealing with normal flows, but it is also responsible for dealing with error conditions such as a failed transcode. Orchestration can start out simple, but it can become complicated as engineers consider all of the various states and error conditions possible in the workflow.
Event-driven architecture is another way to use services to perform work. At a high level, in this architecture, something causes something else to happen (an event) that is of significance to the business. Processing engines can be set up to listen for that event, and when the event happens, they can perform actions based on the event. Other processing engines can be listening downstream, and when one event engine finishes, others can be triggered. In an event-driven architecture, there is no central system guiding the flow of work through a pipeline. The movement of work through the facility is caused by a sequence of events and processes.
An operator finishing an ingest activity might create an “Ingest Complete” event. Event processing engines subscribe to event channels, such as the “Ingest Complete” event channel. This particular event-processing engine might have two actions: the first is to notify the QC operator that the file is available for quality control checking; and the second is to publish an “Ingest Complete” notification for other systems that might be interested in the event, such as traffic and automation systems. Both of these systems might update the status of the media based on the “Ingest Complete” event.
Note that, in this example, it would be extremely easy to add another event process engine to the event channel. This engine might be responsible for creating a number of different formats from the original ingested file format. Adding this process does not require modifying a workflow in a central system. All one has to do is to subscribe the transcoding engine to that particular event channel.
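The pattern is easy to sketch; the event bus below is a toy, but it shows why adding the transcoding engine is just one more subscription:

from collections import defaultdict

# A toy event bus: processing engines subscribe to named event
# channels; no central orchestrator routes work through a pipeline.
channels = defaultdict(list)

def subscribe(channel, handler):
    channels[channel].append(handler)

def publish(channel, payload):
    for handler in channels[channel]:
        handler(payload)

subscribe("IngestComplete", lambda m: print("notify QC operator:", m))
subscribe("IngestComplete", lambda m: print("update traffic/automation:", m))
# Adding a transcode step later is just one more subscription;
# no central workflow definition has to change:
subscribe("IngestComplete", lambda m: print("create proxy formats:", m))

publish("IngestComplete", {"clip": "news_pkg_0425.mxf"})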
SOA and EDA
It is important to realize that orchestration and event-driven architecture are complementary, and they are frequently deployed together. For example, in the ingest example above, the tape ingest function might be driven by an orchestration system that precisely controls workflow and error-handling conditions, while event-driven architecture would be used to notify other SOA processes once the ingest is complete.
Common service interface definitions are critical. One can imagine a whole universe of services: a content repository service, a media identification service, a publish content to ISP service, and so on. And, one can imagine that several different vendors would make such services available. If each vendor defined the interface to their service independently, the amount of software integration required to build these systems would be huge. On the other hand, if the industry would agree on the service interface definition for an ingest service, for example, then it would be possible to integrate various ingest services into a workflow with minimal additional development.
Common service interface definitions are critical, but it is also critical that we have a common overall framework within which services can be deployed. How do services communicate with orchestration systems and with each other in event-driven architectures? How do newly commissioned services make their presence known on a network? Again, having a harmonized approach to an overall media service architecture will lower costs and shorten implementation time.
One last critical element in the discussion of media services is governance. Governance brings logic and structure to media services. Areas typically covered by governance include: service life cycle (how services are developed, deployed, deprecated and eventually decommissioned); prioritizing the deployment of new services; and ensuring the quality of deployed services.
There is a task force in the industry called the Framework for Interoperable Media Services (FIMS). FIMS is a collaboration between the Advanced Media Workflow Association and the European Broadcasting Union. FIMS is the first industry effort focused on developing services for the professional media industry. The FIMS group consists of a Business Board that develops business priorities for service development, and a Technical Board that oversees the development and deployment of FIMS services.
You can learn more about FIMS at http://www.fims.tv. For technical information, you can visit the FIMS wiki at http://wiki.amwa.tv/ebu. This activity has already yielded an overall framework for media services and several specific media service definitions. Work is ongoing, all the work is public, and anyone can participate.
By Brad Gilmer, Broadcast Engineering
DashCast is an open source application which allows users to:
- Transcode a live or non-live stream into multiple qualities (e.g., bitrates and resolutions).
- Segment a continuous stream into small chunks and packetize them for delivery via the Dynamic Adaptive Streaming over HTTP (DASH) standard.
Tuesday, April 23, 2013
Labels: MPEG DASH
According to multiple studies, HEVC should deliver up to 50% better compression than H.264 in video on demand (VOD) applications, which means similar quality at half the bitrate. In live encoding, which obviously has to be performed in real time, the initial expectations are a 35% bandwidth reduction at similar quality levels. Alternatively, HEVC can also enable larger resolution movies, whether 2K or 4K.
Essentially, these are the two benefits of HEVC in the streaming space. The first relates to encoding existing SD and HD content with HEVC rather than H.264, enabling cost savings and/or the ability to stream higher quality video to lower bitrate connections. The second relates to opening up new markets for Ultra-High-Definition (UHD) videos.
On the playback side, there are multiple data points, but no real clear picture. Several companies have announced software decoders, but it’s unclear how much horsepower is necessary to drive them. The original targets for HEVC were 10x encoding complexity and 2x–3x decoding complexity as compared to H.264, and most sources have confirmed the 10x figure for encoding. Comments from various sources put decoding complexity anywhere from “same as H.264” to the 2x–3x figure.
Remember, however, that H.264 playback is accelerated in hardware on most playback platforms, including GPU-accelerated playback on computers. According to my source at graphics vendor NVIDIA, “There currently isn’t any dedicated [hardware] support for HEVC in our current GPUs. I’m pretty limited to what I can say about future products. But I’ll just say that our goal is for our GPUs to support all current video standards. Now, that said, it is quite possible for third parties to write HEVC encoders and decoders using CUDA to use the processing capability of current GPUs.”
Though GPU acceleration may be coming, it isn’t here yet, so H.264 and HEVC aren’t on a level playing field when it comes to accessible playback hardware. Still, according to a source at Qualcomm, Inc., “We are able to get 1080p, 30fps HEVC Main profile video with just a little bit over 50% CPU utilization on a quad-core architecture.”
On the mobile side, my source reported, “At CES we showcased Ittiam’s ARM based decoder and played back 1080p HEVC and 1080p H.264 videos side by side on recently announced Snapdragon 800 platform. Ittiam’s decoder is still in development and will get further enhancement.”
According to a report titled “HEVC Decoding in Consumer Devices,” senior analyst Michelle Abraham from Multimedia Research Group, Inc. estimated that the number of consumer devices that shipped in 2011 and 2012 that would be capable of HEVC playback with a software upgrade totaled around 1.4 billion, with more than a billion more expected to be sold in 2013. According to Abraham, in compiling these statistics she assumed that all PCs shipped in each year would be HEVC-capable.
I spoke with several encoding companies; many were bullish on HEVC and have either made HEVC-related product announcements (Elemental Technologies, Inc.) or will at NAB. Another made the very cogent comment that the encoding side was always ahead of the game and that the path to actual producer adoption is widespread playback availability.
Speaking of playback, none of the major players -- Adobe, Apple, Google, or Microsoft -- have announced HEVC playback support in their respective players, browsers, or mobile or desktop operating systems. One reason why -- and a potential monkey wrench in at least the short-term HEVC adoption cycle -- is that no one knows what it will cost to use HEVC.
Royalty Issues with HEVC
What’s clear at this point is that multiple companies have patents relating to HEVC technology, and they plan to ask for royalties from those who use their technology. This was the case with H.264 as well, and though many in the streaming industry grumbled about the royalties, this disgruntlement certainly didn’t limit H.264’s success.
Two things are different with HEVC. First, where H.264 involved a single group of patent holders administered by MPEG LA, it appears that some HEVC patent holders want to pursue royalties outside of a patent group, which will make it more challenging for HEVC users to license the technologies. According to “Patent Snafus Could Delay New Video Codec,” Mediatek and Qualcomm do not want to join the HEVC group formulated by MPEG LA, and Samsung hasn’t decided either way.
One chipmaker executive, speaking anonymously for the EE Times article, commented, “HEVC has so many patent holders and some of them say they will not be part of the pool but want to collect royalties themselves. If say 20 people all want to collect royalties it will kill the standard -- we need a fixed cost, it cannot be variable,” he added.
Beyond this uncertainty, HEVC is coming to the streaming media market much faster than H.264, where royalty policies were in place well before any significant market adoption. To recount, the H.264 spec was approved in March 2003, and MPEG LA announced licensing terms in November 2003. Obviously, when Apple announced support for H.264 in QuickTime 7 in April 2005, royalty policies were firmly in place. Ditto for when Adobe announced that it would include H.264 in Flash in March 2008, and when Microsoft added H.264 to Silverlight in July 2009.
Our contact at MPEG LA reported that while the HEVC group had met three times as of February 2013, there was still no guarantee that a group would be formed or that all patent holders would join the group. So it appears that HEVC early adopters will have to decide to implement the technology without knowing the cost.
For large companies such as Adobe, Apple, Google, and Microsoft, that might be tenable; the H.264 license was capped, and it’s reasonable to assume that the HEVC license will also be capped. All four companies can amortize that cost over millions of product units shipped, and I think it’s highly likely that one or more of these companies will announce HEVC integration by NAB.
Even the encoding companies that I spoke with commented that they might incorporate HEVC technologies into their encoding tools without knowing the cost, because, as one exec said, “Supporting new formats is the race that we run.” The exec also noted, however, that this was the first time that they were ever forced to consider embracing a codec without having an idea about the licensing structure.
However, let’s get back to the two potential benefits that actual publishers seek from HEVC: cost savings and opening up new products and services. In both cases, it seems unlikely that any producer would use HEVC-encoded video without a known cost structure. Sure, H.264 usage for free internet video is free, but that decision was made under a completely different set of circumstances, and it’s doubtful that HEVC usage will be similarly unencumbered.
How H.264 Became Free
A short history lesson will explain why H.264 became free. When the terms of the initial H.264 license were announced, there was a royalty on H.264-encoded video deployed in a pay-per-view or subscription operation. The royalty was not on free internet video, at least through the initial term of the license, which ended Dec. 31, 2010.
The licensing terms attributed this waiver to the fact that the internet streaming market was “still developing,” though this is likely disingenuous. The fact of the matter was that the H.264 implementations of the time offered only a slight quality improvement over VP6, the predominant Flash codec, and required more CPU horsepower for playback. Nor was there yet a mobile platform such as iOS or Android that lacked VP6 support and could push producers toward H.264. So 99% of producers were satisfied with VP6 and wouldn’t have experimented with H.264 if a royalty had been involved.
In February 2010, MPEG LA extended the royalty moratorium for free internet video through December 2016. In an interview with Streaming Media, MPEG LA president and CEO Larry Horn attributed this decision to the fact that “though some companies are doing well with advertising supported video, overall the models are still in flux, and the patent group didn’t want to plug a royalty into a business model that’s still unsettled.”
In May 2010, Google announced WebM, an open source alternative to H.264 that offered very similar quality and playback performance. In August 2010, MPEG LA announced that there would be no royalties on free internet usage of H.264 in perpetuity. Though MPEG LA never publicly admitted that the availability of a free, open source solution contributed to this decision, the timing would suggest it did.
Why HEVC Probably Won’t Be Free
Fast-forward to 2013. HEVC is ready, and at this point, it has no real competition. Sure, you can point to VP9, but considering how poorly Google executed taking VP8/WebM to market, it’s unlikely that any producers -- or patent groups -- will take it seriously. In 2012, online video advertising jumped to $2.9 billion, so it’s tough to say that this market is still in flux.
For these reasons, if HEVC enables large publishers to cut their bandwidth costs by 50%, HEVC patent holders will almost certainly want their share. It seems equally unlikely that publishers seeking to reduce bandwidth costs via HEVC would start using the technology before the cost structure was known. So any HEVC implementation seeking to harvest this benefit is likely on hold until royalty terms are announced.
HEVC and UHD Video
Again, I’m focusing my analysis on the streaming and OTT markets, since those are the ones I know best. Even without considering the royalty cost uncertainties, it seems unlikely that HEVC will spawn many new UHD-related products and services over the next 2 or 3 years for three reasons: bandwidth, the lack of 4K displays, and the lack of content.
Let’s start with bandwidth. In his blog post “H.265/HEVC Ratification and 4K Video Streaming,” compressionist Alex Zambelli, previously with Microsoft and now with iStreamPlanet, estimated that if HEVC produced a 40% bandwidth saving over H.264, a 4K movie would require bitrates of between 12Mbps and 15Mbps. Assuming that HEVC actually produced the 50% target, these numbers would drop to 10Mbps and 12Mbps.
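The arithmetic is easy to sanity-check. This little Python sketch back-calculates the implied H.264 baseline from Zambelli’s figures (the 20Mbps-25Mbps baseline is my inference, not a number from his post) and applies both savings scenarios:

for h264_mbps in (20.0, 25.0):    # implied 4K H.264 baseline: 12/0.6 and 15/0.6
    for saving in (0.40, 0.50):   # the two HEVC savings scenarios above
        hevc_mbps = h264_mbps * (1 - saving)
        print("H.264 %.0fMbps at %.0f%% saving -> HEVC %.1fMbps"
              % (h264_mbps, saving * 100, hevc_mbps))

The output reproduces the 12Mbps-15Mbps and 10Mbps-12.5Mbps ranges; the article evidently rounds the last figure down to 12Mbps.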
According to the latest Akamai Technologies “State of the Internet” report, for 3Q 2012, the average connection speed in the U.S. was about 7.2Mbps, up from about 6Mbps the previous year. While users connecting on some premium services could handle HEVC’s 10Mbps–12Mbps in the short term, it’s unclear when a significant portion of the U.S. population will be able to support 4K HEVC movies.
The U.S. is far from the performance leader, but even globally, only South Korea, with an average connection speed of 14.7Mbps, could sustain a 4K movie today, with Japan next at 10.5Mbps. Again, while dedicated satellite or cable networks could certainly carry this load, it’s unlikely that the type of shared internet connection used for OTT would be able to in the short term.
‘Why 4K TVs Are Stupid’
The second issue relates to the installed base of 4K-capable sets, which obviously will be necessary to view 4K movies. According to analyst DisplaySearch, global shipments of 4K sets will be well under a million in 2013, and just over 2 million in 2014.
Obviously, these are not inspiring numbers, and as you might have gleaned from the section title, there are those who feel they may be overly optimistic. In a CNET article titled “Why 4K TVs Are Stupid,” Geoffrey Morrison ran the math about how much detail your eyes can resolve, and the minimum size of pixels on a screen, and concluded that at a viewing distance of 10', the difference between 1080p and 4K wouldn’t be noticeable on TVs less than 77" in diagonal. It’s a pretty familiar argument, very similar to the 720p versus 1080p debate, where the rule of thumb was that you couldn’t tell the difference on TVs 50" or smaller when watching from farther than 8' away.
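For the curious, here’s a minimal sketch of that calculation in Python. It assumes the standard 20/20-vision figure of one arcminute of resolvable detail; that threshold is my assumption, not a number quoted from the CNET piece.

import math

def diagonal_where_4k_helps(distance_in, h_pixels=1920):
    # Smallest 16:9 diagonal (inches) at which a viewer at distance_in inches
    # can just resolve individual 1080p pixels; on anything smaller, 4K's
    # extra pixels are invisible at that distance.
    arcmin = math.radians(1.0 / 60.0)            # acuity limit, ~0.00029 rad
    pitch = distance_in * math.tan(arcmin)       # smallest resolvable pixel pitch
    width = pitch * h_pixels                     # 1080p screen width at that pitch
    return width * math.sqrt(16**2 + 9**2) / 16  # convert 16:9 width to diagonal

print("%.0f inches" % diagonal_where_4k_helps(10 * 12))  # ~77 at a 10' distance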
In other words, while 4K TVs might make sense for a home entertainment center or man cave, they probably don’t for the typical living room, where mass markets are made. At some point, economies of scale will take over, and 4K sets will be cheaper than 1080p. Until that happens, however, it’s hard to predict that 4K sets will fly off the shelves, particularly because of the third item that will hinder 4K sets, and services targeting them -- a paucity of content.
No Content for 4K
By the time 1080p TV sets became affordable for the masses, there was plenty of HD content to watch. While ultimately we may say the same thing about 4K TVs, that time is clearly not now, both regarding affordability and content.
A good example of both is the 84" Sony XBR-84X900, which retails for $24,999.99 and includes 10 4K movies from Sony’s movie library and a variety of short-form 4K content. To supplement these movies, Sony plans on launching a 4K movie download service in the summer of 2013, potentially delivered on a new Blu-ray Disc standard for 4K movies.
Many movies today are being shot in 4K, and many film-based movies can be rescanned for 4K delivery. Beyond this, however, there’s very little other 4K content available, and minimal 4K production in sports and general-purpose television. So even those viewers who purchase a sufficiently large 4K monitor so that they can see the advantage over 1080p will have little 4K content to watch.
Of course, many 4K TVs can upsample SD and HD content to 4K, with mixed results. One U.K. reviewer found the results “astonishingly good,” while another, more expressive reviewer from Gizmodo commented, “As a result, anyone who spends £20,000 on a 4K TV at the moment will be doomed to watch upscaled HD content. That sucks. You don’t spend quite-nice-car cash on a TV, just so you can watch upscaled content.”
Note that some of the less expensive 4K TVs, such as Westinghouse Electric Corp.’s $2,499 50" set, don’t come with onboard Smart TV functionality or 4K upscaling technology. In these cases, you’d be dependent upon the upscaling provided by the set-top box, optical disc player, or game console.
What does all of this mean for the UHD OTT market? Overall, even though it seems likely that inexpensive HEVC-capable set-top boxes will be available by the end of 2013, there will be minimal 4K content to watch, few 4K TVs to watch it on, and insufficient bandwidth to deliver it.
Download and Watch
One market that seems potentially ripe for larger-than-1080p viewing is traditional download-to-view, particularly given the multiple viewing options a single title can offer. For example, TimeScapes is a film by Tom Lowe featuring slow-motion and time-lapse cinematography of the landscape, people, and wildlife of the American Southwest. Shot in 4K, the movie is available in 10 versions, ranging from SD DVD and 1080p Blu-ray to 4K, including custom versions for 30" 2560x1440 displays and the MacBook Retina display (2880x1620). It’s an innovative strategy that could point the way toward the optimal model for UHD movie distribution.
I asked Lowe about how his sales were distributed among the available offerings. He responded, “Actually the ‘30 inch’ 2560x1440 version is selling like hotcakes. Sales have exceeded my expectations, in terms of percentage sold vs Blu-ray, 1080p download, etc. I would say for every one in ten 1080p HD downloads we sell, we sell about four 2560 copies. So many people have the Dell and Apple 2560 displays, but have never, ever had any video to play at that resolution.”
Interestingly, watching video on tablets and computers seems like the optimal use of 4K screens, as viewers are actually close enough to the device to see the difference. Though it will likely never see the light of day, Panasonic showed an 18.7" x 13.1" tablet computer at CES with a 4K screen -- could true 4K tablets be far behind? And once they’re available, wouldn’t 4K movie downloads from iTunes seem like the natural next step? Once HEVC playback becomes available, it would cut download times and storage space by 50%.
How likely is true 4K viewing on computers? I asked Lowe for his thoughts about computers and tablets as a potential viewing platform. He replied, “I have always believed that 4K monitors are where 4K will really catch on. On a projector, or large TV, you don’t notice it as much. On a monitor only a couple feet away, the difference between 1080p and 4K is very striking. I think 4K monitors will catch on fast once they come out, among gamers, Photoshop enthusiasts, and people watching or making high-res video.”
What’s this all add up to? For producers seeking to distribute SD and HD content encoded with HEVC, the lack of a known royalty structure is a major buzzkill. At this point, no one knows whether a single royalty structure will be in place or whether multiple IP holders will pursue royalties independently, or the timing of either of these efforts.
HEVC encoding should be generally available by the end of 2013, if not sooner, and the player-related picture should also be clearer. Though it’s impossible to predict what Apple will do, I would be surprised if there wasn’t an HEVC decoder for the iPad 3 and MacBook Retina line announced before the end of 2013. Ditto for Adobe announcing HEVC playback in Flash.
Those attempting to leverage HEVC to create new opportunities for the distribution of UHD video have the royalty mountain to climb, as well as a paucity of content and viewing platforms and the lack of bandwidth to deliver the streams. Given the data rates involved, download-to-view feels like the better short-term model, offering files at custom resolutions matched to specific displays.
The increased efficiency that HEVC provides will ultimately make streaming video more affordable and deliverable. However, until the royalty picture clears, it’s hard to get excited about any projected cost savings from HEVC. On the UHD side, HEVC is only one piece of the puzzle that must come together to make 4K viewing a reality, and not a very important piece of the puzzle at that.
By Jan Ozer, StreamingMedia
When Samsung unveiled its next-generation smartphone, the Galaxy S4, in March this year, most of the Korean giant’s fans focused their attention on the device’s big 5-inch, 1920 x 1080 screen, its quad-core processor and its 13Mp camera. All impressive of course, but incremental steps in the ongoing evolution of the smartphone. More cutting edge is the S4’s promised support for a technology called HEVC.
HEVC is short for High Efficiency Video Coding. It’s the successor to the technology used to encode video stored on Blu-ray Discs and streamed in high-definition digital TV transmissions the world over. The current standard is called H.264 - aka MPEG-4 Part 10, aka Advanced Video Coding (AVC) - so it’s no surprise that HEVC will become H.265 when the Is are dotted and the Ts crossed on the final, ratified version of the standard later this year.
This final standardisation is just a formality. The International Telecommunication Union (ITU-T), the body which oversees the "H" series of standards, and its partner in video matters, the ISO/IEC Moving Picture Experts Group (MPEG), have both given HEVC sufficient approval. This means device manufacturers such as Samsung, chipmakers such as Broadcom, content providers such as Orange France and mobile phone network operators such as NTT DoCoMo can begin announcing HEVC-related products safe in the knowledge that the standard will be completed with few, if any further changes.
It has taken H.265 three years to reach this stage, though exploratory work on post-H.264 standards goes back to 2004. The drive to develop the standard - a process overseen by a committee called the Joint Collaborative Team on Video Coding (JCT-VC) and comprising members of both MPEG and ITU-T - was outlined in January 2010 in a call for specification proposals from technology firms and other stakeholders.
Their brief is easy to summarise: H.265 has to deliver a picture of the same perceived visual quality as H.264 but using only half the transmitted volume of data and therefore half the bandwidth. H.264 can happily churn out 1920 x 1080 imagery at 30 frames per second in progressive - ie, frame after frame - mode, but it’s expected to start running out of puff when it comes to the 3840 x 2160 - aka 4K x 2K - let alone the 7680 x 4320 (8K x 4K) resolutions defined as Ultra HD pictures. H.265, then, was conceived as the technology that will make these resolutions achievable with mainstream consumer electronics kit like phones and televisions.
High Resolution, Low Bandwidth
Of course, 4K x 2K and up are, for now, thought of as big-screen TV resolutions. But it wasn’t so very long ago that 1920 x 1080 was considered an only-for-tellies resolution too. Now, though, plenty of phones, of which the Galaxy S4 is merely the latest, have screens with those pixel dimensions. Some tablets have higher resolutions still.
And while today’s mobile graphics cores have no trouble wrangling 2,073,600 pixels 30 times a second, that’s still a heck of a lot of data for mobile networks to carry to them, even over the fastest 4G LTE links. And so, in addition to supporting Ultra HD resolutions on large TVs, H.265 was conceived as a way to deliver larger moving pictures to phones while consuming less bandwidth than H.264 requires. Or to deliver higher, smoother frame rates over the same width of pipe.
This explains NTT DoCoMo’s interest in the new video technology. Its 2010 proposal to the JCT-VC was one of the five shortlisted from the original 27 suggestions in April 2010. All five could deliver a picture to match a good H.264 stream, but only four, including NTT DoCoMo’s, could do so at as little as a third of the bitrate H.264 needs. The JCT-VC’s formal target was 50 per cent more efficient compression for the same image size and picture quality.
The remaining proposals were combined and enshrined in the JCT-VC’s first working draft, which was published the following October. The committee and its partners have been refining and testing that specification ever since. Since June 2012, MPEG LA, the company that licenses MPEG video technologies, has been bringing together patent holders with intellectual property that touches on the H.265 spec, ahead of licensing that IP to anyone making H.265 encoders and decoders, whether in hardware or software.
So how does the new video standard work its magic?
Like H.264 before it - and all video standards from H.261 on, for that matter - HEVC is based on the same notion of spotting motion-induced differences between frames, and finding near-identical areas within a single frame. These similarities are subtracted from subsequent frames and whatever is left in each partial frame is mathematically transformed to reduce the amount of data needed to store each frame.
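To make that concrete, here’s a toy sketch of the process in Python with NumPy: an exhaustive search for an 8 x 8 block of the previous frame within a small window of the current frame, returning the motion vector and the residual. Real encoders are vastly more sophisticated, so treat this as an illustration of the principle, not of HEVC itself.

import numpy as np

def match_block(prev_frame, cur_frame, y, x, size=8, search=4):
    # Find where the block at (y, x) in the previous frame reappears in the
    # current frame, scanning a +/- search window and scoring candidates by
    # sum of absolute differences (SAD).
    block = prev_frame[y:y+size, x:x+size].astype(int)
    best_mv, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + size > cur_frame.shape[0] \
                    or xx + size > cur_frame.shape[1]:
                continue
            sad = np.abs(cur_frame[yy:yy+size, xx:xx+size].astype(int) - block).sum()
            if best_sad is None or sad < best_sad:
                best_mv, best_sad = (dy, dx), sad
    dy, dx = best_mv
    residual = cur_frame[y+dy:y+dy+size, x+dx:x+dx+size].astype(int) - block
    return best_mv, residual  # a motion vector plus a (hopefully sparse) residual

prev = np.random.randint(0, 256, (32, 32))
cur = np.roll(prev, (1, 2), axis=(0, 1))    # same content, shifted by (1, 2)
print(match_block(prev, cur, 8, 8)[0])      # recovers the (1, 2) motion vector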
Singing, Ringing Tree
When an H.264 frame is encoded, it’s divided into a grid of squares, known as "macroblocks" in the jargon. H.264 macroblocks were no bigger than 16 x 16 pixels, but that’s arguably too small a size for HD imagery and certainly for Ultra HD pictures, so H.265 allows block sizes to be set at up to 64 x 64 pixels, the better to detect finer differences between two given blocks.
In fact, the process is a little more complex than that suggests. While H.264 works at the fixed macroblock level - a 16 x 16 block of brightness ("luma" in the jargon) samples accompanied by two 8 x 8 blocks of colour ("chroma") samples - H.265 uses a structure called a "Coding Tree". Each encoder-selected 16 x 16, 32 x 32 or 64 x 64 luma block, together with its corresponding chroma blocks, forms the root of a tree that can be recursively split into smaller blocks wherever the picture demands it. More light-level data is encoded than colour data because the human eye is better able to detect differences in the brightness of adjacent pixels than it is differences in their colour.
HEVC samples pixels as "YCrCb" data: a brightness value (Y) followed by two numbers that show how far the colour of the pixel deviates from grey toward, respectively, red and blue. It uses 4:2:0 sampling - each colour plane has a quarter as many samples as the brightness plane, specifically half the number in each axis. Samples are 8-bit or 10-bit values, depending on the HEVC Profile the encoder is following. You can think of a 1080p picture as containing 1920 x 1080 pixels’ worth of brightness information but only 960 x 540 pixels’ worth of colour information. Don’t worry about the "loss" of colour resolution - you literally can’t see it.
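Counting the samples makes the trade-off plain. A minimal Python sketch for one 8-bit 1080p frame:

w, h = 1920, 1080
luma = w * h                        # 2,073,600 brightness samples
chroma = 2 * (w // 2) * (h // 2)    # Cr and Cb, each at half resolution per axis
print(luma, chroma)                 # 2,073,600 vs 1,036,800
print((luma + chroma) / 1e6, "MB raw per 8-bit frame")  # ~3.1MB, vs ~6.2MB for full 4:4:4 colour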
A given "Coding Tree Unit" is the root of a tree structure that can comprise a number of smaller "Coding Blocks", which can, in turn, be divided into smaller still "Transform Blocks", as the encoder sees fit.
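Here’s a toy Python sketch of that quadtree idea. A real encoder splits blocks based on rate-distortion cost; this stand-in splits wherever pixel variance is high, down to a hypothetical 8 x 8 minimum.

import numpy as np

def split_ctu(block, y=0, x=0, threshold=200.0, min_size=8):
    # Recursively quarter a square block wherever its content is "busy",
    # returning (y, x, size) leaves -- the blocks that would be coded.
    size = block.shape[0]
    if size <= min_size or np.var(block) < threshold:
        return [(y, x, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += split_ctu(block[dy:dy+half, dx:dx+half],
                                y + dy, x + dx, threshold, min_size)
    return leaves

ctu = np.random.randint(0, 256, (64, 64))   # one stand-in 64 x 64 Coding Tree Unit
print(split_ctu(ctu))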
The block structure may differ between H.264 and H.265, but the motion detection principle is broadly the same. The first frame’s blocks are analysed to find those that are very similar to each other - areas of equally blue sky, say. Subsequent frames in a sequence are not only checked for these intrapicture similarities but also analysed to determine how blocks move between frames. So a block of pixels that contains the same colour and brightness values through a sequence of frames, but changes location from frame to frame, only needs to be stored once. An accompanying motion vector tells the decoder where to place those identical pixels in each successive recovered frame.
Made for Parallel Processing
H.264 encoders check intrapicture similarities in eight possible directions from the source block. H.265 extends this to 33 possible directions - the better to model more subtle structures within the frame. H.265 also gains other tricks for predicting more accurately where to duplicate a given block when generating the final frame.
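A flavour of what directional prediction means in practice, sketched for just three directions (vertical, horizontal and one 45-degree diagonal); HEVC’s real handling of its 33 fractional angles is considerably more involved, so this is an illustration only.

import numpy as np

def intra_predict(top, left, size, mode):
    # Fill a size x size block by extrapolating previously decoded
    # neighbours: the row above (top, 2*size samples) and the column
    # to the left (left, size samples).
    pred = np.empty((size, size), dtype=int)
    for r in range(size):
        for c in range(size):
            if mode == "vertical":        # copy the pixel directly above
                pred[r, c] = top[c]
            elif mode == "horizontal":    # copy the pixel directly to the left
                pred[r, c] = left[r]
            else:                         # one 45-degree diagonal
                pred[r, c] = top[min(r + c + 1, len(top) - 1)]
    return pred

top = np.arange(16)      # toy reference samples for an 8 x 8 block
left = np.arange(8)
print(intra_predict(top, left, 8, "vertical"))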
As HEVC boffins Gary Sullivan, Jens-Rainer Ohm, Woo-Jin Han and Thomas Wiegand put it in a paper published in the IEEE journal Transactions on Circuits and Systems for Video Technology in December 2012: “The residual signal of the intra- or interpicture prediction, which is the difference between the original block and its prediction, is transformed by a linear spatial transform. The transform coefficients are then scaled, quantised, entropy coded and transmitted together with the prediction information.”
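In code form, that sentence looks roughly like this. A textbook DCT-II stands in for HEVC’s actual integer transforms, and qstep is a made-up quantiser, so treat it as a sketch of the idea only.

import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix; row i is frequency, column k is sample.
    k, i = np.meshgrid(np.arange(n), np.arange(n))
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k + 1) * i / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def transform_and_quantise(residual, qstep=10.0):
    # The "linear spatial transform" then scaling/quantisation of the quote:
    # concentrate the residual's energy into few coefficients, then round.
    d = dct_matrix(residual.shape[0])
    coeffs = d @ residual @ d.T          # separable 2D transform
    return np.round(coeffs / qstep)      # quantisation -- where the loss happens

residual = np.random.randint(-16, 16, (8, 8))   # a toy prediction residual
print(transform_and_quantise(residual))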
When the frame is decoded, extra filters are applied to smooth out artefacts generated by the blocking and quantisation processes. Incidentally, we can talk about frames here, rather than interlaced fields, because H.265 doesn’t support interlacing. So there will be no 1080i vs 1080p debate with HEVC. “No explicit coding features are present in the HEVC design to support the use of interlaced scanning,” say the minds behind the standard. The reason? “Interlaced scanning is no longer used for displays and is becoming substantially less common for distribution.” At last we’ll be able to leave our CRT heritage behind.
In addition to all this video coding jiggery pokery, H.265 includes the kind of high-level information that H.264 data can contain to help the decoder cope with the different routes a stream of video data can take from source file to screen - along a cable or across a wireless network, say - each with its own degree of data packet loss. But HEVC gains extra methods for segmenting an image, or the streamed data, to take better advantage of parallel processing architectures and to synchronise the output of however many image processors are present.
H.264 has the notion of "slices" - sections of the data that can be decoded independently of other sections, either whole frames or parts of frames. H.265 adds "tiles": rectangular regions, currently no smaller than 256 x 64 pixels, into which a picture can optionally be segmented. Each tile contains a whole number of HEVC’s core Coding Tree Units, and because each graphics core processes a given CTU in roughly the same amount of time, there’s no need to synchronise output between cores. Other tile sizes may be allowed in future versions of the standard.
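A minimal sketch of the tiling idea in Python (the 2 x 2 grid is illustrative only; as noted above, the real spec constrains tile dimensions):

def tile_grid(ctu_rows, ctu_cols, tiles_down=2, tiles_across=2):
    # Carve a picture's grid of CTUs into independent rectangular tiles,
    # each of which a separate core can decode without cross-talk.
    tiles = []
    for ty in range(tiles_down):
        for tx in range(tiles_across):
            r0, r1 = ty * ctu_rows // tiles_down, (ty + 1) * ctu_rows // tiles_down
            c0, c1 = tx * ctu_cols // tiles_across, (tx + 1) * ctu_cols // tiles_across
            tiles.append([(r, c) for r in range(r0, r1) for c in range(c0, c1)])
    return tiles

for i, tile in enumerate(tile_grid(4, 8)):
    print("tile", i, "->", len(tile), "CTUs")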
An alternative option available to the encoder is Wavefront Parallel Processing, which cuts each slice into rows of CTUs. Work can begin on processing any given row once just two CTUs have been decoded from the preceding row - once decoding clues have been recovered from them - the exception, of course, being the very first row in the sequence. This approach, say HEVC boffins, makes for fewer artefacts than tiling and may yield a higher compression ratio, but its processing power requirements are greater than tiling’s. For now, the choice is tiles, WPP or neither, though the use of both may be permitted in future HEVC revisions.
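The resulting stagger is easy to visualise. This sketch computes the earliest step at which each CTU in a row-per-core arrangement can start, under a simplified model of the dependency rule just described:

def wavefront_schedule(rows, cols):
    # Earliest time step at which each CTU can start: it must wait for its
    # left neighbour and for the CTU one column ahead in the row above --
    # the "two CTUs decoded" rule in simplified form.
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            deps = []
            if c > 0:
                deps.append(start[r][c - 1] + 1)
            if r > 0:
                deps.append(start[r - 1][min(c + 1, cols - 1)] + 1)
            start[r][c] = max(deps, default=0)
    return start

for row in wavefront_schedule(4, 8):
    print(row)   # each row lags the one above by two steps: a wavefront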
H.265 makes jumping around within the video a smoother process by allowing complete frames - pictures that can be decoded without information from any preceding frame - to be more clearly marked as such for the decoder. Don’t forget, the decoding order of information within a stream isn’t necessarily the same as the order in which the frames are displayed. Jump into a stream part-way, and some upcoming frames - ones needed to decode pre-entry-point frames but not those that will now be displayed - can, if correctly labelled, be safely ignored by the decoder.
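A toy model of that labelling in Python. Frames are (decode_order, display_order, is_entry_point) tuples; the numbers are invented purely for illustration.

stream = [(0, 0, True), (1, 2, False), (2, 1, False),
          (3, 4, True), (4, 3, False), (5, 5, False)]

def frames_after_seek(stream, entry_decode_order):
    # Jump in at a marked complete frame; any later-decoded frame whose
    # display order precedes the entry point is a "leading" picture that
    # exists only to serve pre-entry frames, and can safely be dropped.
    entry = next(f for f in stream if f[0] == entry_decode_order and f[2])
    return [f for f in stream if f[0] >= entry[0] and f[1] >= entry[1]]

print(frames_after_seek(stream, 3))   # (4, 3, False) is skipped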
Since a video stream of necessity must incorporate at least one complete frame, it’s no surprise that H.265 has a still-image profile, Main Still Picture. Like the basic Main profile, MSP is capped at eight bits per sample. If you want more, you’ll be needing the Main 10 profile, which, as its name suggests, uses 10-bit samples. So far 13 HEVC levels have been defined, essentially based on picture size and running from 176 x 144 to 7680 x 4320. The levels yield maximum bitrates of 128Kbps to 240Mbps in the mainstream applications tier, and 30-800Mbps in the high performance tier, which might be used for broadcast-quality pre-production work and storage, for instance.
This, then, is the version of HEVC enshrined in the first final draft of H.265. Work is already underway on extensions to the specification to equip the standard with the technology needed for 3D and multi-view coding, to support greater colour depths (12-bit initially) and better colour component sampling options such as 4:2:2 and 4:4:4, and to enable scalable coding, which allows a high quality stream to deliver a lower quality sub-stream by dropping packets.
The H.265 extensions are all some way off. First we all need H.265 to be supported by our preferred operating systems, applications and - in the case of low-power mobile devices - in the chips they use. With H.264 now the de facto standard for video downloads and streaming, there’s a motivation to move up to the next version, especially if it means buyers can download comparable quality videos in half the time. It seems certain to be a part of "Blu-ray 2.0" whenever that emerges, and probably future ATSC TV standards, but the ability to get high quality video down off the internet is surely a stronger unique selling point than delivering 4K or 8K video on optical disc to an ever-declining number of consumers.
Not Yet Ready for Primetime
Videophiles will demand 4K and perhaps 8K screens, but for most of us there’s little visible benefit in moving beyond 1080p - unless we also move our sofas very much closer to our TVs. Even if broadcasters migrate quickly to 4K, perhaps in time to show 2014’s World Cup tournament in the format, they can do so with H.264. But if you’re going to have to get a new, bigger screen, you may as well get one with a new, better codec on board. Still, H.265 in the living room looks set to be of limited interest for some years yet.
4K streaming may be equally far off, but more efficient 1080p streaming is something many folks would like to have now. Given the competition among IPTV services, it’s not hard to imagine many providers hopping on H.265 to improve their offerings, once the standard becomes supported by the web browsers and apps they use. Greater compression of SD and HD video content than H.264 can provide is what will drive adoption of H.265 in the near term.
And of course, it’s going to require greater processing power than previous codecs needed - 10 times as much for encoding, some specialists reckon, and two-to-three times as much for decoding. Producing H.265 video requires the encoder to evaluate many more encoding options at many more decision points than is the case with H.264, and that takes time. Likewise regenerating complete frames from the many more hints that the encoder provides. Encoding and decoding HEVC doesn’t necessarily favour chips with higher clock speeds, though. H.265’s emphasis on parallelisation favours CPUs and GPUs equipped with lots of cores that can crunch numbers simultaneously.
“HEVC/H.265 is an ideal fit for GPU encoding,” say staffers at Elemental Technologies, a maker of video encoding hardware. “HEVC/H.265 encoding presents computational complexities up to ten times that of H.264 encoding, for which the massively parallel GPU architecture is well-suited. With 500 different ways to encode each macroblock, the demanding processing requirements of H.265 will strain many existing hardware platforms.”
Easy to say, harder to deal with. We won’t know how well encoders and decoders work - be they implemented in software or hardware - until they actually arrive. Software support will come first. Market-watcher Multimedia Research Group reckons there are around 1.4 billion gadgets already on the market that, given a suitable software upgrade, will be able to play H.265 video. A billion more are coming out this year. But being able to decode H.265 is one thing - being able to do so smoothly and with an efficient use of energy is something else.
This kind of uncertainty will hold H.265 back in the short term, at least as far as its mainstream adoption goes. That’s no great surprise, perhaps. Even today, the majority of Britain’s broadcast digital TV channels use MPEG-2 - aka H.262 - with H.264 used only for HD content. H.265-specific chippery is expected this year, but not in shipping product until Q4 at the earliest - Q1 2014 is a more practical estimate.
By Tony Smith, The Register