Content Replacement: New Protocols Enable Flexibility in Simulcast Services

Until recently, simulcast streaming to connected devices was performed using protocols like RTMP, RTSP and MMS. In 2009, when the iPhone 3GS was launched, iOS 3.0 included a new streaming protocol called HTTP Live Streaming (HLS), part of a new class of video delivery protocols.

HLS differed from its predecessors by relying only upon HTTP to carry video and flow-control data to the device. This made the protocol far more firewall-friendly and easier to scale, as it required no specialist streaming-server technology distributed throughout the Internet to deliver streams to end users. The regular HTTP caching proxies that form the backbone of all Content Delivery Networks (CDNs) would suffice.

Apple was not alone in making this paradigm switch. Microsoft and Adobe also introduced their own protocols — Smooth Streaming and HTTP Dynamic Streaming (HDS), respectively. Today, work is ongoing to standardize these approaches into a single unified protocol, under a framework known as MPEG-DASH.

What is significant about all of these protocols is that they separate the control aspects of the protocol from the video data. They share the general concept that video data is encoded into chunks and placed onto an origin server or a CDN. To start a streaming session, a client device loads a manifest file from that server that tells it which chunks to load and in what order. The infrastructure that serves the manifest can be completely separate from the infrastructure that serves the chunks.

The separation of these concerns provides a basis for dynamic content replacement, as it is possible to dynamically manipulate the manifest file to point the client device at an alternative sequence of video chunks that have been pre-encoded and placed on the CDN. The ability to swap chunks out in this way relies on the encoding workflow generating video chunks whose boundaries match possible replacement events.
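The manifest manipulation described above can be sketched in a few lines. This is an illustrative toy, not a real HLS implementation: the splice-marker comments and segment names are invented for the example, and a production service would also respect segment durations and discontinuity tags.

```python
def replace_segments(manifest_lines, replacement_uris):
    """Swap the segment URIs between the splice markers for
    pre-encoded replacement chunks already sitting on the CDN."""
    out, inside, repl = [], False, iter(replacement_uris)
    for line in manifest_lines:
        if line.startswith("#EXT-X-SPLICE-IN"):    # hypothetical marker
            inside = False
        if inside and not line.startswith("#"):
            line = next(repl)                      # point at the ad chunk
        out.append(line)
        if line.startswith("#EXT-X-SPLICE-OUT"):   # hypothetical marker
            inside = True
    return out

live = ["#EXTINF:4.0,", "live_001.ts",
        "#EXT-X-SPLICE-OUT",
        "#EXTINF:4.0,", "live_002.ts",
        "#EXTINF:4.0,", "live_003.ts",
        "#EXT-X-SPLICE-IN",
        "#EXTINF:4.0,", "live_004.ts"]

out = replace_segments(live, ["ad_001.ts", "ad_002.ts"])
# live_002 and live_003 are swapped out; live_004 resumes the live stream
```

Because the rewrite touches only the small text manifest, not the video itself, it can be done per request, which is what makes the per-user decisions discussed later feasible.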

Stream Conditioning and ESAM
Multi-screen encoding workflows must deal with encoding the video, as well as packaging it for delivery into the protocols required by devices. Stream conditioning for dynamic content replacement is about ensuring that the encoding workflow knows about the events at which replacement could occur, and ensuring that the video is processed correctly around them. It is important to emphasize that the replacement does not happen at this point: It is done closer to the end user.

When the encoder is informed about a splice point, it starts a new group of pictures (GOP), and when this GOP is encountered downstream by the packager, a new video chunk is created, as shown in Figure 1. Broadcasters should be wary of how their encoder and packager handle edge cases, such as when a splice point comes just before or after where a natural GOP and video chunk boundary would have been, so that extremely small video chunks and GOPs are avoided.

Figure 1. When the encoder is informed about a splice point, it starts a new group of pictures (GOP), and when this GOP is encountered downstream by the packager, a new video chunk is created.
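The edge-case handling mentioned above can be illustrated with a small guard that snaps a splice point onto a nearby natural boundary rather than cutting a sliver of a segment. The four-second segment length and one-second threshold are assumptions for the sketch; real encoders expose their own configuration for this.

```python
def snap_splice(splice_t, segment_len=4.0, min_gap=1.0):
    """Snap a splice time (seconds) to the nearest natural segment
    boundary when cutting at splice_t would otherwise create a video
    chunk (and GOP) shorter than min_gap seconds."""
    boundary = round(splice_t / segment_len) * segment_len
    if abs(splice_t - boundary) < min_gap:
        return boundary        # too close: reuse the natural boundary
    return splice_t            # far enough away: cut a new GOP here

snap_splice(12.3)   # just past a 12.0s boundary -> snaps back to 12.0
snap_splice(14.0)   # mid-segment -> kept as-is, a new GOP starts at 14.0
```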

Splice points can be signaled to the encoding workflow in-band or out-of-band. A growing number of multi-screen encoders can handle SCTE-35 messages within an input MPEG transport stream to determine splice points. Most encoders that support SCTE-35 also offer either a proprietary HTTP-based API or SCTE-104 support for out-of-band splice-point signaling.

There has been a clear need to standardize stream conditioning workflows to allow interoperability between systems deployed for that purpose.

The Event Signaling and Management (ESAM) API specification — a new specification that emerged from the CableLabs Alternate Content working group — describes the relationship and messages that can be passed between a Placement Opportunity Information Server (POIS) and an encoder or packager. The POIS is responsible for identifying and managing the splice points, using the ESAM API supported by the encoder and packager to direct their operations.

The specification defines how both encoder and packager should converse with the POIS, but not how the POIS operates or how it decides on when the splice points should appear. This is considered implementation-specific, but could, for example, use live data from the broadcast automation system to instruct the encoder and packager. ESAM also permits hybrid workflows where the splice points are signaled in-band with SCTE-35 and then decorated with additional properties (including their duration) by the POIS server from out-of-band data sources.

The ESAM specification is still quite new, but it is gathering support from encoder/packager vendors. Broadcasters building multi-screen encoding workflows today, even if dynamic content replacement is not required initially, should ensure that an upgrade path to ESAM is on their chosen vendor’s roadmap to ensure future-proofing.

Today, video takes many paths to end-users’ devices. The overview above shows some of the technology that allows broadcasters to tailor their video content to their users. One method is content replacement.

User-centric Ad Insertion at the Edge
On the output side of the encoding workflow, video chunks are placed onto the CDN, and a cloud-based service responsible for performing dynamic content replacement receives the manifest file. This means that the actual replacement of content — whether it is ad insertion or content occlusion for rights purposes — is performed, in network topology terms, close to the client device.

The mechanism, as described earlier, relies on the relatively lightweight manipulation of the part of the video delivery protocol that tells the client where to fetch the video chunks. This can be performed efficiently on such a scale as to permit decisions to be made for each individual user accessing the live stream.
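As a sketch of such a per-user decision, the toy function below maps a session ID deterministically to one of several pre-encoded ad variants. The variant names are invented, and a real deployment would call out to an ad decision server at manifest-request time rather than use a hash.

```python
import hashlib

# Assumed inventory of pre-encoded replacement chunks on the CDN
AD_VARIANTS = ["sports_ad", "auto_ad", "travel_ad"]

def choose_variant(session_id: str) -> str:
    """Pick an ad variant for this session. Hashing the session ID
    stands in for an Ad Decision Service call; it guarantees the same
    user sees a consistent choice across manifest refreshes."""
    digest = hashlib.sha256(session_id.encode()).hexdigest()
    return AD_VARIANTS[int(digest, 16) % len(AD_VARIANTS)]

variant = choose_variant("user-session-0042")
```

Because the decision costs only a hash (or one ADS lookup) per manifest request, it scales to one decision per viewer of a live stream.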

By performing the content replacement in the network, the client simply follows the video segments laid out for it by the content replacement service, and the transitions between live and replacement content are completely seamless. That makes it a broadcast experience — a seamless succession of content, advertising, promos and so on, with no freezes, blacks or buffering — but with the potential for user-centric addressability.

Of course, content replacement policy and user tracking depend on integrating the content replacement service with the broadcaster’s choice of ad servers and rights management servers. SCTE-130 defines a series of interfaces, including the necessary interface between a content replacement service and an Ad Decision Service (ADS). In the Web world, the Video Ad Serving Template (VAST) and Video Player Ad Interface Definition (VPAID) have emerged as roughly analogous specifications.

The ability to tailor content down to the individual, by replacing material in stream, in simulcast, while retaining the broadcast-quality experience of seamless content, is a totally new concept. The commercial ecosystem that generates the need for focused ad targeting must now catch up with the technology that supports it.

David Springall, Broadcast Engineering

Media Services: Services Architecture Enables Functional Flexibility

Unfortunately, the term “media services” contains two overloaded words: “media” and “services.” In this case, when we talk about media services, we are talking about small (some would say atomic) network-based applications, or “services,” that perform simple, focused tasks.

These tasks are somehow related to either essence or metadata used by professional broadcasters, post-production facilities and the film industry.

An example of a “media service” might be a service sitting out on a network that is available to transcode content from one popular video format to another. One can imagine a host of services, including tape ingest, QC and file movement. Each of these services is available out on the network, and can be used to perform a discrete unit of work. Services perform higher functions by grouping a number of atomic services together in a logical way. But, at their core, media services are small, discrete pieces of software that can be combined in different ways to perform work.

This is a significant departure from traditional media infrastructures, where an ingest station consists of tape machines, routers, monitors and other hardware — all hard-wired together to perform a specific function. In fact, entire broadcast chains are built this way. They are highly optimized and efficient, but they can be very difficult to change. And, if one thing is certain these days, it is that change is a permanent part of our business.

A media services architecture allows discrete blocks of functionality to be combined to build complex workflows. As workflows change, blocks can be recombined into modified workflows. If new functionality is added, new services can be deployed. Additionally, discrete services may be used in multiple workflows. So, a transcoder may be deployed in a post-production scenario for one job and then redeployed in a conversion for web applications next.

Building Workflows
When using media services, it is not enough that the services are available out on the network; something must consume those services in order to perform valuable work for the organization. There are several approaches to using services, but, for this article, we are going to focus on two of them — orchestration and event-driven architecture.

Orchestration systems sit on top of media services and use media services to move work through a defined pipeline from start to finish. For example, an orchestration system might have a workflow that ingests a tape, transcodes the content and then saves the file on a large central server.

The orchestration system tracks the progress of the workflow, calling on various services to work on the job as it moves through the pipeline. The orchestration system is responsible for not only dealing with normal flows, but it is also responsible for dealing with error conditions such as a failed transcode. Orchestration can start out simple, but it can become complicated as engineers consider all of the various states and error conditions possible in the workflow.
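A toy version of such an orchestration pipeline, with invented stub functions standing in for network calls to real media services, might look like this:

```python
def ingest(job):
    job["essence"] = "raw.mxf"          # stub: tape ingest service
    return job

def transcode(job):
    if job.get("fail_transcode"):       # simulate a failed transcode
        raise RuntimeError("transcode failed")
    job["essence"] = "out.mp4"          # stub: transcode service
    return job

def store(job):
    job["stored"] = True                # stub: central-server store
    return job

PIPELINE = [("ingest", ingest), ("transcode", transcode), ("store", store)]

def orchestrate(job):
    """Drive a job through the pipeline from start to finish, tracking
    its state and stopping at the first error -- the error-handling
    responsibility that belongs to the orchestration system."""
    for name, step in PIPELINE:
        try:
            job = step(job)
            job["state"] = name + ":done"
        except Exception as exc:
            job["state"] = name + ":error"
            job["error"] = str(exc)
            break
    return job
```

Even this sketch hints at why orchestration grows complicated: every step multiplies the states and error paths the engineer must account for.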

Event-driven Architecture
Event-driven architecture is another way to use services to perform work. At a high level, in this architecture, something happens (an event) that is of significance to the business. Processing engines can be set up to listen for that event and, when it occurs, perform actions based on it. Other processing engines can be listening downstream, and when one event engine finishes, others can be triggered. In an event-driven architecture, there is no central system guiding the flow of work through a pipeline. The movement of work through the facility is caused by a sequence of events and processes.

In an event-driven architecture, the movement of work through the facility is caused by a sequence of events and processes.

An operator finishing an ingest activity might create an “Ingest Complete” event. Event processing engines subscribe to event channels, such as the “Ingest Complete” event channel. This particular event-processing engine might have two actions: the first is to notify the QC operator that the file is available for quality control checking; and the second is to publish an “Ingest Complete” notification for other systems that might be interested in the event, such as traffic and automation systems. Both of these systems might update the status of the media based on the “Ingest Complete” event.

Note that, in this example, it would be extremely easy to add another event process engine to the event channel. This engine might be responsible for creating a number of different formats from the original ingested file format. Adding this process does not require modifying a workflow in a central system. All one has to do is to subscribe the transcoding engine to that particular event channel.
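The “Ingest Complete” example can be sketched with a minimal event channel. The channel name and engine behaviors are illustrative only; a real facility would use a message broker rather than an in-process dictionary.

```python
from collections import defaultdict

channels = defaultdict(list)   # event channel -> subscribed engines

def subscribe(channel, engine):
    channels[channel].append(engine)

def publish(channel, event):
    """Fire an event to every engine subscribed to the channel."""
    return [engine(event) for engine in channels[channel]]

# The two original actions on the "Ingest Complete" channel:
subscribe("ingest-complete", lambda e: f"notify QC operator about {e}")
subscribe("ingest-complete", lambda e: f"update traffic/automation for {e}")

# Adding the transcode engine later is one subscribe call -- no central
# workflow definition needs to change:
subscribe("ingest-complete", lambda e: f"transcode {e} to proxy formats")

results = publish("ingest-complete", "tape_0042")
```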

It is important to realize that orchestration and event-driven architecture are complementary, and they are frequently deployed together. For example, in the figure referenced earlier, the tape ingest function might be driven by an orchestration system that precisely controls the workflow and its error-handling conditions, while event-driven architecture notifies other SOA processes once the ingest is complete.

Common Approach
Common service interface definitions are critical. One can imagine a whole universe of services: a content repository service, a media identification service, a publish content to ISP service, and so on. And, one can imagine that several different vendors would make such services available. If each vendor defined the interface to their service independently, the amount of software integration required to build these systems would be huge. On the other hand, if the industry would agree on the service interface definition for an ingest service, for example, then it would be possible to integrate various ingest services into a workflow with minimal additional development.

Common service interface definitions are critical, but it is also critical that we have a common overall framework within which services can be deployed. How do services communicate with orchestration systems and with each other in event-driven architectures? How do newly commissioned services make their presence known on a network? Again, having a harmonized approach to an overall media service architecture will lower costs and shorten implementation time.

One last critical element in the discussion of media services is governance. Governance brings logic and structure to media services. Areas typically covered by governance include: the service life cycle (how services are developed, deployed, deprecated and eventually decommissioned); prioritizing the deployment of new services; and ensuring the quality of deployed services.

There is a task force in the industry called the Framework for Interoperable Media Services (FIMS). FIMS is a collaboration between the Advanced Media Workflow Association and the European Broadcasting Union. FIMS is the first industry effort focused on developing services for the professional media industry. The FIMS group consists of a Business Board that develops business priorities for service development, and a Technical Board that oversees the development and deployment of FIMS services.

You can learn more about FIMS on the project’s website, and technical information is available on the FIMS wiki. This activity has already yielded an overall framework for media services and several specific media service definitions. Work is ongoing, all the work is public, and anyone can participate.

By Brad Gilmer, Broadcast Engineering


DashCast is an open-source application that allows users to:

  • Transcode a live or non-live stream into multiple qualities (e.g., bitrates and resolutions).
  • Segment a continuous stream into small chunks and packetize them for delivery via the Dynamic Adaptive Streaming over HTTP (DASH) standard.

The Future of HEVC: It's Coming, but with Plenty of Questions

According to multiple studies, HEVC should deliver up to 50% better compression than H.264 in video on demand (VOD) applications, which means similar quality at half the bitrate. In live encoding, which obviously has to be performed in real time, the initial expectations are a 35% bandwidth reduction at similar quality levels. Alternatively, HEVC can also enable larger resolution movies, whether 2K or 4K.

Essentially, these are the two benefits of HEVC in the streaming space. The first relates to encoding existing SD and HD content with HEVC rather than H.264, enabling cost savings and/or the ability to stream higher quality video to lower bitrate connections. The second relates to opening up new markets for Ultra-High-Definition (UHD) videos.
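As a quick worked example of those savings figures, applied to a hypothetical 4,000 kbps H.264 HD stream:

```python
def hevc_bitrate(h264_kbps, saving):
    """Estimated HEVC bitrate at similar quality, given the fractional
    bandwidth saving quoted for HEVC over H.264."""
    return h264_kbps * (1 - saving)

vod_kbps  = hevc_bitrate(4000, 0.50)   # VOD target: up to 50% savings
live_kbps = hevc_bitrate(4000, 0.35)   # live target: ~35% savings
# A 4,000 kbps H.264 stream becomes roughly 2,000 kbps (VOD)
# or 2,600 kbps (live) in HEVC at similar quality.
```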

On the playback side, there are multiple data points, but no real clear picture. Several companies have announced software decoders, but it’s unclear how much horsepower is necessary to drive them. The original targets for HEVC were 10x encoding complexity and 2x–3x decoding complexity as compared to H.264, and most sources have confirmed the 10x figure for encoding. Perusing comments from various sources, estimates of decoding complexity have ranged from “same as H.264” to the 2x–3x figure.

Remember, however, that H.264 playback is accelerated in hardware on most playback platforms, including GPU-accelerated playback on computers. According to my source at graphics vendor NVIDIA, “There currently isn’t any dedicated [hardware] support for HEVC in our current GPUs. I’m pretty limited to what I can say about future products. But I’ll just say that our goal is for our GPUs to support all current video standards. Now, that said, it is quite possible for third parties to write HEVC encoders and decoders using CUDA to use the processing capability of current GPUs.”

Though GPU acceleration may be coming, it isn’t here yet, so H.264 and HEVC aren’t on a level playing field when it comes to accessible playback hardware. Still, according to a source at Qualcomm, Inc., “We are able to get 1080p, 30fps HEVC Main profile video with just a little bit over 50% CPU utilization on a quad-core architecture.”

On the mobile side, my source reported, “At CES we showcased Ittiam’s ARM based decoder and played back 1080p HEVC and 1080p H.264 videos side by side on recently announced Snapdragon 800 platform. Ittiam’s decoder is still in development and will get further enhancement.”

According to a report titled “HEVC Decoding in Consumer Devices,” senior analyst Michelle Abraham from Multimedia Research Group, Inc. estimated that the number of consumer devices that shipped in 2011 and 2012 that would be capable of HEVC playback with a software upgrade totaled around 1.4 billion, with more than a billion more expected to be sold in 2013. According to Abraham, in compiling these statistics she assumed that all PCs shipped in each year would be HEVC-capable.

HEVC Encoding
I spoke with several encoding companies; many were bullish on HEVC and have either already made HEVC-related product announcements (Elemental Technologies, Inc.) or will do so at NAB. Another made the very cogent comment that the encoding side is always ahead of the game and that the path to actual producer adoption runs through widespread playback availability.

Speaking of playback, none of the major players -- Adobe, Apple, Google, or Microsoft -- have announced HEVC playback support in their respective players, browsers, or mobile or desktop operating systems. One reason why -- and a potential monkey wrench in at least the short-term HEVC adoption cycle -- is that no one knows what it will cost to use HEVC.

Royalty Issues with HEVC
What’s clear at this point is that multiple companies have patents relating to HEVC technology, and they plan to ask for royalties from those who use their technology. This was the case with H.264 as well, and though many in the streaming industry grumbled about the royalties, this disgruntlement certainly didn’t limit H.264’s success.

Two things are different with HEVC. First, where H.264 involved a single group of patent holders administered by MPEG LA, it appears that some HEVC patent holders want to pursue royalties outside of a patent group, which will make it more challenging for HEVC users to license the technologies. According to “Patent Snafus Could Delay New Video Codec,” Mediatek and Qualcomm do not want to join the HEVC group formulated by MPEG LA, and Samsung hasn’t decided either way.

One chipmaker executive, speaking anonymously for the EE Times article, commented, “HEVC has so many patent holders and some of them say they will not be part of the pool but want to collect royalties themselves. If say 20 people all want to collect royalties it will kill the standard -- we need a fixed cost, it cannot be variable.”

Beyond this uncertainty, HEVC is coming to the streaming media market much faster than H.264 did; with H.264, royalty policies were in place well before any significant market adoption. To recount, the H.264 spec was approved in March 2003, and MPEG LA announced licensing terms in November 2003. Obviously, when Apple announced support for H.264 in QuickTime 7 in April 2005, royalty policies were firmly in place. Ditto for when Adobe announced that it would include H.264 in Flash in March 2008, and when Microsoft added H.264 to Silverlight in July 2009.

Our contact at MPEG LA reported that while the HEVC group had met three times as of February 2013, there was still no guarantee that a group would be formed or that all patent holders would join the group. So it appears that HEVC early adopters will have to decide to implement the technology without knowing the cost.

For large companies such as Adobe, Apple, Google, and Microsoft, that might be tenable; the H.264 license was capped, and it’s reasonable to assume that the HEVC license will also be capped. All four companies can amortize that cost over millions of product units shipped, and I think it’s highly likely that one or more of these companies will announce HEVC integration by NAB.

Even the encoding companies that I spoke with commented that they might incorporate HEVC technologies into their encoding tools without knowing the cost, because, as one exec said, “Supporting new formats is the race that we run.” The exec also noted, however, that this was the first time that they were ever forced to consider embracing a codec without having an idea about the licensing structure.

However, let’s get back to the two potential benefits that actual publishers seek from HEVC: cost savings and opening up new products and services. In both cases, it seems unlikely that any producer would use HEVC-encoded video without a known cost structure. Sure, H.264 usage for free internet video is free, but that decision was made under a completely different set of circumstances, and it’s doubtful that HEVC usage will be similarly unencumbered.

How H.264 Became Free
A short history lesson will explain why H.264 became free. When the terms of the initial H.264 license were announced, there was a royalty on H.264-encoded video deployed in a pay-per-view or subscription operation. The royalty was not on free internet video, at least through the initial term of the license, which ended Dec. 31, 2010.

The licensing terms attributed this waiver to the fact that the internet streaming market was “still developing,” though this is likely disingenuous. In reality, the H.264 implementations of the time offered only a slight quality improvement over VP6, the predominant Flash codec, and required more CPU horsepower for playback. Nor was there yet a mobile platform such as iOS or Android — incompatible with VP6 — that could force producers to use H.264. So 99% of producers were satisfied with VP6 and wouldn’t have experimented with H.264 if a royalty were involved.

In February 2010, MPEG LA extended the royalty moratorium for free internet video through December 2016. In an interview with Streaming Media, MPEG LA president and CEO Larry Horn attributed this decision to the fact that “though some companies are doing well with advertising supported video, overall the models are still in flux, and the patent group didn’t want to plug a royalty into a business model that’s still unsettled.”

In May 2010, Google announced WebM, an open source alternative to H.264 that offered very similar quality and playback performance. In August 2010, MPEG LA announced that there would be no royalties on free internet usage of H.264 in perpetuity. Though MPEG LA never publicly admitted that the availability of a free, open source solution contributed to this decision, the timing would suggest it did.

Why HEVC Probably Won’t Be Free
Fast-forward to 2013. HEVC is ready, and at this point, it has no real competition. Sure, you can point to VP9, but considering how poorly Google executed taking VP8/WebM to market, it’s unlikely that any producers -- or patent groups -- will take it seriously. In 2012, online video advertising jumped to $2.9 billion, so it’s tough to say that this market is still in flux.

For these reasons, it seems unlikely that, if HEVC enables large publishers to cut their bandwidth costs by 50%, the HEVC patent holders wouldn’t want their share. It seems equally unlikely that publishers seeking to reduce bandwidth costs via HEVC would start using the technology before the cost structure is known. So HEVC implementations seeking to harvest this benefit are likely on hold pending such an announcement.

HEVC and UHD Video
Again, I’m focusing my analysis on the streaming and OTT markets, since those are the ones I know best. Even without considering the royalty cost uncertainties, it seems unlikely that HEVC will spawn many new UHD-related products and services over the next 2 or 3 years for three reasons: bandwidth, the lack of 4K displays, and the lack of content.

Let’s start with bandwidth. In his blog post “H.265/HEVC Ratification and 4K Video Streaming,” compressionist Alex Zambelli, previously with Microsoft and now with iStreamPlanet, estimated that if HEVC produced a 40% bandwidth saving over H.264, a 4K movie would require bitrates of between 12Mbps and 15Mbps. Assuming that HEVC actually produced the 50% target, these numbers would drop to 10Mbps and 12Mbps.

According to the latest Akamai Technologies “State of the Internet” report, for 3Q 2012, the average connection speed in the U.S. was about 7.2Mbps, up from about 6Mbps the previous year. While users connecting on some premium services could handle HEVC’s 10Mbps–12Mbps in the short term, it’s unclear when a significant portion of the U.S. population will be able to support 4K HEVC movies.

Though the U.S. is far from the performance leader, only South Korea, with an average connection speed of 14.7Mbps, could sustain a 4K movie today, with Japan next at 10.5Mbps. Again, while dedicated satellite or cable networks could certainly carry this load, it’s unlikely that the type of shared internet connection used for OTT would be able to in the short term.
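The 4K arithmetic above can be reproduced by working backward from Zambelli’s figures, which imply an H.264 4K baseline of roughly 20Mbps–25Mbps (an assumption for this sketch), and comparing the results against the Akamai connection speeds cited in the text:

```python
H264_4K_MBPS = (20, 25)        # assumed H.264 4K baseline range

def hevc_range(saving):
    """Apply a fractional bandwidth saving to the baseline range."""
    return tuple(round(b * (1 - saving)) for b in H264_4K_MBPS)

at_40 = hevc_range(0.40)       # 40% saving -> 12-15 Mbps, per Zambelli
at_50 = hevc_range(0.50)       # 50% saving -> 10-12 Mbps

# Even the optimistic 10-12 Mbps range sits well above the 7.2 Mbps
# U.S. average, but under South Korea's 14.7 Mbps.
us_avg_ok = at_50[0] <= 7.2    # False: average U.S. link can't carry it
kr_avg_ok = at_50[1] <= 14.7   # True: South Korea's average could
```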

‘Why 4K TVs Are Stupid’
The second issue relates to the installed base of 4K-capable sets, which obviously will be necessary to view 4K movies. According to analyst DisplaySearch, global shipments of 4K sets will be well under a million in 2013, and just over 2 million in 2014.

Projections of 4K and OLED TVs through 2016

Obviously, these are not inspiring numbers, and as you might have gleaned from the section title, there are those who feel they may be overly optimistic. In a CNET article titled “Why 4K TVs Are Stupid,” Geoffrey Morrison ran the math about how much detail your eyes can resolve, and the minimum size of pixels on a screen, and concluded that at a viewing distance of 10', the difference between 1080p and 4K wouldn’t be noticeable on TVs less than 77" in diagonal. It’s a pretty familiar argument, very similar to the 720p versus 1080p debate, where the rule of thumb was that you couldn’t tell the difference on TVs 50" or smaller when watching from farther than 8' away.
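Morrison’s math can be reconstructed from the standard visual-acuity assumption that 20/20 vision resolves roughly one arcminute per pixel: at a fixed viewing distance, 4K only offers visible benefit once a 1080p panel’s pixels grow large enough to distinguish, i.e., once the screen passes a threshold diagonal.

```python
import math

def threshold_diagonal_in(distance_in, h_pixels=1920, aspect=16 / 9):
    """Smallest 16:9 diagonal (inches) at which a display with h_pixels
    horizontal pixels shows visible pixel structure from distance_in
    inches, assuming 20/20 acuity of one arcminute per pixel."""
    pixel_in = distance_in * math.tan(math.radians(1 / 60))  # 1 arcmin
    width_in = h_pixels * pixel_in
    return width_in * math.hypot(aspect, 1) / aspect         # 16:9 diagonal

diag = threshold_diagonal_in(10 * 12)   # 10-foot viewing distance
# -> roughly 77 inches, matching the figure in the CNET article
```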

In other words, while 4K TVs might make sense for a home entertainment center or man cave, they probably don’t for the typical living room, where mass markets are made. At some point, economies of scale will take over, and 4K sets will be cheaper than 1080p. Until that happens, however, it’s hard to predict that 4K sets will fly off the shelves, particularly because of the third item that will hinder 4K sets, and services targeting them -- a paucity of content.

No Content for 4K
By the time 1080p TV sets became affordable for the masses, there was plenty of HD content to watch. While ultimately we may say the same thing about 4K TVs, that time is clearly not now, both regarding affordability and content.

A good example of both is the 84" Sony XBR-84X900, which retails for $24,999.99 and includes 10 4K movies from Sony’s movie library and a variety of short-form 4K content. To supplement these movies, Sony plans on launching a 4K movie download service in the summer of 2013, potentially delivered on a new Blu-ray Disc standard for 4K movies.

Many movies today are being shot in 4K, and many film-based movies can be rescanned for 4K delivery. Beyond this, however, there’s very little other 4K content available, and minimal 4K production in sports and general-purpose television. So even those viewers who purchase a sufficiently large 4K monitor so that they can see the advantage over 1080p will have little 4K content to watch.

Of course, many 4K TVs can upsample SD and HD content to 4K, with mixed results. One U.K. reviewer found the results “astonishingly good,” while another, more expressive reviewer from Gizmodo commented, “As a result, anyone who spends £20,000 on a 4K TV at the moment will be doomed to watch upscaled HD content. That sucks. You don’t spend quite-nice-car cash on a TV, just so you can watch upscaled content.”

Note that some of the less expensive 4K TVs, such as Westinghouse Electric Corp.’s $2,499 50" set, don’t come with onboard Smart TV functionality or 4K upscaling technology. In these cases, you’d be dependent upon the upscaling provided by the set-top box, optical disc player, or game console.

What does all of this mean for the UHD OTT market? Overall, even though it seems likely that inexpensive HEVC-capable set-top boxes will be available by the end of 2013, there will be minimal 4K content to watch, few 4K TVs to watch it on, and insufficient bandwidth to deliver it.

Download and Watch
One market that seems potentially ripe for larger-than-1080p viewing is the traditional download to view, particularly given the multiple viewing options. For example, TimeScapes is a film by Tom Lowe featuring slow-motion and time-lapse cinematography of the landscape, people, and wildlife of the American Southwest. Shot in 4K, the movie is available in 10 versions, ranging from SD DVD and 1080p Blu-ray to 4K, including custom versions for 30" 2560x1440 displays and the MacBook Retina display (2880x1620). It’s an innovative strategy that could portend the optimal strategy for UHD movie distribution.

This 4K movie is available in 10 different versions.

I asked Lowe about how his sales were distributed among the available offerings. He responded, “Actually the ‘30 inch’ 2560x1440 version is selling like hotcakes. Sales have exceeded my expectations, in terms of percentage sold vs Blu-ray, 1080p download, etc. I would say for every one in ten 1080p HD downloads we sell, we sell about four 2560 copies. So many people have the Dell and Apple 2560 displays, but have never, ever had any video to play at that resolution.”

Interestingly, watching video on tablets and computers seems like the optimal use of 4K screens, as viewers are actually close enough to the device to see the difference. Though it will likely never see the light of day, Panasonic showed an 18.7 x 13.1 tablet computer at CES with a 4K screen -- could true 4K tablets be far behind? And once they’re available, wouldn’t 4K movie downloads from iTunes seem like the natural next step? Once HEVC playback becomes available, it would cut download times and storage space by 50%.

How likely is true 4K viewing on computers? I asked Lowe for his thoughts about computers and tablets as a potential viewing platform. He replied, “I have always believed that 4K monitors are where 4K will really catch on. On a projector, or large TV, you don’t notice it as much. On a monitor only a couple feet away, the difference between 1080p and 4K is very striking. I think 4K monitors will catch on fast once they come out, among gamers, Photoshop enthusiasts, and people watching or making high-res video.”

What does this all add up to? For producers seeking to distribute SD and HD content encoded with HEVC, the lack of a known royalty structure is a major buzzkill. At this point, no one knows whether a single royalty structure will be in place, whether multiple IP holders will pursue royalties independently, or the timing of either of these efforts.

HEVC encoding should be generally available by the end of 2013, if not sooner, and the player-related picture should also be clearer. Though it’s impossible to predict what Apple will do, I would be surprised if there wasn’t an HEVC decoder for the iPad 3 and MacBook Retina line announced before the end of 2013. Ditto for Adobe announcing HEVC playback in Flash.

Those attempting to leverage HEVC to create new opportunities for the distribution of UHD video have the royalty mountain to climb, as well as a paucity of content and viewing platforms and the lack of bandwidth to deliver the streams. Given the data rates involved, it feels like downloading for viewing is a better short-term model, providing custom versions matched to specific display resolutions.

The increased efficiency that HEVC provides will ultimately make streaming video more affordable and deliverable. However, until the royalty picture clears, it’s hard to get excited about any projected cost savings from HEVC. On the UHD side, HEVC is only one piece of the puzzle that must come together to make 4K viewing a reality, and not a very important piece of the puzzle at that.

By Jan Ozer, StreamingMedia

WTF is... H.265 aka HEVC?

When Samsung unveiled its next-generation smartphone, the Galaxy S4, in March this year, most of the Korean giant’s fans focused their attention on the device’s big 5-inch, 1920 x 1080 screen, its quad-core processor and its 13Mp camera. All impressive of course, but incremental steps in the ongoing evolution of the smartphone. More cutting edge is the S4’s promised support for a technology called HEVC.

HEVC is short for High Efficiency Video Coding. It’s the successor to the technology used to encode video stored on Blu-ray Discs and streamed in high-definition digital TV transmissions the world over. The current standard is called H.264 - aka MPEG-4 Part 10, aka Advanced Video Coding (AVC) - so it’s no surprise that HEVC will become H.265 when the Is are dotted and the Ts crossed on the final, ratified version of the standard later this year.

This final standardisation is just a formality. The International Telecommunication Union (ITU-T), the body which oversees the "H" series of standards, and its partner in video matters, the ISO/IEC Moving Picture Experts Group (MPEG), have both given HEVC sufficient approval. This means device manufacturers such as Samsung, chipmakers such as Broadcom, content providers such as Orange France and mobile phone network operators such as NTT DoCoMo can begin announcing HEVC-related products safe in the knowledge that the standard will be completed with few, if any, further changes.

Each successive generation of video codec delivers comparable picture quality at half its predecessor's bit-rate

It has taken H.265 three years to reach this stage, though exploratory work on post-H.264 standards goes back to 2004. The drive to develop the standard - a process overseen by a committee called the Joint Collaborative Team on Video Coding (JCT-VC) and comprising members of both MPEG and ITU-T - was outlined in January 2010 in a call for specification proposals from technology firms and other stakeholders.

Their brief is easy to summarise: H.265 has to deliver a picture of the same perceived visual quality as H.264 but using only half the transmitted volume of data and therefore half the bandwidth. H.264 can happily churn out 1920 x 1080 imagery at 30 frames per second in progressive - ie, frame after frame - mode, but it’s expected to start running out of puff when it comes to the 3840 x 2160 - aka 4K x 2K - let alone the 7680 x 4320 (8K x 4K) resolutions defined as Ultra HD pictures. H.265, then, was conceived as the technology that will make these resolutions achievable with mainstream consumer electronics kit like phones and televisions.
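To put those resolutions in perspective, here is a back-of-the-envelope sketch (my own arithmetic, not from the standard) of the raw, uncompressed data rates a codec has to tame, assuming 8-bit 4:2:0 sampling at 30 frames per second:

```python
# Raw (uncompressed) data rates at the resolutions discussed above.
# 8-bit 4:2:0 sampling averages 12 bits per pixel: 8 bits of luma plus
# two quarter-resolution chroma planes. Figures are illustrative only.

BITS_PER_PIXEL = 12
FPS = 30

def raw_mbps(width, height, bits_per_pixel=BITS_PER_PIXEL, fps=FPS):
    """Raw video data rate in megabits per second."""
    return width * height * bits_per_pixel * fps / 1_000_000

for name, (w, h) in {
    "1080p HD": (1920, 1080),
    "4K UHD":   (3840, 2160),
    "8K UHD":   (7680, 4320),
}.items():
    print(f"{name}: {raw_mbps(w, h):,.0f} Mbit/s raw")
```

Even 1080p works out to roughly 750Mbit/s uncompressed, and 8K is sixteen times that - which is why each codec generation's halving of bit-rate matters so much.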

High Resolution, Low Bandwidth
Of course, 4K x 2K and up are, for now, thought of as big-screen TV resolutions. But it wasn’t so very long ago that 1920 x 1080 was considered an only-for-tellies resolution too. Now, though, plenty of phones, of which the Galaxy S4 is merely the latest, have screens with those pixel dimensions. Some tablets have higher resolutions still.

And while today’s mobile graphics cores have no trouble wrangling 2,073,600 pixels 30 times a second, that’s still a heck of a lot of data for mobile networks to carry to them, even over the fastest 4G LTE links. And so, in addition to supporting Ultra HD resolutions on large TVs, H.265 was conceived as a way to deliver larger moving pictures to phones while consuming less bandwidth than H.264 requires. Or to deliver higher, smoother frame rates over the same width of pipe.

This explains NTT DoCoMo’s interest in the new video technology. Its 2010 proposal to the JCT-VC was one of the five shortlisted from the original 27 suggestions in April 2010. All five could deliver a picture to match a good H.264 stream, but only four, including NTT DoCoMo’s, were also able to deliver a compression ratio as low as a third of what H.264 can manage. The JCT-VC’s target was 50 per cent more efficient compression for the same image size and picture quality.

HEVC assembles its coding units into a tree structure

The remaining proposals were combined and enshrined in the JCT-VC’s first working draft, which was published the following October. The committee and its partners have been refining and testing that specification ever since. Since June 2012, MPEG LA, the company that licenses MPEG video technologies, has been bringing together patent holders with intellectual property that touches on the H.265 spec, before licensing that IP to anyone making H.265 encoders and decoders, whether in hardware or software.

So how does the new video standard work its magic?

Like H.264 before it - and all video standards from H.261 on, for that matter - HEVC is based on the same notion of spotting motion-induced differences between frames, and finding near-identical areas within a single frame. These similarities are subtracted from subsequent frames and whatever is left in each partial frame is mathematically transformed to reduce the amount of data needed to store each frame.

Singing, Ringing Tree
When an H.264 frame is encoded, it’s divided into a grid of squares, known as "macroblocks" in the jargon. H.264 macroblocks were no bigger than 16 x 16 pixels, but that’s arguably too small a size for HD imagery and certainly for Ultra HD pictures, so H.265 allows block sizes to be set at up to 64 x 64 pixels, the better to detect finer differences between two given blocks.

In fact, the process is a little more complex than that suggests. While H.264 works at the block level, with a 16 x 16 block containing each pixel’s brightness - "luma" in the jargon - and two 8 x 8 blocks of colour - "chroma" data - H.265 uses a structure called a "Coding Tree". Encoder-selected 16 x 16, 32 x 32 or 64 x 64 blocks contain pixel brightness information. These luma blocks can then be partitioned into any number of smaller sub-blocks containing the colour - "chroma" - data. More light-level data is encoded than colour data because the human eye is better able to detect differences in the brightness of adjacent pixels than it is colour differences.

HEVC has a smart picture sub-division system

HEVC samples pixels as "YCrCb" data: a brightness value (Y) followed by two numbers that show how far the colour of the pixel deviates from grey toward, respectively, red and blue. It uses 4:2:0 sampling - each colour component carries one-quarter the number of samples of the brightness component, specifically half the number of samples in each axis. Samples are 8-bit or 10-bit values, depending on the HEVC Profile the encoder is following. You can think of a 1080p picture containing 1920 x 1080 pixels worth of brightness information but only 960 x 540 pixels worth of colour information. Don’t worry about the "loss" of colour resolution - you literally can’t see it.
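The sampling arithmetic can be made concrete with a short sketch (illustrative only) that counts the samples in one 4:2:0 frame:

```python
def frame_samples(width, height):
    """Sample counts for one 4:2:0 frame: a full-resolution luma (Y)
    plane, plus Cb and Cr planes each subsampled by 2 in both axes."""
    luma = width * height
    chroma = (width // 2) * (height // 2)  # per colour component
    return {"Y": luma, "Cb": chroma, "Cr": chroma}

s = frame_samples(1920, 1080)
print(s)                 # luma: 2,073,600; Cb and Cr: 518,400 each
print(s["Cb"] / s["Y"])  # each colour plane is 1/4 the size of the luma plane
print(sum(s.values()))   # 3,110,400 samples, i.e. 3,110,400 bytes at 8 bits
```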

A given "Coding Tree Unit" is the root of a tree structure that can comprise a number of smaller, rectangular "Coding Blocks", which can, in turn, be divided into smaller still "Transform Blocks", as the encoder sees fit.
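The coding-tree idea can be sketched as a recursive quadtree split. In the toy below, a hypothetical needs_split callback stands in for the encoder's real rate-distortion decision about whether a block is worth subdividing:

```python
def split_ctu(x, y, size, needs_split, min_size=8):
    """Recursively partition a square block, quadtree-style.
    `needs_split(x, y, size)` stands in for the encoder's decision
    (in a real encoder, a rate-distortion comparison). Returns the
    list of (x, y, size) leaf blocks."""
    if size <= min_size or not needs_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    blocks = []
    for dy in (0, half):
        for dx in (0, half):
            blocks += split_ctu(x + dx, y + dy, half, needs_split, min_size)
    return blocks

# Toy decision rule: keep splitting until blocks are 16x16 or smaller.
leaves = split_ctu(0, 0, 64, lambda x, y, s: s > 16)
print(len(leaves))  # a 64x64 CTU fully split to 16x16 yields 16 leaf blocks
```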

The block structure may differ between H.264 and H.265, but the motion detection principle is broadly the same. The first frame’s blocks are analysed to find those that are very similar to each other - areas of equally blue sky, say. Subsequent frames are not only checked for these intrapicture similarities but also to determine how blocks move between frames in a sequence. So a block of pixels that contains the same colour and brightness values through a sequence of frames, but changes location from frame to frame, only needs to be stored once. An accompanying motion vector tells the decoder where to place those identical pixels in each successive recovered frame.
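A minimal sketch of that decode-side trick, assuming frames are plain 2-D arrays of pixel values (real codecs add sub-pixel interpolation and residual correction on top):

```python
def motion_compensate(reference, mv, block, block_size):
    """Recover one block of a new frame by copying pixels from a
    reference frame at an offset given by motion vector (dx, dy).
    `block` is the (x, y) of the block's top-left corner in the
    frame being rebuilt."""
    dx, dy = mv
    bx, by = block
    return [
        [reference[by + dy + r][bx + dx + c] for c in range(block_size)]
        for r in range(block_size)
    ]

# Toy 4x4 reference frame: a 2x2 patch of bright pixels at (0, 0) that
# has moved one pixel right in the next frame. The stream stores the
# patch once plus the vector (-1, 0) pointing back into the reference.
ref = [
    [9, 9, 0, 0],
    [9, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
recovered = motion_compensate(ref, mv=(-1, 0), block=(1, 0), block_size=2)
print(recovered)  # [[9, 9], [9, 9]]: the decoder rebuilt the moved block
```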

Made for Parallel Processing
H.264 encoders check intrapicture similarities in eight possible directions from the source block. H.265 extends this to 33 possible vectors - the better to reveal more subtle pixel block movement. H.265 also gains other tricks to predict more accurately where to duplicate a given block when generating the final frame.
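Two of the simplest prediction modes can be sketched in a few lines. These are toy versions of vertical and DC intra prediction, ignoring the reference-sample filtering and fractional angles the real standard specifies:

```python
def intra_predict_vertical(top_row, size):
    """'Vertical' intra prediction: every row of the block copies the
    reconstructed pixels immediately above it - one of the simplest of
    the directional modes (HEVC defines 33 angular modes plus DC and
    planar)."""
    return [list(top_row[:size]) for _ in range(size)]

def intra_predict_dc(top_row, left_col, size):
    """DC prediction: fill the block with the mean of the neighbouring
    reconstructed pixels above and to the left."""
    mean = (sum(top_row[:size]) + sum(left_col[:size])) // (2 * size)
    return [[mean] * size for _ in range(size)]

top = [10, 20, 30, 40]   # pixels above the block (invented values)
left = [10, 10, 10, 10]  # pixels to its left
print(intra_predict_vertical(top, 4)[3])   # bottom row repeats the top row
print(intra_predict_dc(top, left, 4)[0][0])  # (100 + 40) // 8 = 17
```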

As HEVC boffins Gary Sullivan, Jens-Rainer Ohm, Woo-Jin Han and Thomas Wiegand put it in a paper published in the IEEE journal Transactions on Circuits and Systems for Video Technology in December 2012: “The residual signal of the intra- or interpicture prediction, which is the difference between the original block and its prediction, is transformed by a linear spatial transform. The transform coefficients are then scaled, quantised, entropy coded and transmitted together with the prediction information.”

When the frame is decoded, extra filters are applied to smooth out artefacts generated by the blocking and quantisation processes. Incidentally, we can talk about frames here, rather than interlaced fields, because H.265 doesn’t support interlacing. So there will be no 1080i vs 1080p debate with HEVC. “No explicit coding features are present in the HEVC design to support the use of interlaced scanning,” say the minds behind the standard. The reason? “Interlaced scanning is no longer used for displays and is becoming substantially less common for distribution.” At last we’ll be able to leave our CRT heritage behind.

H.264 vs H.265

In addition to all this video coding jiggery pokery, H.265 includes the kind of high-level information that H.264 data can contain to help the decoder cope with the different methods by which a stream of video data can move from source file to screen - along a cable or across a wireless network, say - which will result in degrees of data packet loss. But HEVC gains extra methods for segmenting an image or the streamed data to take better advantage of parallel processing architectures and synchronise the output of however many image processors are present.

H.264 has the notion of "slices" - sections of the data that can be decoded independently of other sections, either whole frames or parts of frames. H.265 adds "tiles": rectangular regions into which a picture can optionally be segmented, each containing the same number of HEVC’s core Coding Tree Units so there’s no need to synchronise output. This is because each graphics core processes any given CTU in the same amount of time. Other tile configurations may be allowed in future versions of the standard.
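The even-division idea can be illustrated with a small sketch (my own, using a hypothetical picture size chosen to divide evenly into whole CTUs):

```python
def tile_grid(width, height, ctu=64, cols=2, rows=2):
    """Partition a picture's CTU grid into a cols x rows grid of tiles.
    Assumes the grid divides evenly, so every tile holds the same
    number of CTUs and each core finishes its tile in the same time,
    with no cross-tile synchronisation needed. Returns (tile count,
    CTUs per tile)."""
    ctus_x = width // ctu
    ctus_y = height // ctu
    assert ctus_x % cols == 0 and ctus_y % rows == 0, "grid must divide evenly"
    per_tile = (ctus_x // cols) * (ctus_y // rows)
    return cols * rows, per_tile

# Hypothetical 1920x1152 picture: 30 x 18 CTUs, split 3 x 3.
tiles, ctus_each = tile_grid(1920, 1152, ctu=64, cols=3, rows=3)
print(tiles, ctus_each)  # 9 tiles of 60 CTUs each
```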

H.265... Everywhere
An alternative option available to the encoder is Wavefront Parallel Processing, which cuts each slice into rows of CTUs. Work can begin on processing any given row once just two CTUs have been decoded from the preceding row (once decoding clues have been recovered from them). The exception, of course, is the very first row in the sequence. This approach, say HEVC boffins, makes for fewer artefacts than tiling and may yield a higher compression ratio, but its processing power requirements are greater than tiling. For now, the choice is tiles, WPP or neither, though the use of both may be permitted in future HEVC revisions.
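The two-CTU lag gives each row a fixed start offset, so which CTUs are decodable in parallel at each step can be computed directly. A sketch, assuming a uniform per-CTU decode time:

```python
def wavefront_schedule(rows, cols, lag=2):
    """Wavefront Parallel Processing order: row r may start only after
    the first `lag` CTUs of row r-1 are done, so CTU (r, c) becomes
    decodable at step r * lag + c. Returns, for each step, the list of
    CTUs that can be processed in parallel."""
    steps = {}
    for r in range(rows):
        for c in range(cols):
            steps.setdefault(r * lag + c, []).append((r, c))
    return [steps[t] for t in sorted(steps)]

# 3 rows of 4 CTUs: the diagonal wavefront sweeps across the picture.
for t, ctus in enumerate(wavefront_schedule(3, 4)):
    print(f"step {t}: {ctus}")
```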

H.265 makes jumping around within the video a smoother process by allowing complete frames - pictures that can be decoded without information from any preceding frame - to be more logically marked as such for the decoder. Don’t forget, the decoding order of information within a stream isn’t necessarily the same as the order in which the frames are displayed. Jump into a stream part-way and some upcoming frames - ones needed only to decode pictures that precede the entry point, not those that will now be displayed - can, if correctly labelled, be safely ignored by the decoder.

HEVC vs AVC: for the same bit-rate get a better picture - or comparable image quality for at least half the bit rate

Since a video stream of necessity must incorporate at least one complete frame, it’s no surprise that H.265 has a still-image profile, Main Still Picture. Like the basic, Main profile, MSP is capped at eight bits per sample. If you want more, you’ll be needing the Main 10 profile, which, as its name suggests, uses 10-bit samples. So far 13 HEVC levels have been defined, essentially based on the picture size and running from 176 x 144 to 7680 x 4320. The levels yield maximum bitrates of 128Kbps to 240Mbps in the mainstream applications tier, and 30-800Mbps in the high performance tier, which might be used for broadcast quality pre-production work and storage, for instance.

This, then, is the version of HEVC enshrined in the first final draft of H.265. Work is already underway on extensions to the specification to equip the standard with the technology needed for 3D and multi-view coding, to support greater colour depths (12-bit initially) and better colour component sampling options such as 4:2:2 and 4:4:4, and to enable scalable coding, which allows a high quality stream to deliver a lower quality sub-stream by dropping packets.

The H.265 extensions are all some way off. First we all need H.265 to be supported by our preferred operating systems, applications and - in the case of low-power mobile devices - in the chips they use. With H.264 now the de facto standard for video downloads and streaming, there’s a motivation to move up to the next version, especially if it means buyers can download comparable quality videos in half the time. It seems certain to be a part of "Blu-ray 2.0" whenever that emerges, and probably future ATSC TV standards, but the ability to get high quality video down off the internet is surely a stronger unique selling point than delivering 4K or 8K video on optical disc to an ever-declining number of consumers.

Not Yet Ready for Primetime
Videophiles will demand 4K and perhaps 8K screens, but for most of us there’s little visible benefit in moving beyond 1080p - unless we also move our sofas very much closer to our TVs. Even if broadcasters migrate quickly to 4K, perhaps in time to show 2014’s World Cup tournament in the format, they can do so with H.264. But if you’re going to have to get a new, bigger screen, you may as well get one with a new, better codec on board. Still, H.265 in the living room looks set to be of limited interest for some years yet.

4K streaming may be equally far off, but more efficient 1080p streaming is something many folks would like to have now. Given the competition among IPTV services, it’s not hard to imagine many providers hopping upon H.265 to improve their offerings, once the standard becomes supported by the web browsers and apps they use. Greater compression of SD and HD video content than H.264 can provide is what will drive adoption of H.265 in the near-term.

And of course, it’s going to require greater processing power than previous codecs needed - 10 times as much for encoding, some specialists reckon, with two-to-three times as much for decoding. Producing H.265 video requires the encoder to evaluate many more encoding options at many more decision points than is the case with H.264, and that takes time. Likewise regenerating complete frames from the many more hints that the encoder provides. Encoding and decoding HEVC doesn’t necessarily favour chips with higher clock speeds, though. H.265’s emphasis on parallelisation favours CPUs and GPUs equipped with lots of cores that can crunch numbers simultaneously.

“HEVC/H.265 is an ideal fit for GPU encoding,” say staffers at Elemental Technologies, a maker of video encoding hardware. “HEVC/H.265 encoding presents computational complexities up to ten times that of H.264 encoding, for which the massively parallel GPU architecture is well-suited. With 500 different ways to encode each macroblock, the demanding processing requirements of H.265 will strain many existing hardware platforms.”

Easy to say, harder to deal with. We won’t know how well encoders and decoders work - be they implemented in software or hardware - until they actually arrive. Software support will come first. Market-watcher Multimedia Research Group reckons there are around 1.4 billion gadgets already on the market that, given a suitable software upgrade, will be able to play H.265 video. A billion more are coming out this year. But being able to decode H.265 is one thing - being able to do so smoothly and with an efficient use of energy is something else.

This kind of uncertainty will hold H.265 back in the short term, at least as far as its mainstream adoption goes. That’s no great surprise, perhaps. Even today, the majority of Britain’s broadcast digital TV channels use MPEG-2 - aka H.262 - with H.264 used only for HD content. H.265-specific chippery is expected this year, but not in shipping product until Q4 at the earliest - Q1 2014 is a more practical estimate.

By Tony Smith, The Register

HTML5 Video at Netflix

Today, we’re excited to talk about proposed extensions to HTML5 video that enable playback of premium video content on the web. We currently use Microsoft Silverlight to deliver streaming video to web browsers on the PC and Mac. It provides a high-quality streaming experience and lets us easily experiment with improvements to our adaptive streaming algorithms.

But since Microsoft has announced that support for Silverlight 5 will end in 2021, we need to find a replacement some time within the next 8 years. We'd like to share some progress we've made towards our goal of moving to HTML5 video.

Silverlight and Browser Plugins
Silverlight is a browser plugin which allows our customers to simply click "Play" on the Netflix website and watch their favorite movies or TV shows, but browser plugins have a few disadvantages. First, customers need to install the browser plugin on their computer prior to streaming video. For some customers, Netflix might be the only service they use which requires the Silverlight browser plugin. Second, some view browser plugins as a security and privacy risk and choose not to install them or use tools to disable them. Third, not all browsers support plugins (eg: Safari on iOS, Internet Explorer in Metro mode on Windows 8), so the ability to use them across a wide range of devices and browsers is becoming increasingly limited. We're eager to solve these problems as we move to our next generation of video playback on the web.

HTML5 Premium Video Extensions
Over the last year, we've been collaborating with other industry leaders on three W3C initiatives which are positioned to solve this problem of playing premium video content directly in the browser without the need for browser plugins such as Silverlight. We call these, collectively, the "HTML5 Premium Video Extensions":

Media Source Extensions (MSE)
The W3C Media Source Extensions specification "extends HTMLMediaElement to allow JavaScript to generate media streams for playback." This makes it possible for Netflix to download audio and video content from our content delivery networks and feed it into the video tag for playback. Since we can control how to download the audio/video content in our JavaScript code, we can choose the best HTTP server to use for content delivery based on real-time information, and we can implement critical behavior like failing over to alternate servers in the event of an interruption in content delivery. In addition, this allows us to implement our industry-leading adaptive streaming algorithms (real-time selection of audio/video bitrates based on available bandwidth and other factors) in our JavaScript code. Perhaps best of all, we can iterate on and improve our content delivery and adaptive streaming algorithms in JavaScript as our business needs change and as we continue to experiment.
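The bitrate-selection step of such an algorithm can be sketched independently of the browser APIs. The bitrate ladder and safety margin below are illustrative inventions, not Netflix's actual values, and a real MSE player would run this logic in JavaScript and feed the chosen segments to a SourceBuffer:

```python
def pick_bitrate(ladder_kbps, measured_kbps, safety=0.8):
    """Pick the highest encoded bitrate that fits within a safety
    margin of the measured throughput; fall back to the lowest rung
    if nothing fits. A toy stand-in for the adaptive-streaming
    decision described in the article."""
    budget = measured_kbps * safety
    candidates = [b for b in ladder_kbps if b <= budget]
    return max(candidates) if candidates else min(ladder_kbps)

ladder = [235, 375, 750, 1750, 3000, 5800]  # hypothetical bitrate ladder (kbps)
print(pick_bitrate(ladder, measured_kbps=4000))  # 3000: 5800 exceeds the 3200 budget
print(pick_bitrate(ladder, measured_kbps=200))   # 235: below the ladder, take the lowest
```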

Encrypted Media Extensions (EME)
The W3C Encrypted Media Extensions specification "extends HTMLMediaElement providing APIs to control playback of protected content." The video content we stream to customers is protected with Digital Rights Management (DRM). This is a requirement for any premium subscription video service. The Encrypted Media Extensions allow us to play protected video content in the browser by providing a standardized way for DRM systems to be used with the media element. For example, the specification identifies an encrypted stream format (Common Encryption for the ISO file format, using AES-128 counter mode) and defines how the DRM license challenge/response is handled, both in ways that are independent of any particular DRM. We need to continue to use DRM whether we use a browser plugin or the HTML5 media element, and these extensions make it possible for us to integrate with a variety of DRM systems that may be used by the browser.

Web Cryptography API (WebCrypto)
The W3C Web Cryptography API specification defines an API for "basic cryptographic operations in web applications, such as hashing, signature generation and verification, and encryption and decryption." This API allows Netflix to encrypt and decrypt communication between our JavaScript and the Netflix servers. This is required to protect user data from inspection and tampering, and allows us to provide our subscription video service on the web.
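The shape of those operations can be sketched outside the browser. The snippet below uses Python's standard-library HMAC rather than the asynchronous crypto.subtle API a browser would expose, and the key and message are invented for illustration:

```python
import hashlib
import hmac

# Sign a message and verify the signature - the kind of primitive
# WebCrypto standardises - here with stdlib HMAC-SHA256.

def sign(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, signature: bytes) -> bool:
    # Constant-time comparison, as any real verifier should use.
    return hmac.compare_digest(sign(key, message), signature)

key = b"shared-secret"               # illustrative only
msg = b'{"title_id": 12345}'         # illustrative only
tag = sign(key, msg)
print(verify(key, msg, tag))         # True
print(verify(key, msg + b"x", tag))  # False: tampering is detected
```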

First Implementation in Chrome OS
We've been working with Google to implement support for the HTML5 Premium Video Extensions in the Chrome browser, and we've just started using this technology on the Samsung ARM-Based Chromebook. Our player on this Chromebook device uses the Media Source Extensions and Encrypted Media Extensions to adaptively stream protected content. WebCrypto hasn't been implemented in Chrome yet, so we're using a Netflix-developed PPAPI (Pepper Plugin API) plugin which provides these cryptographic operations for now. We will remove this last remaining browser plugin as soon as WebCrypto is available directly in the Chrome browser. At that point, we can begin testing our new HTML5 video player on Windows and OS X.

We're excited about the future of premium video playback on the web, and we look forward to the day that these Premium Video Extensions are implemented in all browsers!

By Anthony Park and Mark Watson, The Netflix Tech Blog

Browse Proxy Transcoder

The use of a browse file – a frame accurate low bit rate proxy of the master essence – is an important enabler in lightweight IT based broadcast and production workflows.

The BLM Ingest service provides the ability to create the browse proxy in real time during linear ingest. For BLM deployments in which material arrives in the file domain, a simple browse proxy transcoder is provided for ‘low-res’ generation.

BLM now make this simple tool available in a cut-down, watch-folder-only version to allow system builders to reap the benefit of low-res operations without having to tie up a fully featured transcoder.

The service will accept source material as D-10 IMX, DNxHD, AVCi-100, DVCPro, XDCAM HD, MPEG-2, DV, H.264 or ProRes and transcode it to a defined resolution and bit rate.

AS-11 Metadata Validator

The Digital Production Partnership (DPP) is an industry-funded non-profit partnership that advocates and promotes the many benefits associated with digital production for broadcast television.

The DPP have recently defined the AS-11 metadata specification. AS-11 specifies a vendor-neutral subset of the MXF file format to use for delivery of finished programming from program producers and program distributors to broadcast stations. AS-11 defines a minimal core metadata set required in all AS-11 files, a program segmentation metadata scheme, and permits inclusion of custom shim-specific metadata in the MXF file.

Blue Lucy Media makes available an AS-11 Validator, a simple and free tool which interrogates a given file, displays the AS-11-specific metadata and highlights any missing or invalid metadata.

MXF TimeCode Editor

Blue Lucy Media is providing a free tool which allows users to read and set the start time-code and frame rate values of an MXF file.

"Tears of Steel" Original Footage

This directory contains original live action footage for the short film "Tears of Steel".

The frames are individual image files in OpenEXR format, 16-bit, 4096x2160, converted from camera raw. Playback is at 24 frames per second. There are also 1920x1012 EXR versions in the 'linear_hd' directories and occasional PNG or JPEG derivatives.

The short film Tears of Steel, code-named "mango" during development, was a Blender Open Movie Project that developed an open VFX pipeline for compositing computer-generated elements with live-action video. It was originally released on 26 September 2012.

AMWA Releases ‘MXF for Production’ Specification AS-10

The Advanced Media Workflow Association (AMWA) has released Application Specification AS-10, “MXF for Production”. This specification, built on the Sony XDCAM HD Format (SMPTE RDD-9), allows an end-to-end workflow to use a single file.

AS-10 is an MXF file format for typical end-to-end production workflows including camera acquisition, server acquisition, editing, play-out, digital distribution and archive. AS-10 is compatible with existing MXF-based systems and devices that a broadcaster may already have deployed.

Previous specifications had ambiguities that could lead to inconsistent implementations. AS-10 adds details to facilitate the design of interoperable products supporting the codec family.

AS-10 does not require Descriptive Metadata. However, Descriptive Metadata may be useful as part of a production workflow and is supported by the specification, which removes the need for XML sidecars to carry metadata.

Source: AMWA