WTF is... H.265 aka HEVC?

When Samsung unveiled its next-generation smartphone, the Galaxy S4, in March this year, most of the Korean giant’s fans focused their attention on the device’s big 5-inch, 1920 x 1080 screen, its quad-core processor and its 13Mp camera. All impressive of course, but incremental steps in the ongoing evolution of the smartphone. More cutting edge is the S4’s promised support for a technology called HEVC.

HEVC is short for High Efficiency Video Coding. It’s the successor to the technology used to encode video stored on Blu-ray Discs and streamed in high-definition digital TV transmissions the world over. The current standard is called H.264 - aka MPEG-4 Part 10, aka Advanced Video Coding (AVC) - so it’s no surprise that HEVC will become H.265 when the Is are dotted and the Ts crossed on the final, ratified version of the standard later this year.

This final standardisation is just a formality. The standards arm of the International Telecommunication Union (ITU-T), the body which oversees the "H" series of standards, and its partner in video matters, the ISO/IEC Moving Picture Experts Group (MPEG), have both given HEVC their approval. This means device manufacturers such as Samsung, chipmakers such as Broadcom, content providers such as Orange France and mobile phone network operators such as NTT DoCoMo can begin announcing HEVC-related products safe in the knowledge that the standard will be completed with few, if any, further changes.


 
Each successive generation of video codec delivers comparable picture quality at half its predecessor's bit-rate


It has taken H.265 three years to reach this stage, though exploratory work on post-H.264 standards goes back to 2004. The drive to develop the standard - a process overseen by a committee called the Joint Collaborative Team on Video Coding (JCT-VC) and comprising members of both MPEG and ITU-T - was outlined in January 2010 in a call for specification proposals from technology firms and other stakeholders.

The brief is easy to summarise: H.265 has to deliver a picture of the same perceived visual quality as H.264 but using only half the transmitted volume of data and therefore half the bandwidth. H.264 can happily churn out 1920 x 1080 imagery at 30 frames per second in progressive - ie, frame after frame - mode, but it’s expected to start running out of puff when it comes to 3840 x 2160 - aka 4K x 2K - let alone 7680 x 4320 (8K x 4K), the resolutions defined as Ultra HD. H.265, then, was conceived as the technology that will make these resolutions achievable with mainstream consumer electronics kit like phones and televisions.


High Resolution, Low Bandwidth
Of course, 4K x 2K and up are, for now, thought of as big-screen TV resolutions. But it wasn’t so very long ago that 1920 x 1080 was considered an only-for-tellies resolution too. Now, though, plenty of phones, of which the Galaxy S4 is merely the latest, have screens with those pixel dimensions. Some tablets have higher resolutions still.

And while today’s mobile graphics cores have no trouble wrangling 2,073,600 pixels 30 times a second, that’s still a heck of a lot of data for mobile networks to carry to them, even over the fastest 4G LTE links. And so, in addition to supporting Ultra HD resolutions on large TVs, H.265 was conceived as a way to deliver larger moving pictures to phones while consuming less bandwidth than H.264 requires. Or to deliver higher, smoother frame rates over the same width of pipe.
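
To get a feel for the numbers, here’s a bit of back-of-envelope Python - purely illustrative, and assuming plain 8-bit, 4:2:0 video rather than anything the standard itself mandates - showing how much raw data a 1080p and a 4K picture generate before any compression is applied.

```python
def raw_megabits_per_second(width, height, fps, bits=8):
    """Uncompressed bandwidth for 4:2:0 video: a full-size luma plane plus two quarter-size chroma planes."""
    samples_per_frame = width * height * 1.5
    return samples_per_frame * bits * fps / 1e6

print(raw_megabits_per_second(1920, 1080, 30))   # roughly 746 Mbit/s before any compression
print(raw_megabits_per_second(3840, 2160, 30))   # roughly 2,986 Mbit/s - four times as much for 4K
```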

This explains NTT DoCoMo’s interest in the new video technology. Its 2010 proposal to the JCT-VC was one of the five shortlisted from the original 27 suggestions in April 2010. All five could deliver a picture to match a good H.264 stream, but only four, including NTT DoCoMo’s, could also do so at bit-rates as low as a third of what H.264 needs. The JCT-VC’s target was 50 per cent more efficient compression for the same image size and picture quality.



 
HEVC assembles its coding units into a tree structure


The remaining proposals were combined and enshrined in the JCT-VC’s first working draft, which was published the following October. The committee and its partners have been refining and testing that specification ever since. Since June 2012, MPEG LA, the company that licenses MPEG video technologies, has been bringing together patent holders with intellectual property that touches on the H.265 spec, before licensing that IP to anyone making H.265 encoders and decoders, whether in hardware or software.

So how does the new video standard work its magic?

Like H.264 before it - and all video standards from H.261 on, for that matter - HEVC is based on the same notion of spotting motion-induced differences between frames and finding near-identical areas within a single frame. These similarities are subtracted out, and whatever remains of each frame is mathematically transformed to reduce the amount of data needed to store it.
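
As a very loose sketch of that idea - using numpy arrays for frames and, for simplicity, comparing each block only with the one in the same position in the previous frame, where a real encoder hunts for the best match - the per-block subtraction looks something like this:

```python
import numpy as np

def block_residuals(prev_frame, curr_frame, block=16):
    """Split both frames into block x block tiles and keep only the difference
    (the residual) between each tile and its counterpart in the previous frame."""
    h, w = curr_frame.shape
    residuals = {}
    for y in range(0, h, block):
        for x in range(0, w, block):
            prediction = prev_frame[y:y+block, x:x+block]   # the predicted pixels
            actual = curr_frame[y:y+block, x:x+block]
            residuals[(y, x)] = actual - prediction          # what's left gets transformed and coded
    return residuals
```

A near-static scene leaves mostly zeros in those residuals, which is exactly what the later transform and entropy-coding stages thrive on.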


Singing, Ringing Tree
When an H.264 frame is encoded, it’s divided into a grid of squares, known as "macroblocks" in the jargon. H.264 macroblocks are no bigger than 16 x 16 pixels, but that’s arguably too small a size for HD imagery and certainly for Ultra HD pictures, so H.265 allows block sizes of up to 64 x 64 pixels, the better to cope with the large expanses of similar pixels found in such high-resolution frames.

In fact, the process is a little more complex than that suggests. While H.264 works at the macroblock level - a 16 x 16 block holding each pixel’s brightness, "luma" in the jargon, plus two 8 x 8 blocks of colour, or "chroma", data - H.265 uses a structure called a "Coding Tree". Encoder-selected 16 x 16, 32 x 32 or 64 x 64 blocks hold the luma information, each accompanied by smaller blocks of chroma data, and all of them can be partitioned into smaller sub-blocks as the encoder sees fit. More light-level data is encoded than colour data because the human eye is better able to detect differences in the brightness of adjacent pixels than it is differences in colour.



 
HEVC has a smart picture sub-division system


HEVC samples pixels as "YCrCb" data: a brightness value (Y) followed by two numbers (Cr and Cb) that show how far the colour of the pixel deviates from grey toward, respectively, red and blue. It uses 4:2:0 sampling - each colour component has a quarter as many samples as the brightness channel, specifically half the number in each axis. Samples are 8-bit or 10-bit values, depending on the HEVC Profile the encoder is following. You can think of a 1080p picture as containing 1920 x 1080 pixels’ worth of brightness information but only 960 x 540 pixels’ worth of colour information. Don’t worry about the "loss" of colour resolution - you literally can’t see it.
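
The 4:2:0 arithmetic is simple enough to show directly. This sketch assumes the Y, Cr and Cb planes already exist as numpy arrays and simply averages each 2 x 2 patch of chroma - an illustration of the idea, not the resampling filter any particular encoder uses:

```python
import numpy as np

def subsample_420(y_plane, cr_plane, cb_plane):
    """Keep luma at full resolution; halve each chroma plane in both axes by averaging 2 x 2 blocks."""
    def halve(plane):
        h, w = plane.shape
        return plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return y_plane, halve(cr_plane), halve(cb_plane)

y = np.zeros((1080, 1920))             # 2,073,600 brightness samples
cr = cb = np.zeros((1080, 1920))
y, cr, cb = subsample_420(y, cr, cb)   # the chroma planes come out at 540 x 960 each
```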

A given "Coding Tree Unit" is the root of a tree structure that can comprise a number of smaller, rectangular "Coding Blocks", which can, in turn, be dividing into smaller still "Transform Blocks", as the encoder sees fit.

The block structure may differ between H.264 and H.265, but the motion detection principle is broadly the same. The first frame’s blocks are analysed to find those that are very similar to each other - areas of equally blue sky, say. Subsequent frames in a sequence are not only checked for these intrapicture similarities but also analysed to determine how blocks move from frame to frame. So a block of pixels that contains the same colour and brightness values through a sequence of frames, but changes location from frame to frame, only needs to be stored once. An accompanying motion vector tells the decoder where to place those identical pixels in each successive recovered frame.
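
On the decoder side the principle can be sketched in a few lines - assuming the encoder has already sent one copy of the block plus a motion vector per frame, which is the gist rather than the letter of either standard:

```python
import numpy as np

def apply_motion_vector(frame, block, position, vector):
    """Paste a previously decoded block into the frame at its new, vector-shifted position."""
    y, x = position
    dy, dx = vector
    h, w = block.shape
    frame[y+dy:y+dy+h, x+dx:x+dx+w] = block      # same pixels, new location
    return frame

sky = np.full((16, 16), 200)                     # a block stored once...
frame = apply_motion_vector(np.zeros((64, 64)), sky, (8, 8), (0, 4))   # ...and re-placed 4 pixels to the right
```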


Made for Parallel Processing
H.264 encoders check intrapicture similarities in eight possible directions from the source block. H.265 extends this to 33 possible directions - the better to capture subtler patterns within a frame. H.265 also gains other tricks that help it predict more accurately how a given block should be duplicated to generate the final frame.
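
Directional intra prediction itself is easy to picture with a couple of the simplest cases. This sketch covers only purely vertical and horizontal prediction and invents its own mode names; HEVC’s 33 angular modes interpolate between neighbouring pixels at much finer angles:

```python
import numpy as np

def intra_predict(above_row, left_col, size, mode):
    """Fill a size x size block purely from already-decoded neighbouring pixels."""
    if mode == "vertical":      # every row repeats the pixels directly above the block
        return np.tile(above_row[:size], (size, 1))
    if mode == "horizontal":    # every column repeats the pixels to the block's left
        return np.tile(left_col[:size].reshape(-1, 1), (1, size))
    raise ValueError("only two of HEVC's many intra modes are sketched here")
```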

As HEVC boffins Gary Sullivan, Jens-Rainer Ohm, Woo-Jin Han and Thomas Wiegand put it in a paper published in the IEEE journal Transactions on Circuits and Systems for Video Technology in December 2012: “The residual signal of the intra- or interpicture prediction, which is the difference between the original block and its prediction, is transformed by a linear spatial transform. The transform coefficients are then scaled, quantised, entropy coded and transmitted together with the prediction information.”
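
In code, that quoted pipeline boils down to something like the following - with a textbook 2D DCT standing in for HEVC’s integer transforms and a single quantisation step size as a simplifying assumption:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis - the textbook cousin of HEVC's core integer transform."""
    k = np.arange(n)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def transform_and_quantise(residual, qstep=10.0):
    """Transform the prediction residual, then scale and quantise the coefficients."""
    d = dct_matrix(residual.shape[0])
    coeffs = d @ residual @ d.T              # the linear spatial transform
    return np.round(coeffs / qstep)          # scaled and quantised; entropy coding comes next
```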

When the frame is decoded, extra filters are applied to smooth out artefacts generated by the blocking and quantisation processes. Incidentally, we can talk about frames here, rather than interlaced fields, because H.265 doesn’t support interlacing. So there will be no 1080i vs 1080p debate with HEVC. “No explicit coding features are present in the HEVC design to support the use of interlaced scanning,” say the minds behind the standard. The reason? “Interlaced scanning is no longer used for displays and is becoming substantially less common for distribution.” At last we’ll be able to leave our CRT heritage behind.



 
H.264 vs H.265


In addition to all this video coding jiggery pokery, H.265 includes the kind of high-level information that H.264 data can carry to help the decoder cope with the different routes a stream of video data can take from source file to screen - along a cable or across a wireless network, say - each of which brings its own degree of data packet loss. But HEVC gains extra methods for segmenting an image, or the streamed data, to take better advantage of parallel processing architectures and to synchronise the output of however many image processors are present.

H.264 has the notion of "slices" - sections of the data that can be decoded independently of other sections, either whole frames or parts of frames. H.265 adds "tiles": rectangular areas, each no smaller than 256 x 64 pixels, into which a picture can optionally be segmented. Each tile contains the same number of HEVC’s core Coding Tree Units, so there’s no need to synchronise output: each graphics core processes a given CTU in much the same amount of time. Other tile sizes may be allowed in future versions of the standard.
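
The appeal for multi-core hardware is easy to demonstrate. In this sketch a Python process pool stands in for a bank of decoder cores, and decode_ctu() is a do-nothing stand-in for the real per-CTU work; the point is simply that each tile can be handled without reference to its neighbours:

```python
from concurrent.futures import ProcessPoolExecutor

def decode_ctu(ctu):
    """Stand-in for the real CTU decode; here it just hands back what it was given."""
    return ctu

def decode_tile(tile_ctus):
    # a tile is self-contained, so no data is needed from neighbouring tiles
    return [decode_ctu(ctu) for ctu in tile_ctus]

def decode_frame(tiles):
    # equal CTU counts per tile mean every worker finishes at roughly the same time
    with ProcessPoolExecutor() as pool:
        return list(pool.map(decode_tile, tiles))
```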


H.265... Everywhere
An alternative option available to the encoder is Wavefront Parallel Processing, which cuts each slice into rows of CTUs. Work can begin on processing any given row once just two CTUs have been decoded from the preceding row, the point at which the decoding clues the new row needs have been recovered. The exception, of course, is the very first row in the sequence. This approach, say HEVC boffins, makes for fewer artefacts than tiling and may yield a higher compression ratio, but its processing power requirements are greater than tiling’s. For now, the choice is tiles, WPP or neither, though the use of both may be permitted in future HEVC revisions.
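
The wavefront dependency rule can be simulated in a few lines, assuming nothing more than a grid of CTU coordinates: a CTU is ready once its left-hand neighbour is done and the row above has got two CTUs ahead of it. Everything yielded in the same "wave" could, in principle, be decoded in parallel:

```python
def wavefront_order(rows, cols):
    """Yield CTU coordinates in waves; everything in one wave could be decoded in parallel."""
    done, wave = set(), 0
    while len(done) < rows * cols:
        ready = [(r, c) for r in range(rows) for c in range(cols)
                 if (r, c) not in done
                 and (c == 0 or (r, c - 1) in done)                        # left neighbour decoded
                 and (r == 0 or (r - 1, min(c + 1, cols - 1)) in done)]    # two CTUs into the row above
        done.update(ready)
        yield wave, ready
        wave += 1

for wave, ctus in wavefront_order(3, 5):
    print(wave, ctus)
```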

H.265 makes jumping around within the video a smoother process by allowing complete frames - pictures that can be decoded without information from any preceding frame - to be more clearly marked as such for the decoder. Don’t forget, the decoding order of information within a stream isn’t necessarily the same as the order in which the frames are displayed. Jump into a stream part-way and any upcoming frames that are needed only to decode pictures from before the entry point - not those that will now be displayed - can, if correctly labelled, be safely ignored by the decoder.


 
HEVC vs AVC: for the same bit-rate, a better picture - or comparable image quality at around half the bit-rate


Since a video stream of necessity must incorporate at least one complete frame, it’s no surprise that H.265 has a still-image profile, Main Still Picture. Like the basic Main profile, MSP is capped at eight bits per sample. If you want more, you’ll be needing the Main 10 profile, which, as its name suggests, uses 10-bit samples. So far 13 HEVC levels have been defined, essentially based on picture size and running from 176 x 144 to 7680 x 4320. The levels yield maximum bit-rates of 128Kbps to 240Mbps in the mainstream applications tier, and 30-800Mbps in the high-performance tier, which might be used for broadcast-quality pre-production work and storage, for instance.

This, then, is the version of HEVC enshrined in the first finalised draft of H.265. Work is already underway on extensions to the specification to equip the standard with the technology needed for 3D and multi-view coding, to support greater colour depths (12-bit initially) and better colour component sampling options such as 4:2:2 and 4:4:4, and to enable scalable coding, which allows a high quality stream to deliver a lower quality sub-stream by dropping packets.

The H.265 extensions are all some way off. First we all need H.265 to be supported by our preferred operating systems, applications and - in the case of low-power mobile devices - in the chips they use. With H.264 now the de facto standard for video downloads and streaming, there’s a motivation to move up to the next version, especially if it means buyers can download comparable quality videos in half the time. It seems certain to be a part of "Blu-ray 2.0" whenever that emerges, and probably future ATSC TV standards, but the ability to get high quality video down off the internet is surely a stronger unique selling point than delivering 4K or 8K video on optical disc to an ever-declining number of consumers.


Not Yet Ready for Primetime
Videophiles will demand 4K and perhaps 8K screens, but for most of us there’s little visible benefit in moving beyond 1080p - unless we also move our sofas very much closer to our TVs. Even if broadcasters migrate quickly to 4K, perhaps in time to show 2014’s World Cup tournament in the format, they can do so with H.264. But if you’re going to have to get a new, bigger screen, you may as well get one with a new, better codec on board. Still, H.265 in the living room looks set to be of limited interest for some years yet.

4K streaming may be equally far off, but more efficient 1080p streaming is something many folks would like to have now. Given the competition among IPTV services, it’s not hard to imagine many providers hopping on to H.265 to improve their offerings, once the standard becomes supported by the web browsers and apps they use. Greater compression of SD and HD video content than H.264 can provide is what will drive adoption of H.265 in the near term.

And of course, it’s going to require greater processing power than previous codecs needed - 10 times as much for encoding, some specialists reckon, and two-to-three times as much for decoding. Producing H.265 video requires the encoder to evaluate many more encoding options at many more decision points than is the case with H.264, and that takes time. Likewise regenerating complete frames from the many more hints that the encoder provides. Encoding and decoding HEVC doesn’t necessarily favour chips with higher clock speeds, though. H.265’s emphasis on parallelisation favours CPUs and GPUs equipped with lots of cores that can crunch numbers simultaneously.

“HEVC/H.265 is an ideal fit for GPU encoding,” say staffers at Elemental Technologies, a maker of video encoding hardware. “HEVC/H.265 encoding presents computational complexities up to ten times that of H.264 encoding, for which the massively parallel GPU architecture is well-suited. With 500 different ways to encode each macroblock, the demanding processing requirements of H.265 will strain many existing hardware platforms.”

Easy to say, harder to deal with. We won’t know how well encoders and decoders work - be they implemented in software or hardware - until they actually arrive. Software support will come first. Market-watcher Multimedia Research Group reckons there are around 1.4 billion gadgets already on the market that, given a suitable software upgrade, will be able to play H.265 video. A billion more are coming out this year. But being able to decode H.265 is one thing - being able to do so smoothly and with an efficient use of energy is something else.

This kind of uncertainty will hold H.265 back in the short term, at least as far as mainstream adoption goes. That’s no great surprise, perhaps. Even today, the majority of Britain’s broadcast digital TV channels use MPEG-2 - aka H.262 - with H.264 reserved for HD content. H.265-specific chippery is expected this year, but not in shipping product until Q4 at the earliest - Q1 2014 is a more practical estimate.

By Tony Smith, The Register