HEVC Demystified

A nice introduction to HEVC is available from Elemental Technologies' web site.

DIAL Offers Open Alternative to Apple AirPlay

Netflix and YouTube have led the development of a simple protocol known as DIAL to allow second screen apps to talk to server applications on network-connected devices and displays. Consumer electronics companies including Samsung and Sony have contributed to the development.

YouTube, Dailymotion, Hulu, Disney and the BBC are among those interested in developing applications. It offers a simple, open alternative to Apple AirPlay or other solutions such as Miracast and could change the way apps work with network-connected televisions.

DIAL, which stands for DIscovery And Launch, is a simple protocol that second screen apps can use to discover and launch apps on first screen devices.

For instance, it means that users can simply launch a compatible mobile app and tap the “Play on TV” button in the app to view the output on a network-connected television device or display on the same home network.

Unlike AirPlay, which is currently proprietary to Apple and not available for licence to third parties, DIAL is an open specification that can be freely implemented without royalties. It has the potential to become widely supported, providing a missing link between second screen apps and first screen displays.

The DIAL protocol has two components: DIAL Service Discovery and the DIAL REST Service.

DIAL Service Discovery enables a DIAL client device to discover DIAL servers on its local network segment and obtain access to the DIAL REST Service on those devices. It is based on SSDP, the Simple Service Discovery Protocol defined by UPnP.

The DIAL REST Service is based on HTTP and enables a DIAL client to query, launch and optionally stop applications on a DIAL Server device.
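
To make the two components concrete, here is a minimal Python sketch of a DIAL client, assuming a DIAL server is present on the local network: it multicasts an SSDP M-SEARCH for the DIAL service type, reads the Application-URL header from the device description, and POSTs to the REST service to launch a named application. The app name and timeout are illustrative, and error handling is omitted.

    # Minimal DIAL client sketch: SSDP discovery, then a REST launch request.
    # The service type and Application-URL header follow the DIAL specification;
    # the app name ("YouTube") and timeout are illustrative.
    import socket
    import urllib.request

    SSDP_ADDR = ("239.255.255.250", 1900)
    MSEARCH = (
        "M-SEARCH * HTTP/1.1\r\n"
        "HOST: 239.255.255.250:1900\r\n"
        "MAN: \"ssdp:discover\"\r\n"
        "MX: 2\r\n"
        "ST: urn:dial-multiscreen-org:service:dial:1\r\n\r\n"
    )

    def discover(timeout=3.0):
        """Send an SSDP M-SEARCH and return the LOCATION URL of the first DIAL server that replies."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout)
        sock.sendto(MSEARCH.encode("ascii"), SSDP_ADDR)
        data, _ = sock.recvfrom(4096)
        for line in data.decode("ascii", "replace").splitlines():
            if line.lower().startswith("location:"):
                return line.split(":", 1)[1].strip()
        return None

    def app_url(location):
        """Fetch the device description and read the Application-URL header (the DIAL REST root)."""
        with urllib.request.urlopen(location) as resp:
            return resp.headers["Application-URL"]

    def launch(rest_root, app_name):
        """POST to <Application-URL>/<app_name> to launch the application on the first screen."""
        req = urllib.request.Request(rest_root.rstrip("/") + "/" + app_name, data=b"", method="POST")
        with urllib.request.urlopen(req) as resp:
            return resp.status  # 201 Created indicates the app was launched

    if __name__ == "__main__":
        loc = discover()
        if loc:
            launch(app_url(loc), "YouTube")  # app name is illustrative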

Beyond that, it is up to developers to establish further communication, based on their requirements.

It does not address things like mirroring a display on a television screen. That is more the province of another system known as Miracast, a peer-to-peer wireless screencast standard created by the Wi-Fi Alliance based on WiFiDirect.

In many respects, however, DIAL is more open and flexible than Apple AirPlay. It can launch compatible apps installed on network-connected devices or displays, or redirect a user to an app store. If supported by the target device, it will also be able to launch web apps.

DIAL can be implemented on many devices and is not limited to the Apple iOS or Android ecosystems. Code for a sample client and server implementation is available for download.

The DIAL specification and mark are copyright Netflix. The specification is maintained by Netflix and YouTube with input from a variety of partners. First screen applications have been registered by the BBC, Hulu, Dailymotion and Disney, among others.

Samsung and Sony provided significant guidance to ensure that DIAL would be a compatible and effective solution for first screen devices and also meet their goals for second screen user experiences.

It is understood that some Samsung and Sony devices already support DIAL. Other manufacturers are also expected to offer support.

As a software protocol, it could potentially be supported through a software update for existing devices.

AirPlay works brilliantly within the Apple ecosystem, for instance with an Apple TV. That may suit Apple and its ultimate ambitions in this space, but DIAL offers an alternative for the rest of the market. If it receives widespread adoption it could significantly increase the usability and functionality of smart television devices and displays.

Source: informitv

A New, More Efficient Standard for MPEG Systems

Widely adopted video-coding standards are subject to obsolescence that follows a different form of “Moore’s Law.” Although silicon speed roughly doubles every 18 months, video-coding efficiency doubles about every ten years. The longer time span is influenced by other factors, such as the time needed to replace a vast and expensive content delivery infrastructure.

MPEG-2 video compression (essentially an update of MPEG-1) was first released in 1995, with digital satellite delivery a major application, followed soon afterwards by deployment on DVDs and digital terrestrial service. Although MPEG-4/AVC (aka MPEG-4 Part 10 or H.264) was released only a few years later (and yielded about a 50 percent bit-rate savings), it took well into the 2000s for the codec to become entrenched into professional and consumer applications, on satellite, cable and Blu-ray discs.

And now, the next codec is nearly upon us: High Efficiency Video Coding (HEVC). This next-generation video standard is currently being developed by the JCT-VC team, a joint effort between MPEG and the Video Coding Experts Group (VCEG). The finalized HEVC standard is expected to bring another 50-percent bit-rate savings compared to equivalent H.264/AVC encoding.

HEVC has just been ratified by ISO and ITU — ISO/IEC 23008-2 MPEG-H Part 2 and ITU-T Rec. H.265. HEVC codecs are then expected to be adopted quickly in many devices, such as camcorders, DSLRs, digital TVs, PCs, set-top boxes, smartphones and tablets.


HEVC
The HEVC standard incorporates numerous improvements over AVC, including a new prediction block structure, and updates to the toolkit that include intra-prediction, inverse transforms, motion compensation, loop filtering and entropy coding. A major difference from MPEG-2 and AVC is a new framework encompassing Coding Units (CUs), Prediction Units (PUs) and Transform Units (TUs).

Coding units (CUs) define a sub-partitioning of a picture into arbitrary rectangular regions. The CU replaces the macroblock structure of previous video coding standards, and contains one or more Prediction Units and Transform Units, as shown in Figure 1. The PU is the elementary unit for intra- and inter-prediction, and the TU is the basic unit for transform and quantization.

Figure 1: HEVC incorporates a new framework encompassing Coding Units (CUs), Prediction Units (PUs) and Transform Units (TUs), which replaces the macroblock structure of previous video coding standards with one or more PUs and TUs.


Overall, this framework describes a treelike structure in which the individual branches can have different depths for different portions of a picture. Each frame is divided into largest Coding Units that can be recursively split into smaller CUs using a generic quad-tree segmentation structure, as shown in Figure 2. CUs can be further split into PUs and TUs. This new structure greatly reduces blocking artifacts, while at the same time providing a more efficient coding of picture-detail regions.
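
As a rough illustration of the recursion (not actual encoder logic), the sketch below splits a 64 x 64 largest Coding Unit into a quad tree, using pixel variance as a stand-in for the rate-distortion decision a real HEVC encoder would make.

    # Illustrative quad-tree split of a 64x64 largest Coding Unit (LCU).
    # The split criterion (pixel variance) is a stand-in for a real encoder's
    # rate-distortion decision; sizes follow the HEVC CU range of 64 down to 8.
    import numpy as np

    def split_cu(block, x, y, size, min_size=8, var_threshold=400.0):
        """Return a list of (x, y, size) leaf CUs covering the given region."""
        region = block[y:y + size, x:x + size]
        if size <= min_size or region.var() < var_threshold:
            return [(x, y, size)]                      # keep this CU as a leaf
        half = size // 2
        leaves = []
        for dy in (0, half):                           # recurse into the four quadrants
            for dx in (0, half):
                leaves += split_cu(block, x + dx, y + dy, half, min_size, var_threshold)
        return leaves

    if __name__ == "__main__":
        lcu = np.random.randint(0, 256, (64, 64)).astype(float)
        for cx, cy, csize in split_cu(lcu, 0, 0, 64):
            print(f"CU at ({cx},{cy}) size {csize}x{csize}")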

Figure 2: With HEVC, video frames are divided into a hierarchical quad-tree coding structure that uses Coding Units, Prediction Units and Transform Units.


MPEG-2 intra-prediction employs fixed blocks for transform coding and motion compensation. AVC went beyond this by allowing multiple block sizes. HEVC also divides the picture into coding tree blocks, which are 64 x 64-, 32 x 32-, 16 x 16-, or 8 x 8-pixel regions. But these Coding Units can now be hierarchically subdivided all the way down to 4 x 4-sized units. In addition, an internal bit-depth increase allows encoding of video pictures by processing them as having a color depth higher than eight bits.

HEVC also specifies 33 different intra-prediction directions, as well as planar and DC modes, which reconstruct smooth regions or directional structures, respectively, in a way that hides artifacts better.


Parallel Processing
The picture can be divided up into a grid of rectangular tiles that can be decoded independently, with new signaling allowing for multi-threaded decode. This supports a new decoder structure called Wavefront Parallel Processing (WPP). With WPP, the picture is partitioned into rows of treeblocks, which allow decoding and prediction using data from multiple partitions. This picture structure allows parallel decoding of rows of treeblocks, with as many processors as the picture contains treeblock rows. The staggered start of processing looks like a wave front when represented graphically, hence the name.
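
A small simulation makes the staggering visible. Assuming unit decode time per treeblock, one processor per row, and the usual two-treeblock lag behind the row above, the finish times below form the diagonal "wave front" described above; the grid size is arbitrary.

    # Sketch of the Wavefront Parallel Processing (WPP) dependency pattern.
    # A treeblock can start once its left neighbour and the block one position
    # ahead in the row above are done, so each row starts two blocks behind
    # the row above it. Grid size and unit decode time are illustrative.
    ROWS, COLS = 5, 10

    finish = [[0] * COLS for _ in range(ROWS)]
    for r in range(ROWS):
        for c in range(COLS):
            left = finish[r][c - 1] if c > 0 else 0
            above = finish[r - 1][min(c + 1, COLS - 1)] if r > 0 else 0
            finish[r][c] = max(left, above) + 1

    for row in finish:
        print(" ".join(f"{t:3d}" for t in row))  # staggered finish times form a "wave front"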


Transforms
Four different inverse DCT transform sizes are specified with HEVC: 4 x 4, 8 x 8, 16 x 16, and 32 x 32. Additionally, 4 x 4 intra-coded Luma blocks are transformed using a new Discrete Sine Transform (DST). Unlike AVC, columns are transformed first, followed by rows, and Coding Units can be hierarchically split (quad tree) all the way down to 4 x 4 regions. This allows encoders to adaptively assign transform blocks that minimize the occurrence of high-frequency coefficients. The availability of different transform types and sizes adds efficiency while reducing blocking artifacts.
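
The column-then-row order amounts to a separable transform. The sketch below applies a one-dimensional DCT to the columns of a block and then to the rows of the result; the floating-point orthonormal DCT-II used here is only a stand-in for HEVC's integer transform matrices.

    # Separable 2D transform sketch: columns first, then rows.
    # The floating-point DCT-II matrix stands in for HEVC's integer transforms.
    import numpy as np

    def dct_matrix(n):
        """Orthonormal DCT-II basis matrix of size n x n."""
        k = np.arange(n).reshape(-1, 1)
        i = np.arange(n).reshape(1, -1)
        c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
        c[0, :] *= 1.0 / np.sqrt(2.0)
        return c

    def forward_transform(block):
        """Apply the 1-D transform to each column, then to each row of the result."""
        c = dct_matrix(block.shape[0])
        cols = c @ block          # transform columns
        return cols @ c.T         # then transform rows

    if __name__ == "__main__":
        block = np.random.randint(-128, 128, (8, 8)).astype(float)
        coeffs = forward_transform(block)
        recon = dct_matrix(8).T @ coeffs @ dct_matrix(8)  # inverse: transposed matrices
        print(np.allclose(block, recon))  # True: the float transform pair is lossless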

A new de-blocking filter, similar to that of AVC, operates only on edges that are on the block grid. Furthermore, all vertical edges of the entire picture are filtered first, followed by the horizontal edges. After the de-blocking filter, HEVC provides two new optional filters: Sample Adaptive Offset (SAO) and Adaptive Loop Filter (ALF).

In the SAO filter, the entire picture is treated as a hierarchical quad tree. Within each sub-quadrant in the quad tree, the filter can be used by transmitting offset values that can correspond either to the intensity band of pixel values (band offset) or to the difference compared to neighboring pixels (edge offset). ALF is designed to minimize the coding errors of the decoded frame compared to the original one, yielding a much more faithful reproduction.
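
The band-offset mode is easy to sketch: each reconstructed sample is classified by its intensity band, and a small signalled offset is added to samples in the selected bands. In the illustration below the offsets, chosen bands and block size are made up; only the classify-and-add structure reflects SAO.

    # Illustrative SAO band-offset filter: classify each 8-bit sample into one of
    # 32 equal intensity bands and add a signalled offset to samples falling in
    # the bands chosen by the encoder. Offsets and chosen bands here are made up.
    import numpy as np

    def sao_band_offset(samples, start_band, offsets):
        """Add offsets[i] to samples in band (start_band + i); other samples pass through."""
        out = samples.astype(int).copy()
        bands = out >> 3                      # 256 / 32 = 8 values per band
        for i, off in enumerate(offsets):
            out[bands == start_band + i] += off
        return np.clip(out, 0, 255).astype(np.uint8)

    if __name__ == "__main__":
        recon = np.random.randint(0, 256, (16, 16), dtype=np.uint8)
        filtered = sao_band_offset(recon, start_band=10, offsets=[1, -2, 3, -1])
        print(np.count_nonzero(filtered != recon), "samples adjusted")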


Advanced Motion Compensation
Motion compensation is provided by two new methods, Advanced Motion Vector Prediction (AMVP) and Merge Mode, both of which use indexed lists of neighboring and temporal predictors. AMVP uses motion vectors from neighboring prediction units, chosen from both spatial and temporal predictors, and Merge Mode uses motion vectors from neighboring blocks as predictors.

To calculate motion vectors, Luma is filtered to quarter-pixel accuracy, using a high-precision 8-tap filter. Chroma is filtered with a one-eighth-pixel 4-tap filter. A motion-compensated region can be either single- or bidirectionally interpolated (one or two motion vectors and reference frames), and each direction can be individually weighted.
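
The sketch below shows what this sub-pixel filtering looks like for one row of luma samples, producing half-sample positions with an 8-tap filter. The taps used are the commonly cited HEVC half-sample coefficients, but treat them as illustrative and refer to the standard for the normative filters and for the quarter-sample and chroma cases.

    # Half-sample horizontal luma interpolation sketch using an 8-tap filter.
    # Taps below are the commonly cited HEVC half-sample coefficients (sum = 64);
    # treat them as illustrative and consult the standard for normative values.
    import numpy as np

    HALF_PEL_TAPS = np.array([-1, 4, -11, 40, 40, -11, 4, -1])

    def interpolate_half_pel_row(row):
        """Return the half-sample positions between integer pixels of one luma row."""
        padded = np.pad(row.astype(int), (3, 4), mode="edge")   # 8 taps need 3 left, 4 right
        out = np.empty(len(row), dtype=int)
        for x in range(len(row)):
            window = padded[x:x + 8]
            out[x] = ((window * HALF_PEL_TAPS).sum() + 32) >> 6  # round and divide by 64
        return np.clip(out, 0, 255)

    if __name__ == "__main__":
        row = np.array([10, 12, 14, 80, 82, 84, 20, 22], dtype=np.uint8)
        print(interpolate_half_pel_row(row))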

The JCT-VC team is also studying various new tools for adaptive quantization. After this last lossy coding step, lossless Context-Adaptive Binary Arithmetic Coding (CABAC) is carried out, which is similar to AVC's CABAC, but has been rearranged to allow for simpler and faster hardware decoding. Currently, the low-complexity entropy-coding technique called Context-Adaptive Variable-Length Coding (CAVLC), which was available as an option in AVC, is not available in HEVC.

With all of these improvements comes a price: Both encoding and decoding are significantly more complex, so we can expect more expensive processors on both sides. On the decoding side, this means a higher density of silicon and/or software, both requiring faster chips, and higher power consumption, but Moore's Law should help. As for deployment in portable devices, it will be an interesting challenge to realize the efficiency benefits of HEVC in devices whose users demand ever-increasing amounts of video content.

By Aldo Cugnini, Broadcast Engineering

Trifocal Camera Readied for Live 3D

The days of stereo 3D mirror rigs could be numbered if new technology being devised by Arri, the Fraunhofer Institute and Walt Disney Studios comes to fruition.

The trio of companies have just begun a second phase of tests in Berlin on a trifocal camera system that comprises a single Arri M camera sandwiched between two micro HD cameras developed by Fraunhofer, alongside a computer processor.

The dual witness cameras capture enough information on set to be combined into depth maps by Fraunhofer's STAN Stereoscopic Analyzer software (which features in the DVS Clipster post tool), for the post production of live action content in 3D.



The concept would negate the need for cumbersome 3D camera rigs, allow an on-set 3D workflow similar to 2D, and in theory help produce 3D content without the glitches inherent in lens misalignment. Disparity-estimation techniques based on the three captured images should allow a second-eye view to be rendered at a virtual interaxial distance that is defined in post.

“If successful we will go into a third test in April this year and if that is successful it will be used on a major film production,” revealed Kathleen Schroeter, executive manager of the 3D Innovation Centre at the Fraunhofer Institute in Berlin.

She said this was currently planned to be a 20-minute short film or a 20-minute sequence within a longer feature, both produced by Disney.

“The current trifocal system is for post production, but the next step is to render the data in real time so that we can produce live broadcast programming without rigs,” she added.

Curiously the initiative has come from Disney in Hollywood, rather than Disney's own research institute in Zurich, which is also exploring ways of creating 3D content using plenoptic lenses and computational cinematography.

By Adrian Pennington, TVB Europe

Mathematical Model Predicts Whether a Movie Will be a "Hit"




By Aki Tsukioka and Takuya Nakajima, DigInfo TV

Deloitte: 4K to Kick Off in 2013

Multiple test broadcasts of 4K services are likely during 2013 with commercial services expected in 2014/15, according to Deloitte. The financial advisor’s Technology, Media & Telecommunications arm is predicting that 4K – offering a resolution four times that of the current high definition standard – will begin to roll out over the next 12 months.

“The cost for broadcasters of creating a 4K channel, factoring in upgrades to existing equipment and infrastructure, could be $10 million to $15 million,” says Deloitte. In comparison, an HD channel would cost $2 million.

While acknowledging that it may be between 18 and 36 months before 4K is technically and commercially ready, Deloitte believes several landmarks will be reached. These include the availability of 20 4K TV set models from more than 10 vendors before the turn of the year. A range of 4K content will also be released.

Professional and semi-professional cameras should become available, while a new HDMI interface standard to service 4K data rates is expected to be ratified.

Deloitte admits that the need for a new HD standard at all will be questioned, but says demand for 4K will grow in the medium term, as consumers expect ever-higher resolutions and ever-larger screens. With price points under $10,000, initial sets are likely to remain a medium-term purchase for the mainstream consumer.

Given that streaming 4K content will initially be challenging, Deloitte says it will be eight-layer Blu-ray with its 200 GB capacity that will be relied on for packaged media delivery.

By Julian Clover, Broadband TV News

ShotOnWhat?

The goal of ShotOnWhat? is to create the largest collection of technical information that exists for Film & Television productions.

If it has been seen on TV or shown in a theater, the website would like to gather as much technical information as possible about the cameras, gear, post, sound, VFX and any other associated elements or processes, notes and trivia from the production.

ShotOnWhat? is making the information searchable, cross-referencing it a bit, and creating some trending to observe long-term shifts in technology.

DASH264 Implementation Guidelines is Now Available for Public Review

The DASH264 Implementation Guidelines define an interoperability point for over-the-top video distribution. The guidelines cover both live and on-demand services and include MPEG-DASH profiles, audio and video codecs, closed caption formats and common encryption constraints.

Version 0.9 of the DASH264 Implementation Guidelines is not yet final. It is provided for public review until March 15th, 2013. If you have comments on the document, please email them to iop-track@dashpg.org with a detailed description of the problem. Based on the comments received, a final document will be published by March 31st, 2013.

For this version of guidelines, the client is expected to support:

  • presentation of high-definition video up to 720p (based on H.264/AVC Progressive High Profile)
  • presentation of stereo audio
  • support of basic subtitles
  • basic support for encryption/DRM
In addition, it is recognized that certain clients may only be capable of operating with H.264/AVC Main Profile. Therefore content authors may provide and signal a specific subset of DASH264 by providing a specific profile identifier referring to a standard-definition presentation.

Source: MPEG DASH Industry Forum

Broadcast Delivery 101

Broadcast Delivery 101 is a book created for the post-production community around the world. This tool will allow you to understand the different requirements for television commercial delivery.

Created by Craig Russill-Roy, a 25-year industry professional, it offers a window into the world of digital delivery, video codecs, audio loudness and, most importantly, tutorials.

This 117-page book is a great tool for anyone exporting commercials, not only for local adaptation but for international requirements as well.

This book is available for download on your iPad with iBooks or on your computer with iTunes.

The Master Guide to Rigging a Blackmagic Design Cinema Camera

A free guide to rigging the Blackmagic Cinema Camera (BMCC).

Microsoft: Why MPEG DASH Will See Broad Adoption in 2013




Source: StreamingMedia

Will Gesture Recognition Reach its Potential?

In the run-up to the 2013 Consumer Electronics Show in Las Vegas, I find an increasing level of interest in a variety of gesture recognition technologies. Leap Motion, eyeSight, and Tobii Technology are among the firms planning to use CES to showcase their latest developments in non-touch gesture recognition technology.

Leap has announced that they have secured $30 million in Series B funding and their first global OEM partnership with personal computer maker ASUS. Leap Motion’s non-contact motion controller technology can track hand and finger movements to 1/100th of a millimeter. The Leap Motion controller has a 150-degree field of view, and tracks individual hands and all 10 fingers at 290 frames per second.



Since announcing their motion controller technology in May 2012, Leap has delivered 12,000 Leap Motion devices and software development kits to developers to encourage adoption of their technology. Developers have been quick to utilize the Leap Motion controllers to realize new gesture input applications.

Developer Adam Somers has posted a video of his AirHarp application that illustrates the responsive nature of the Leap Motion controller.



LabVIEW developers have also been busy creating Leap Motion applications including a quadrotor flight controller.



Also at CES, eyeSight and Tobii will be launching gesture recognition technologies involving hand gestures (eyeSight) and eye tracking and gaze interaction (Tobii).

eyeSight’s offering is a software-based technology that utilizes camera hardware, as present in many of today’s devices including tablets and smartphones, to enable robust gesture recognition using only a standard 2D camera.  eyeSight claims that their solution is appropriate for handheld devices used at close range as well as longer range applications such as televisions and set top boxes.



Tobii Technology is announcing at CES 2013 their Tobii REX gaze interaction peripheral for the consumer electronics market.  The Tobii REX device enables users to control their computer by combining their eye gaze with other controls including touch screen, mouse and keyboard.

The Tobii REX peripheral adds eye gaze control features to any Windows 8 PC. The Tobii REX adheres to the bottom of a desktop monitor and is connected to the computer via a USB connection. At CES Tobii will demonstrate several aspects of their Gaze platform including: Gaze Select, Gaze Scroll, Gaze Zoom and Gaze Navigate.



By Phil Wright, Display Central

Field of View Comparator

AbelCine's field of view comparator.

35mm Digital Sensors




PDF version of the chart.

By Mitch Gross, AbelCine

Depth of Field




Source: Vimeo

Focal Length




Source: Vimeo

An Intro to Lenses




Aperture
The aperture is the diameter of the lens opening. The larger the diameter of the aperture, the more light reaches the film or image sensor. The aperture also performs a critical function for focus. As the aperture decreases in size, the background and foreground gain sharpness. This zone of sharpness is called the depth of field.

Aperture is expressed as an F-stop and will be indicated on your camera in abbreviations that look like this: F2.8 or f/2.8. The "F" stands for the focal length of your lens, and the number is the ratio of the focal length to the diameter of the iris opening, so a smaller number means a wider opening.
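
To make the relationship concrete: because the f-number is the focal length divided by the iris diameter, the diameter follows directly from the two numbers on the lens barrel. The snippet below is just that arithmetic, using a 50mm lens as the example.

    # f-number = focal length / aperture diameter, so diameter = focal length / f-number.
    def aperture_diameter_mm(focal_length_mm, f_number):
        """Return the iris diameter implied by a focal length and f-stop."""
        return focal_length_mm / f_number

    if __name__ == "__main__":
        # A 50mm lens at f/2.8 has an iris roughly 17.9mm across;
        # stopping down to f/8 closes it to just over 6mm, deepening the depth of field.
        for stop in (2.8, 4, 5.6, 8):
            print(f"50mm at f/{stop}: {aperture_diameter_mm(50, stop):.1f} mm")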



Focal Length
16mm: An ultra wide lens, this bad boy distorts heavily, emphasizing objects in the foreground by making them look a lot larger than the background. Dynamic, but use with caution!

28mm: Standard for documentary and photojournalism to shoot cowboy shots, otherwise known as medium shots.

35mm: Another standard for documentary filming, also tight enough to shoot portraits.

50mm: Standard for cinema/video, it approximates the human eye's typical focal length.

85mm: A popular portrait, or "beauty" lens. Capable of making everyone look lovely!

200mm: The top of the scale for most people, this is a telephoto lens. Its inherently shallow depth of field makes it useful for eliminating unwanted foreground and background objects by simply throwing them out of focus. Great for sports photography!

Source: Vimeo