Video Compression Technology: HTML5

A media container is a “wrapper” that contains video, audio and data elements, and can function as a file entity or as an encapsulation method for a live stream. Because container formats are now starting to appear in broadcast situations, both OTA and online, it is useful to consider the various ways that compressed video (and audio) are carried therein, both over RF transmission and over the Internet.

Web Browsing and Broadcasting Crossing Over
The ubiquitous Web browser is a tool that users have come to rely on for accessing the Internet. Broadcasters already make use of this for their online presence, authoring and repurposing content specifically for Internet consumption. But browsing capability will come to OTA broadcast as well, once features like non-real-time (NRT) content distribution are implemented. For example, by using the ATSC NRT specification, now under development, television receivers can be built that support different compression formats for cached content, including AVC video and MP3 audio, and different container file formats, such as the MP4 Multimedia Container Format. It is envisioned that these receivers will be able to act as integrated live-and-cached content managers, which will invariably involve support for different containers and codecs. For this reason, we need to understand how browsers and containers — two seemingly different technologies — are related in the way they handle content.

Several container formats currently provide encapsulation for video and audio, including MPEG Transport Stream, Microsoft Advanced Systems Format (ASF), Audio Video Interleave (AVI) and Apple QuickTime. While not a container format per se, the new HTML5 language for browsers nonetheless has the capability of “encapsulating” video and audio for presentation to a user. With the older HTML, there was no convention for playing video and audio on a webpage; most video and audio has been played through plug-ins, which integrate playback with the browser. However, not all browsers support the same plug-ins. HTML5 changes that by specifying a standard way to include video and audio, using dedicated “video” and “audio” elements.
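As a minimal sketch of those new elements (the file names here are placeholders, not real assets), embedding media no longer requires a plug-in:

```html
<!-- Hypothetical file names; "controls" tells the browser to show its own
     built-in play/pause/volume controls. -->
<video src="newscast.mp4" width="640" height="360" controls></video>
<audio src="station-id.mp3" controls></audio>
```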

HTML5 is a new specification, currently under development, that will replace the existing HTML that Web browsers have used to present content since 1999. Among the key requirements of HTML5 are that it be device-independent and that it reduce the need for external plug-ins. Some of the new features in HTML5 include functions for embedding and controlling video and audio, graphics and interactive documents. For example, a “canvas” element, scripted with JavaScript, allows for dynamic rendering of precise 2-D shapes (paths, boxes, circles, etc.) and bitmap images. Other content-specific elements provide more control over text and graphics formatting and placement, much like a word processor, and new form controls support the use of calendars, clocks, e-mail and searching. Most modern browsers already support some of these features.
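To make the canvas idea concrete, here is an illustrative sketch (the element ID, colors and coordinates are our own invention): a canvas element plus a few lines of JavaScript draws shapes directly in the page, with no plug-in.

```html
<canvas id="demo" width="200" height="100"></canvas>
<script>
  var canvas = document.getElementById("demo");
  var ctx = canvas.getContext("2d");      // obtain the 2-D drawing context
  ctx.fillStyle = "#3366cc";
  ctx.fillRect(10, 10, 80, 60);           // filled box at (10,10), 80x60
  ctx.beginPath();
  ctx.arc(150, 50, 30, 0, 2 * Math.PI);   // circle: center (150,50), radius 30
  ctx.stroke();
</script>
```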

The HTML5 Working Group includes AOL, Apple, Google, IBM, Microsoft, Mozilla, Nokia, Opera and many other vendors. This working group has garnered support for including multiple video codecs (and container formats) within the specification, such as Ogg Theora, Google's VP8 and H.264. However, there is currently no default video codec defined for HTML5. The working group's position is that an ideal default video format should offer good compression, good image quality and a low processor load when decoding; it should be royalty-free as well.

Multiple Codecs Present Complex Choices
HTML5 thus presents a potential solution for manufacturers and content providers that want to avoid licensed formats such as Adobe Flash (FLV), in favor of the partially royalty-free H.264 (i.e., for Internet Broadcast AVC Video) and the fully royalty-free VP8, Theora and other codecs. Flash, which has become popular on the Internet, most often contains video encoded using H.264, Sorenson Spark or On2's VP6 compression. The licensing agent MPEG-LA does not charge royalties for H.264 video delivered over the Internet without charge, but companies that develop products and services that encode and decode H.264 video do pay royalties. Adobe nonetheless provides a free license to the Flash Player decoder.

HTML5 can be thought of as HTML plus Cascading Style Sheets (CSS) plus JavaScript. CSS is a language for describing the presentation of webpages, including colors, layout and fonts. This allows authors to adapt the presentation to different types of devices, such as large screens vs. small screens. Thus, content authored with HTML5 can serve as a “raw template,” and repurposing it for different devices entails generating appropriate CSS for each device. (This is known to programmers as separating “structure” from “presentation.”) JavaScript is an implementation of the ECMAScript scripting language, which allows algorithms to be run on the fly in decoders. Because JavaScript code runs locally in a user's browser, the browser can respond to user input quickly, making interaction with an application highly responsive.
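A sketch of that structure/presentation split (the class name and breakpoint below are illustrative, not from any standard): the same HTML markup is restyled per device with CSS alone, leaving the “raw template” untouched.

```html
<style>
  /* Default presentation, e.g. for a large screen */
  .program-guide { font-family: sans-serif; width: 60%; }

  /* Small screens: same structure, different presentation */
  @media (max-width: 480px) {
    .program-guide { width: 100%; font-size: 0.9em; }
  }
</style>
<div class="program-guide">Tonight's listings…</div>
```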

Websites often use some form of detection to determine if the user's browser is capable of rendering and using all of the features of the HTML language. Because there is no specific “flag” that indicates browser support of HTML5, JavaScript can be used to check the browser for its functionality and support of specific HTML features. When such a script runs, it can create a global object that is stored locally and can be referenced to determine the supported local features. This way, the content being downloaded can “adapt” itself to the capabilities of different browsers (and decoder hardware). Scripts are not always needed for detection, however. For example, HTML code can be written, without the use of JavaScript, that embeds video into a website using the HTML5 “video” element, falling back to Flash automatically.
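That script-free fallback pattern can be sketched as follows (file names are placeholders): a browser that understands the HTML5 video element plays one of the listed sources directly, while an older browser ignores the unfamiliar tags and renders the nested Flash object instead.

```html
<!-- Placeholder file names. An HTML5 browser picks a playable <source>;
     an older browser falls through to the Flash <object>. -->
<video width="640" height="360" controls>
  <source src="clip.mp4" type="video/mp4">
  <source src="clip.ogv" type="video/ogg">
  <object width="640" height="360" type="application/x-shockwave-flash"
          data="player.swf">
    <param name="movie" value="player.swf">
    This browser supports neither HTML5 video nor Flash.
  </object>
</video>
```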

HTML5 also provides better support for local offline storage, with two new objects for storing user-associated data on the client (the playback hardware/software): localStorage, which stores data with no time limit, and sessionStorage, which stores data for one session. In the past, personalization data was stored using cookies. However, cookies are not suitable for handling large amounts of data because they are sent to the server every time there is an information request (such as a browser refresh or link access), which makes the operation slow and inefficient. With HTML5, the stored object data is transferred only when a server or client application needs it. Thus, it is possible to store large amounts of data locally without affecting browsing performance. To control the exchange of data between different websites, each website can access only the data it has stored itself. HTML5 uses JavaScript to store and access the data.
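A minimal sketch of the two storage objects (the key names below are invented for illustration); both expose the same simple key-value API through JavaScript:

```html
<script>
  // Persists across browser restarts (no expiration):
  localStorage.setItem("lastChannel", "7-1");

  // Discarded when the browsing session ends:
  sessionStorage.setItem("volume", "0.8");

  // Later reads return the stored strings:
  var ch = localStorage.getItem("lastChannel"); // "7-1"
</script>
```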

Years ago, content developers predicted the crossover of television and the Internet. With standard codecs, container formats and specifications like HTML5, integration of the two media will soon be common.

By Aldo Cugnini, Broadcast Engineering