The Trials and Tribulations of HTML Video in the Post-Flash Era

Adobe reversed course on its Flash strategy after a recent round of layoffs and restructuring, concluding that HTML5 is the future of rich Internet content on mobile devices. Adobe now says it doesn’t intend to develop new mobile ports of its Flash player browser plugin, though existing implementations will continue to be maintained.

Adobe’s withdrawal from the mobile browser space means that HTML5 is now the path forward for developers who want to reach everyone and deliver an experience that works across all screens. The strengths and limitations of existing standards will now have significant implications for content creators who want to deliver video content on the post-Flash Web.

Apple’s decision to block third-party browser plugins like Flash on its iOS devices played a major role in compelling Web developers to build standards-based fallbacks for their existing Flash content. This trend will be strengthened when Microsoft launches Windows 8 with a version of Internet Explorer that doesn’t support plugins in the platform’s new standard Metro environment.

Flash still has a significant presence on the Internet, but it's arguably a legacy technology that will decline in relevance as mobile experiences become increasingly important. The faster pace of development and shorter release cycles in the browser market will allow open standards to mature faster and gain critical mass more quickly than before. In an environment where standards-based technologies are competitive for providing rich experiences, proprietary vendor-specific plugins like Flash will be relegated to playing a niche role.

Our use of the phrase “post-Flash” isn’t intended to mean that Flash is dead or going to die soon. We simply mean that it’s no longer essential to experiencing the full Web. The HTML5 fallback experiences on many Flash-heavy sites still don’t provide feature parity with the Flash versions, but the gap is arguably shrinking—and will continue to shrink even more rapidly in the future.

Strengths and Weaknesses of HTML5 Video
HTML5 has much to offer for video delivery, as the HTML5 video element seamlessly meshes with the rest of the page DOM and is easy to manipulate through JavaScript. This means that HTML5 video offers significantly better native integration with page content than it has ever been possible to achieve with Flash. The open and inclusive nature of the standards process will also make it possible for additional parties to contribute to expanding the feature set.

A single company no longer dictates what can be achieved with video, and your video content is no longer isolated to a rectangle embedded in a page. HTML5 breaks down the barriers between video content and the rest of the Web, opening the door for more innovation in content presentation. There are some really compelling demonstrations out there that showcase the use of video in conjunction with WebGL and other modern Web standards. For example, the video shader demo from the 3 Dreams of Black interactive film gives you a taste of what’s possible.

Of course, transitioning video delivery in the browser from Flash to HTML5 will also pose some major challenges for content creators. The standards aren’t fully mature yet and there are still a number of features that aren’t supported or widely available across browsers.

For an illustration of how deep the problems run, you need only look at Mozilla’s Firefox Live promotional website, which touts the organization’s commitment to the open Web and shows live streaming videos of Red Panda cubs from the Knoxville Zoo. The video is streamed with Flash instead of using standards-based open Web technologies.

In an FAQ attached to the site, Mozilla says that it simply couldn’t find a high-volume live streaming solution based on open codecs and open standards. If Mozilla can’t figure out how to stream its cuddly mascot with open standards, it means there is still work to do.

Flash is required to see the Red Panda cubs on Mozilla's website

Two of the major technical issues faced by HTML5 video adopters are the lack of adequate support for adaptive streaming and the lack of consensus surrounding codecs. There is currently an impasse between backers of the popular H.264 codec and Google’s royalty-free VP8 codec. There’s no question that a royalty-free video format is ideal for the Web, but the matter of whether VP8 is truly unencumbered by patents—and also meets the rest of the industry’s technical requirements—is still in dispute.

There is another major issue that hasn’t been addressed yet by open Web standards that could prove even more challenging: content protection. The vast majority of Flash video content on the Internet doesn’t use any kind of DRM and is trivially easy to download. Flash does, however, provide DRM capabilities and there are major video sites that rely on that technology in order to protect the content they distribute.

Can DRM Be Made to Play Nice with Open Standards?
DRM is almost always bad for regular end users and its desirability is highly debatable, but browser vendors will have to support it in some capacity in order to make HTML5 video a success. Many of the content creators who license video material to companies like Netflix and Hulu contractually stipulate a certain degree of content protection.

Mozilla’s Robert O’Callahan raised the issue of HTML5 video DRM in a recent blog entry shortly after Adobe’s announcement regarding mobile Flash. He expressed some concern that browser vendors will look for a solution that is expedient rather than inclusive, to the detriment of the open Web.

“The problem is that some big content providers insist on onerous DRM that necessarily violates some of our open Web principles (such as Web content being equally usable on any platform, based on royalty-free standards, and those standards being implementable in free software),” O'Callahan wrote. “We will probably get into a situation where Web video distributors will be desperate for an in-browser strong DRM solution ASAP, and most browser vendors (who don’t care all that much about those principles) will step up to give them whatever they want, leaving Mozilla in another difficult position. I wish I could see a reasonable solution, but right now I can’t. It seems even harder than the codec problem.”

O'Callahan also pointed out in his blog entry that the upcoming release of Windows 8, which will not support browser plugins in its Metro environment, means that the lack of DRM support in standards-based Web video is no longer just a theoretical concern. Microsoft may need to furnish a solution soon, or risk frustrating users who want to watch commercial video content on the Web in Windows 8 without installing additional apps or leaving the Metro shell.

Netflix Stands Behind DASH
Flash evangelists may feel that the limitations of HTML5 video and the problems that content creators are sure to face during the transition are a vindication of the proprietary plugin model. But the advantages of a truly open, vendor-neutral, and standards-based video solution that can span every screen really dwarf the challenges. That is why major stakeholders are going to be willing to gather around the table to try to find a way to make it work.

Netflix already uses HTML5 to build the user interfaces of some of its embedded applications, including the one on the PS3. The company has soundly praised the strengths of a standards-based Web technology stack and has found that there are many advantages. But the DRM issue and the lack of suitably robust support for adaptive streaming have prevented Netflix from dropping its Silverlight-based player in regular Web browsers.

The company has committed to participating in the effort to make HTML5 a viable choice for all video streaming. Netflix believes that the new Dynamic Adaptive Streaming over HTTP (DASH) standard being devised by the Moving Picture Experts Group (MPEG) will address many of the existing challenges and pave the way for ubiquitous adoption of HTML5 for streaming Internet video.

DASH, which is expected to be ratified as an official standard soon, has critical buy-in from many key industry players besides Netflix, including Microsoft and Apple. An early DASH playback implementation is already available as a plugin for the popular VLC video application.

The DASH standard makes video streaming practical over HTTP and addresses the many technical requirements of high-volume streaming companies like Netflix, but it doesn’t directly address the issue of DRM by itself. DASH can be implemented in a manner that is conducive to supporting DRM, however.

Ericsson Research, which is involved in the DASH standardization effort, has done some worthwhile preliminary research to evaluate the viability of DRM on DASH. Ericsson produced a proof-of-concept implementation that uses DRM based on the Marlin rights management framework.

Marlin, which was originally created by a coalition of consumer electronics vendors, is relatively open compared to alternate DRM technologies and makes use of many existing open standards. But Marlin is still fundamentally DRM and suffers from many of the same drawbacks, and adopters have to obtain a license from the Marlin Trust Management Organization, which holds the keys.

The architecture of the Marlin rights management framework

Ericsson explains in its research that it chose to experiment with Marlin for its proof-of-concept implementation because it’s available and mature; other similar DRM schemes could also easily be adopted. Existing mainstream DRM schemes would all likely pose the same challenges, however, and it’s unlikely that such solutions will be viewed as acceptable by Mozilla. More significantly, an implementation of HTML5 video that relies on this kind of DRM would undermine some of the key values and advantages of openness that are intrinsic to the open Web.

The ease with which solutions like Marlin can be implemented on top of HTML5 will create pressure for mainstream browser vendors to adopt them quickly. This could result in the same kind of fragmentation that exists today surrounding codecs. As O’Callahan said, it’s easy to see this issue becoming far more contentious and challenging to overcome than the codec issue.

What Next?
The transition to HTML5 and standards-based technology for video delivery will bring many advantages to the Web. There are some great examples that show what can be achieved when developers really capitalize on the strengths of the whole open Web stack. The inclusiveness of the standards process will also give a voice to additional contributors who want to expand the scope of what can be achieved with video on the Web.

There are still some major obstacles that must be overcome in order for the profound potential of standards-based Web video to be fully realized in the post-Flash era. Open standards still don’t deliver all of the functionality that content creators and distributors will require in order to drop their existing dependence on proprietary plugins. Supplying acceptable content protection mechanisms will prove to be a particularly bitter challenge.

Despite the barriers ahead, major video companies like Netflix recognize the significant advantages of HTML5 and are willing to collaborate with other stakeholders to make HTML5 video a success. The big question that remains unanswered is whether that goal can be achieved without compromising the critically important values of the open Web.

By Ryan Paul, Ars Technica

IMF for a Multi-Platform World

Among other things, the looming arrival of the Interoperable Master Format (IMF) illustrates that the digital media industry is now capable of moving "nimbly and quickly" to create technical standards for the way it packages, moves, and protects precious content in the form of digital assets, even as the technology used to do all that, and the industry itself, changes at a startling rate. The term "nimbly and quickly" comes from Annie Chang, Disney's VP of Post-Production Technology, who also chairs the SMPTE IMF work group (TC-35PM50).

Six Hollywood studios, working through the University of Southern California Entertainment Technology Center (USC ETC), started to develop IMF in 2007, and in early 2011 they created an input document that the SMPTE IMF working group is now using as the basis of the IMF standardization effort. Over time, IMF has developed into an interchangeable, flexible master file format designed to allow content creators to efficiently disseminate a project's single master file to distributors and broadcasters across the globe.

Chang reports that progress has moved quickly enough for the work group to expect to finalize a basic version of the IMF standard in coming months, with draft documents possibly ready by early 2012 that focus on a core framework for IMF, and possibly a few of the potential modular applications that could plug into that framework.

Once that happens, content creators who have prepared for IMF will be in a position to start feeding all their distributors downstream far more effectively than has been the case until now in this world of seemingly unending formats. They will, according to Chang, be able to remove videotape from their production workflow, reduce file storage by eliminating the need for full linear versions of each edit or foreign language version of their content, and yet be able to take advantage of a true file-based workflow, including potentially automated transcoding, and much more.

The rollout will still need to be deliberate as various questions, unanticipated consequences, and potential new uses of IMF begin to unfold. That said, Chang emphasizes that the goal of streamlining and improving the head end of the process, creating a single, high-quality, ultimate master for all versions, is real and viable, and with a little more work and input, will be happening soon enough.

"Today, we have multiple versions, different resolutions, different languages, different frame rates, different kinds of HD versions, standard-definition versions, different aspect ratios—it's an asset management nightmare," she says, explaining why getting IMF right is so important to the industry.

"Everyone creates master files on tape or DPX frames or ProRes or others, and then they have to create mezzanine files in different formats for each distribution channel. IMF is designed to fix the first problem—the issue of too many file formats to use as masters."

Therefore, IMF stands to be a major boon for content creators who repeatedly and endlessly create different language versions of their material.

"For a ProRes QuickTime, you are talking about a full res version of a movie each time you have it in a new language," Chang says. "So 42 languages would be 42 QuickTime files. IMF is a standardized file solution built on existing standards that will allow people to just add the new language or whatever other changes they need to make to the existing master and send it down the chain more efficiently."

Chang emphasizes the word "flexible" in describing IMF, and the word "interoperable" in the name itself because, at its core, IMF allows content distributors to uniformly send everybody anything that is common, while strategically transmitting the differences only to where they need to go. In that sense, IMF is based on the same architectural concept as the Digital Cinema specification—common material wrapped together, combined with a streamlined way to package and distribute supplemental material. Eventually, each delivery will include an Output Profile List (OPL) to allow those transcoding on the other end a seamless way to configure the file as they format and push it into their distribution chain.
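The idea of sending common material once and only the differences per version can be sketched as a small data model. This is a hypothetical illustration in Python, not the IMF specification: the class names, file names, and fields are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class TrackFile:
    """One wrapped essence file (hypothetical model, not an IMF structure)."""
    name: str
    kind: str  # "image", "audio", or "subtitle"


@dataclass
class Composition:
    """A playlist-like description of one version, referencing track files."""
    title: str
    tracks: List[TrackFile] = field(default_factory=list)


def files_to_deliver(compositions: List[Composition]) -> Set[TrackFile]:
    """Return the unique track files needed to deliver every version."""
    needed: Set[TrackFile] = set()
    for comp in compositions:
        needed.update(comp.tracks)
    return needed


# One shared picture track, plus one audio track per language (made-up names).
picture = TrackFile("feature_picture.mxf", "image")
languages = ["en", "fr", "de"]
versions = [
    Composition(f"feature_{lang}", [picture, TrackFile(f"audio_{lang}.mxf", "audio")])
    for lang in languages
]

unique_files = files_to_deliver(versions)
image_files = [f for f in unique_files if f.kind == "image"]
print(len(unique_files), len(image_files))
```

In this toy model, three language versions need four files in total and only one copy of the picture, whereas three full linear masters would each carry their own copy of the picture.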

Unlike the DCI spec, however, IMF is not built of wholly new parts. Wherever possible, the file package will consist of existing pieces combined together in an MXF-flavored wrapper. This should, Chang hopes, make it easier for businesses across the industry to adapt without huge infrastructure changes in most cases as IMF comes to fruition.

"With IMF, we are using existing standards—a form of MXF (called MXF OP1A/AS-02) to wrap the files, and parts of the Digital Cinema format and other formats that many manufacturers already use," she says. "So, hopefully, there is not much of a learning curve. We hope that most of the big companies involved in the process won't be caught unaware, and will be able to make firmware or software upgrades to their systems in order to support IMF. Hopefully, companies will not have to buy all new equipment in order to use IMF.

"And with the concept of the Output Profile List (OPL), which essentially will be global instructions on output preferences for how to take an IMF file and do something with it relative to your particular distribution platform, companies that are doing transcoding right now will have an opportunity to use that to their advantage to better automate their processes. IMF has all the pieces of an asset management system and can use them all together to create standardized ways to create packages that fit into all sorts of other profiles. It's up to the content owners to take these OPL's and transcode the files. As they do now, they could do it in-house or take it to a facility. But if transcoding facilities get smart and use IMF to its potential, they can take advantage of IMF's capabilities to streamline their processes."

Chang says major technology manufacturers have been extremely supportive of the SMPTE standardization effort. Several, such as Avid, Harmonic, DVS, Amberfin, and others have actively participated and given input on the process, which is important because changes to existing editing, transcoding, and playback hardware and software, and the eventual creation of new tools for those tasks, will eventually need to happen as IMF proliferates. After all, as Chang says, "what good is a standard unless people use it?"

She emphasizes that manufacturer support is crucial for IMF, since it is meant to be a business-to-business tool for managing and distributing content, and not a standard for how consumers will view content. Therefore, outside of the SMPTE standardization effort, there is a plan to have manufacturers across the globe join in so-called "Plugfests" in 2012 to create IMF content out of draft documents, interchange them with each other, and report on their findings.

As Chang suggests, "it's important to hit IMF from multiple directions since, after all, the first word in the name is 'interoperable.'" As a consequence of all these developments, it's reasonable to assume that IMF will officially be part of the industry's typical workflow chain where content distributors can start sending material to all sorts of platforms in the next year. Some studios and networks are already overhauling their infrastructures and workflow approaches to account for IMF's insertion into the process, and encoding houses and other post-production facilities should also, in most cases, have the information and infrastructure to adapt to the IMF world without any sort of fundamental shift. But the post industry will be somewhat changed by IMF, especially if some facilities or studios decide on processes for automating encoding at the front end of the process that change their reliance on certain facilities currently doing that kind of work.

However, Chang adds, the broadcast industry specifically will probably have the most significant learning curve in terms of how best to dive into IMF since, unlike studios, which have been discussing their needs and pondering IMF since about 2006, the broadcast industry was only exposed more directly to IMF earlier this year when SMPTE took the process over. IMF was originally designed and intended as a higher bit-rate master (around 150-500Mb/s for HD, possibly even lossless, according to Chang), but broadcasters normally use lower bit-rate files (more like 15-50Mb/s).

"However, I feel that broadcasters would like to have that flexibility in versioning," Chang says. "But because they need different codecs and lower bit-rates, there is still discussion in SMPTE about what those codecs should be. Broadcasters are only now starting to evaluate what they need out of IMF, but there is still plenty of time for them to get involved."

Of course, as the explosion of mobile devices and web-enabled broadcasting on all sorts of platforms in a relatively short period of time illustrates, viewing platforms will inevitably change over time, and therefore, distribution of data will have to evolve, as well. As to the issue of whether IMF is relatively future-proofed, or merely the start of a continually evolving conversation, Chang is confident the standard can be in place for a long time because of its core framework—the primary focus to date. That framework contains composition playlists, general image data, audio data (unlimited tracks, up to 16 channels each), sub-titling/captioning data, any dynamic metadata needed, and so on.

Modular applications that could plug into that framework need to be further explored, Chang says, but the potential to allow IMF to accommodate new, higher compressed codecs, new or odd resolutions or frame rates, and all sorts of unique data for particular versions is quite high.

"The core framework we created with documents is something we tried to future proof," she says. "The question is the applications that might plug into that core framework (over time). We are trying to make it as flexible as possible so that if, in the future, even if you have some crazy new image codec that goes up to 16k or uses a new audio scheme, it will still plug into the IMF framework. So image, audio, or sub-titling could be constrained, for example, but as long as the sequence can be described by the composition playlist and the essence can be wrapped in the MFX Generic Container, the core framework should hold up for some time to come."

To connect with the SMPTE IMF effort, you can join the SMPTE 35PM Technology Committee, and then sign up as a member of 35PM50. The IMF Format Forum will have the latest news and discussions about the status of the IMF specification.

By Michael Goldman, SMPTE Newswatch

A VLC Media Player Plugin Enabling DASH

This poster describes an implementation of the emerging Dynamic Adaptive Streaming over HTTP (DASH) standard, which is currently being developed within MPEG and 3GPP. Our implementation is based on VLC and fully integrated into its structure as a plugin.

Furthermore, our implementation provides a powerful extension mechanism with respect to adaptation logics and profiles. That is, it should be very easy to implement various adaptation logics and profiles.
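To make the notion of an adaptation logic concrete, here is a minimal rate-based sketch in Python. It is not the plugin's code or API: the `Representation` class, the `choose_representation` function, and the 0.8 safety factor are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Representation:
    """One quality level of the same content (hypothetical model)."""
    rep_id: str
    bandwidth_bps: int  # bitrate needed to play this level without stalling


def choose_representation(
    representations: List[Representation],
    measured_throughput_bps: float,
    safety_factor: float = 0.8,  # assumed headroom, not a value from the standard
) -> Representation:
    """Pick the highest-bitrate level that fits within a fraction of throughput.

    Falls back to the lowest-bitrate level when nothing fits.
    """
    if not representations:
        raise ValueError("at least one representation is required")
    budget = measured_throughput_bps * safety_factor
    ordered = sorted(representations, key=lambda r: r.bandwidth_bps)
    best = ordered[0]
    for rep in ordered:
        if rep.bandwidth_bps <= budget:
            best = rep
    return best


levels = [
    Representation("low", 500_000),
    Representation("mid", 1_500_000),
    Representation("high", 4_000_000),
]
print(choose_representation(levels, 2_500_000).rep_id)
```

A client would typically re-run such a function before requesting each segment, feeding it the throughput observed while downloading the previous one. Other logics, such as buffer-based ones, could be swapped in behind the same function signature.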

Future versions of the plugin will track the latest version of the standard, which has recently adopted many changes (e.g., Group has been renamed to AdaptationSet), add support for persistent HTTP connections in order to reduce the overhead of HTTP streaming (e.g., compared to RTP), and support seeking within a DASH stream.

By Christopher Mueller and Christian Timmerer, Alpen-Adria-Universitaet Klagenfurt


DASHEncoder generates the desired representations (quality/bitrate levels), fragmented MP4 files, and MPD file based on a given config file or command line parameters. These parameters give the user a wide range of possibilities for content generation, including variation of the segment size, bitrate, resolution, encoding settings, URL, etc.
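For readers unfamiliar with the MPD file, the following Python sketch builds a heavily simplified manifest that lists several representations of one video. It is an illustration only and does not reproduce DASHEncoder's output: segment information is omitted, the result is not claimed to be schema-valid, and the `build_mpd` helper and its tuple layout are assumptions.

```python
import xml.etree.ElementTree as ET

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"
PROFILE = "urn:mpeg:dash:profile:isoff-main:2011"


def build_mpd(representations):
    """Build a simplified MPD-like XML string (hypothetical helper).

    `representations` is a list of (id, bandwidth_bps, width, height) tuples.
    """
    mpd = ET.Element("MPD", {"xmlns": MPD_NS, "profiles": PROFILE, "type": "static"})
    period = ET.SubElement(mpd, "Period")
    adaptation_set = ET.SubElement(period, "AdaptationSet", {"mimeType": "video/mp4"})
    for rep_id, bandwidth, width, height in representations:
        # XML attribute values must be strings.
        ET.SubElement(
            adaptation_set,
            "Representation",
            {
                "id": rep_id,
                "bandwidth": str(bandwidth),
                "width": str(width),
                "height": str(height),
            },
        )
    return ET.tostring(mpd, encoding="unicode")


xml_text = build_mpd([("v1", 500_000, 640, 360), ("v2", 1_500_000, 1280, 720)])
print(xml_text)
```

A player reads a manifest like this to learn which quality levels exist before it starts requesting segments.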

The DASHEncoder steps are depicted here:

Current features and restrictions:
  • Generation of video only, audio only or audio+video DASH content.
  • H.264 encoding based on x264: Constant and variable bitrate encoding.
  • Supported profile: urn:mpeg:dash:profile:isoff-main:2011.
  • PSNR logging and MySQL interface for storing in a database (only for common resolution representations).
  • There are currently problems with the playback of the content containing Audio with the DASH VLC plugin.

The DASHEncoder is available as open source with the aim that other developers will join this project.

Source: Alpen-Adria Universität Klagenfurt via Video Breakthroughs

Energy Efficient and Robust S3D Video Distribution Enabled with Nomad3D CODEC and 60 GHz Link

This white paper describes a 3D Video Distribution scheme using the new wireless 60 GHz standard for connectivity and Nomad3D 3D+F 3D CODEC. It will be shown that a specially dedicated Video Delivery System using 60 GHz Modems and the 3D+F CODEC is very efficient in overall system power consumption and more robust to channel impairments.

Source: Nomad3D

Getting Machines to Watch 3D for You

The advantages of automatic monitoring of multiple television channels are well known. There are just not enough eyeballs for human operators to see what is going on. With the advent of stereoscopic 3D in mainstream television production and distribution, the benefits of automatic monitoring are even greater, as 3D viewing is even less conducive to manual monitoring.

This paper, presented at IBC 2011, gives a comprehensive introduction to a wide range of automatic monitoring possibilities for 3D video. There are significant algorithmic challenges involved in some of these tasks, often involving careful high-level analysis of picture content. Further challenges arise from the need for monitoring to be robust to typical processing undergone by video signals in a broadcast chain.

By Mike Knee, Consultant Engineer, R&D Algorithms Team, Snell

MPEG Analysis and Measurement

Broadcast engineering requires a unique set of skills and talents. Some audio engineers claim the ability to hear the difference between tiny nuances such as different kinds of speaker wire. They are known as those with golden ears. Their video engineering counterparts can spot and obsess over a single deviant pixel during a Super Bowl touchdown pass or a “Leave it to Beaver” rerun in real time. They are known as eagle eyes or video experts.

Not all audio and video engineers are blessed with super-senses. Nor do we all, me included, have the talent to focus our brain’s undivided processing power to discover and discern vague, cryptic and sometimes immeasurable sound or image anomalies with our bare eyes or ears on the fly. Sometimes, the message can overpower the media. Fortunately for us and thanks to the Internet and digital video, more objective quality and measurement standards and tools have been developed.

One of those standards is Perceptual Evaluation of Video Quality (PEVQ). It is an End-to-End (E2E) measurement algorithm standard that grades picture quality of a video presentation by a five-point Mean Opinion Score (MOS), one being bad and five being excellent.

PEVQ can be used to analyze visible artifacts caused by digital video encoding/decoding or transcoding processes, RF- or IP-based transmission systems and viewer devices like set-top boxes. PEVQ is suited for next-generation networking and mobile services, including SD and HD IPTV, streaming video, mobile TV, video conferencing and video messaging.

The development for PEVQ began with still images. Evaluation models were later expanded to include motion video. PEVQ can be used to assess degradations of a decoded video stream from the network, such as that received by a TV set-top box, in comparison to the original reference picture as broadcast from the studio. This evaluation model is referred to as End-to-End (E2E) quality testing.

E2E exactly replicates how so-called average viewers would evaluate the video quality based on subjective comparison, so it addresses Quality-of-Experience (QoE) testing. PEVQ is based on modeling human visual behaviors. It is a full-reference algorithm that analyzes the picture pixel-by-pixel after a temporal alignment of corresponding frames of reference and test signal.

Besides an overall quality Mean Opinion Score figure of merit, abnormalities in the video signal are quantified by several Key Performance Indicators (KPI), such as Peak Signal-to-Noise Ratios (PSNR), distortion indicators and lip-sync delay.
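Of the indicators listed, PSNR is the simplest to compute. The sketch below uses the common textbook definition for 8-bit samples on flat lists of pixel values. It is not PEVQ's algorithm, which also models human visual behavior, and the function name and signature are assumptions.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], test: Sequence[int], peak: int = 255) -> float:
    """Return PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between corresponding pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no noise at all
    return 10.0 * math.log10(peak ** 2 / mse)


reference_pixels = [100, 100, 100, 100]
degraded_pixels = [101, 99, 101, 99]
print(round(psnr(reference_pixels, degraded_pixels), 2))
```

Because it compares each pixel with its counterpart in the original, PSNR is a full-reference measurement and needs temporally aligned frames to be meaningful.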

PEVQ References
Depending on the data made available to the algorithm, video quality test algorithms can be divided into three categories based on available reference data.

A Full Reference (FR) algorithm has access to and makes use of the original reference sequence for a comparative difference analysis. It compares each pixel of the reference sequence to each corresponding pixel of the received sequence. FR measurements deliver the highest accuracy and repeatability but are processing intensive.

A Reduced Reference (RR) algorithm uses a reduced bandwidth side channel between the sender and the receiver, which is not capable of transmitting the full reference signal. Instead, parameters are extracted at the sending side, which help predict the quality at the receiving end. RR measurements are less accurate than FR and represent a working compromise if bandwidth for the reference signal is limited.

A No Reference (NR) algorithm only uses the degraded signal for the quality estimation and has no information of the original reference sequence. NR algorithms are low accuracy estimates only, because the original quality of the source reference is unknown. A common variant at the upper end of NR algorithms analyzes the stream at the packet level, but not the decoded video at the pixel level. The measurement is consequently limited to a transport stream analysis.

Another widely used MOS algorithm is VQmon. This algorithm was recently updated to VQmon for Streaming Video. It performs real-time analysis of video streamed using the key Adobe, Apple and Microsoft streaming protocols, analyzes video quality and buffering performance and reports detailed performance and QoE metrics. It uses packet/frame-based zero reference, with fast performance that enables real-time analysis on the impact that loss of I, B and P frames has on the content, both encrypted and unencrypted.

The 411 on MDI
The Media Delivery Index (MDI) measurement is specifically designed to monitor networks that are sensitive to arrival time and packet loss such as MPEG-2 video streams, and is described by the Internet Engineering Task Force document RFC 4445. It measures key video network performance metrics, including jitter, nominal flow rate deviations and instant data loss events for a particular stream.

MDI provides information to detect virtually all network-related impairments for streaming video, and it enables the measurement of jitter on fixed and variable bit-rate IP streams. MDI is typically shown as two values separated by a colon, the Delay Factor (DF) and the Media Loss Rate (MLR), i.e. DF:MLR.

DF is the number of milliseconds of streaming data that buffers must handle to eliminate jitter, something like a time-base corrector once did for baseband video. It is determined by first calculating the MDI virtual buffer depth of each packet as it arrives. In video streams, this value is sometimes called the Instantaneous Flow Rate (IFR). When calculating DF, it is known as DELTA.

To determine DF, DELTA is monitored to identify maximum and minimum virtual depths over time. Usually one or two seconds is enough time. The difference between maximum and minimum DELTA divided by the stream rate reveals the DF. In video streams, the difference is sometimes called the Instantaneous Flow Rate Deviation (IFRD). DF values less than 50ms are usually considered acceptable. An excellent white paper with much more detail on MDI is available from Agilent.

Figure 1 - The Delay Factor (DF) dictates buffer size needed to eliminate jitter

Using the formula in Figure 1, let’s say a 3Mb/s MPEG video stream observed over a one-second interval fills a virtual buffer to a maximum of 3.005Mb and a minimum of 2.995Mb. The difference between the two is 10Kb. That difference divided by the stream rate gives the DF: 10Kb divided by 3Mb/s is 3.333 milliseconds. Thus, to avoid packet loss in the presence of the known jitter, the receiver’s buffer must hold at least 10Kb; rounding the DF up to 4 milliseconds corresponds to a 12Kb buffer at a 3Mb/s rate. A device with an MDI rating of 4:0.003, for example, would indicate that the device has a 4 millisecond DF and an MLR of 0.003 media packets per second.
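The DF arithmetic can be checked with a short Python sketch. The first function applies the max-minus-min rule directly. The second loosely follows the virtual-buffer description given earlier, tracking the depth just before and just after each packet arrives. Function names and the packet tuple layout are assumptions for illustration, not taken from any monitoring product.

```python
from typing import Iterable, Tuple


def delay_factor_ms(max_depth_bits: float, min_depth_bits: float, rate_bps: float) -> float:
    """DF in milliseconds: (max - min virtual buffer depth) / stream rate."""
    if rate_bps <= 0:
        raise ValueError("rate_bps must be positive")
    return (max_depth_bits - min_depth_bits) / rate_bps * 1000.0


def delay_factor_from_packets(
    packets: Iterable[Tuple[float, int]], rate_bps: float
) -> float:
    """DF from (arrival_time_s, size_bits) pairs sorted by arrival time."""
    received = 0.0
    start = None
    lowest = highest = None
    for arrival, size in packets:
        if start is None:
            start = arrival
        # Depth just before arrival: bits received minus bits drained at the nominal rate.
        before = received - rate_bps * (arrival - start)
        after = before + size
        received += size
        lowest = before if lowest is None else min(lowest, before)
        highest = after if highest is None else max(highest, after)
    if lowest is None:
        raise ValueError("at least one packet is required")
    return delay_factor_ms(highest, lowest, rate_bps)


# The worked example: 3.005Mb maximum, 2.995Mb minimum, 3Mb/s stream.
example_df = delay_factor_ms(3_005_000, 2_995_000, 3_000_000)
print(round(example_df, 3))

# Evenly paced 1000-bit packets every millisecond at 1Mb/s.
paced = [(i * 0.001, 1000) for i in range(5)]
paced_df = delay_factor_from_packets(paced, 1_000_000)
```

For the evenly paced stream the DF comes out at about one packet time, which is the floor for a stream without jitter. Bursty arrivals widen the gap between the highest and lowest depth and push the DF up.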

The MLR formula in Figure 2 is computed by dividing the number of lost or out-of-order media packets by observed time in seconds. Out-of-order packets are crucial because many devices don’t reorder packets before handing them to the decoder. The best-case MLR is zero. The maximum acceptable MLR for HDTV is generally considered to be 0.0005. An MLR greater than zero adds time for viewing devices to lock into the higher MLR, which slows channel surfing and can introduce various ongoing anomalies when locked in.

Figure 2 - The Media Loss Rate (MLR) is used in the Media Delivery Index (MDI)
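The MLR calculation and the DF:MLR notation can be sketched in the same way. The helper names `media_loss_rate` and `format_mdi` are hypothetical, and the packet counts in the example are invented purely to reproduce the 4:0.003 rating mentioned earlier.

```python
def media_loss_rate(lost_packets: int, out_of_order_packets: int,
                    interval_seconds: float) -> float:
    """Lost plus out-of-order media packets per second of observation."""
    if interval_seconds <= 0:
        raise ValueError("observation interval must be positive")
    return (lost_packets + out_of_order_packets) / interval_seconds


def format_mdi(df_ms: float, mlr: float) -> str:
    """Render an MDI reading in the DF:MLR form used in the text."""
    return f"{df_ms:g}:{mlr:g}"


# Invented counts: 2 lost and 1 out-of-order packet in 1,000 seconds.
mlr = media_loss_rate(2, 1, 1000)
print(format_mdi(4, mlr))
```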

Watch That Jitter
Just as too much coffee can make you jittery, heavy traffic can make a network jittery, and jitter is a major source of video-related IP problems. Proactively monitoring jitter can alert you to impending QoE issues and help you avert them before they occur.

One way to overload an MPEG-2 stream is with excessive bursts. Packet bursts can cause a network-level or a set-top box buffer to overflow or under-run, resulting in lost packets or empty buffers, which cause macro blocking or black/freeze frame conditions, respectively. An overload of metadata such as video content PIDs can contribute to this problem.

Probing a streaming media network at various nodes and under different load conditions makes it possible to isolate and identify devices or bottlenecks that introduce significant jitter or packet loss to the transport stream. Deviations from nominal jitter or data loss benchmarks are indicative of an imminent or ongoing fault condition.

QoE is one of many subjective measurements used to determine how well a broadcaster’s signal, whether on-air, online or on-demand, satisfies the viewer’s perception of the sights and sounds as they are reproduced at his or her location. I can’t help but find some humor in the idea that the ones-and-zeros of a digital video stream can be rated on a gray scale of 1-5 for quality.

Experienced broadcast engineers know the so-called quality of a digital image begins well before the light enters the lens, and with apologies to our friends in the broadcast camera lens business, the image is pre-distorted to some degree within the optical system before the photons hit the image sensors.

QoE or RST?
A scale of 1-5 is what ham radio operators have used for 100 years in the readability part of the Readability, Strength and Tone (RST) code system. While signal strength (S) could be objectively measured with an S-meter such as shown in Figure 3, readability (R) was purely subjective, and tone (T) could be subjective, objective or both.

Figure 3 - The S-meter was the first commonly used metric to objectively read and report signal strength at an RF receive site

Engineers and hams know that as S and/or T diminish, R follows, but that minimum acceptable RST values depend almost entirely on the minimum R figure the viewer or listener is willing to accept. In analog times, the minimum acceptable R figure often varied with the value of the message.

Digital technology and transport remove the viewer or listener’s subjective reception opinion from the loop. Digital video and audio are either as perfect as the originator intended or practically useless. We don’t need a committee to tell us that. It seems to me the digital cliff falls just south of a 4x5x8 RST. Your opinion may vary.

By Ned Soseman, Broadcast Engineering

What is MPEG DASH?

MPEG DASH (Dynamic Adaptive Streaming over HTTP) is a developing ISO Standard (ISO/IEC 23009-1) that should be finalized by early 2012. As the name suggests, DASH is a standard for adaptive streaming over HTTP that has the potential to replace existing proprietary technologies like Microsoft Smooth Streaming, Adobe HTTP Dynamic Streaming (HDS), and Apple HTTP Live Streaming (HLS). A unified standard would be a boon to content publishers, who could produce one set of files that play on all DASH-compatible devices.

The DASH working group has industry support from a range of companies, with contributors including critical stakeholders like Apple, Adobe, Microsoft, Netflix, Qualcomm, and many others. However, while Microsoft has indicated that it will likely support the standard as soon as it’s finalized, Adobe and Apple have not given the same guidance, and until DASH is supported by these two major players, it will gain little traction in the market.

A more serious problem is that MPEG DASH doesn’t resolve the HTML5 codec issue. That is, DASH is codec agnostic, which means that it can be implemented in either H.264 or WebM. Since neither codec is universally supported by all HTML5 browsers, this may mean that DASH users will have to create multiple streams using multiple codecs, jacking up encoding, storage, and administrative costs.

Finally, at this point, it remains unclear whether DASH usage will be royalty-free. This may impact adoption by many potential users, including Mozilla, which has already commented that it’s “unlikely to implement” DASH unless and until it’s completely royalty-free. With Firefox currently sitting at around 22% of market share, this certainly dims DASH’s impact in the HTML5 market.

Introduction to MPEG DASH
Adaptive streaming involves producing several instances of a live or on-demand source file and making them available to various clients depending upon their delivery bandwidth and CPU processing power. By monitoring CPU utilization and/or buffer status, adaptive streaming technologies can change streams when necessary to ensure continuous playback or to improve the experience.

One key difference between adaptive streaming technologies is the streaming protocol utilized. For example, Adobe’s RTMP-based Dynamic Streaming uses Adobe’s proprietary Real Time Messaging Protocol (RTMP), which requires a streaming server and a near-continuous connection between the server and player. Requiring a streaming server can increase implementation cost, while RTMP-based packets can be blocked by firewalls.

A near-continuous connection means that RTMP can’t take advantage of caching on plain-vanilla servers like those used for Hypertext Transfer Protocol (HTTP) delivery, the delivery protocol used by Apple’s HTTP Live Streaming (HLS), Microsoft’s Smooth Streaming, and Adobe’s HTTP-based Dynamic Streaming (HDS). All three of these delivery solutions use standard HTTP web servers to deliver streaming content, obviating the need for a streaming server.

In addition, HTTP packets are firewall friendly and can utilize HTTP caching mechanisms on the web. This latter capability should both decrease total bandwidth costs associated with delivering the video, since more data can be served from web-based caches rather than the origin server, and improve quality of service, since cached data is generally closer to the viewer and more easily retrievable.

While most of the video streamed over the web today is still delivered via RTMP, an increasing number of companies will convert to HTTP delivery over time.

All HTTP-based adaptive streaming technologies use a combination of encoded media files and manifest files that identify alternative streams and their respective URLs. The respective players monitor buffer status (HLS) and CPU utilization (Smooth Streaming and HTTP Dynamic Streaming) and change streams as necessary, locating the alternate stream from the URLs specified in the manifest file.
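The switching logic described above can be illustrated with a small sketch. It assumes a buffer-driven player only; the `Stream` type, the example URLs and the low and high water marks are hypothetical and do not come from any of the named technologies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stream:
    """One alternative stream listed in a (hypothetical) manifest."""
    bitrate_kbps: int
    url: str


def choose_stream(manifest: List[Stream], current: Stream,
                  buffer_seconds: float,
                  low_water: float = 5.0,
                  high_water: float = 15.0) -> Stream:
    """Step down one rung when the buffer runs low, up one when it is full.

    The thresholds are illustrative assumptions, not values from any spec.
    """
    ladder = sorted(manifest, key=lambda s: s.bitrate_kbps)
    index = ladder.index(current)
    if buffer_seconds < low_water and index > 0:
        return ladder[index - 1]
    if buffer_seconds > high_water and index < len(ladder) - 1:
        return ladder[index + 1]
    return current


manifest = [
    Stream(250, "http://example.com/video_250k"),
    Stream(500, "http://example.com/video_500k"),
    Stream(1000, "http://example.com/video_1000k"),
]
current = manifest[1]
print(choose_stream(manifest, current, buffer_seconds=2.0).url)
```

Moving one rung at a time is a deliberate simplification; a real player could just as well jump several rungs or factor in CPU load.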

HLS uses MPEG-2 Transport Stream (M2TS) segments, stored as thousands of tiny M2TS files, while Smooth Streaming and HDS use time-code to find the necessary fragment of the appropriate MP4 elementary streams.

DASH is an attempt to combine the best features of all HTTP-based adaptive streaming technologies into a standard that can be utilized from mobile to OTT devices.

MPEG DASH Technology Overview
As mentioned, all HTTP-based adaptive streaming technologies have two components: the encoded A/V streams themselves and manifest files that identify the streams for the player and contain their URL addresses. For DASH, the actual A/V streams are called the Media Presentation, while the manifest file is called the Media Presentation Description.

As you can see in Figure 1, the Media Presentation is a collection of structured audio/video content that incorporates periods, adaptation sets, representations, and segments.

Figure 1. The Media Presentation Data Model

The Media Presentation defines the video sequence with one or more consecutive periods that break up the video from start to finish. Each period contains multiple adaptation sets that contain the content that comprises the audio/video experience. This content can be muxed, in which case there might be one adaptation set, or represented in elementary streams, as shown in Figure 1, enabling features like multiple language support for audio.

Each adaptation set contains multiple representations, each a single stream in the adaptive streaming experience. In the figure, Representation 1 is 640x480@500Kbps, while Representation 2 is 640x480@250Kbps.

Each representation is divided into media segments, essentially the chunks of data that all HTTP-based adaptive streaming technologies use. Data chunks can be presented in discrete files, as in HLS, or as byte ranges in a single media file. Presentation in a single file helps improve file administration and caching efficiency as compared to chunked technologies that can create hundreds of thousands of files for a single audio/video event.
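When segments are stored as byte ranges in a single file, a player requests each one with a standard HTTP Range header. The sketch below shows how such header values could be derived from a list of segment sizes; the function name and the sizes are made up for illustration, and a real player would read the offsets from the file's index rather than from a plain list.

```python
from typing import List


def segment_ranges(segment_sizes: List[int]) -> List[str]:
    """Return one HTTP Range header value per segment, in file order."""
    ranges = []
    offset = 0
    for size in segment_sizes:
        if size <= 0:
            raise ValueError("segment sizes must be positive")
        # HTTP byte ranges are inclusive at both ends.
        ranges.append(f"bytes={offset}-{offset + size - 1}")
        offset += size
    return ranges


print(segment_ranges([1000, 500, 750]))
```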

The DASH manifest file, called the Media Presentation Description, is an XML file that identifies the various content components and the location of all alternative streams. This enables the DASH player to identify and start playback of the initial segments, switch between representations as necessary to adapt to changing CPU and buffer status, and change adaptation sets to respond to user input, like enabling/disabling subtitles or changing languages.
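The hierarchy of periods, adaptation sets and representations can be pictured as nested data structures. The following is a rough in-memory sketch, not the MPD XML schema: every class, field and function name is an assumption made for illustration, and the two representations reuse the bitrates quoted for Figure 1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Representation:
    rep_id: str
    width: int
    height: int
    bandwidth_kbps: int
    segment_urls: List[str] = field(default_factory=list)


@dataclass
class AdaptationSet:
    content_type: str  # for example "video" or "audio"
    representations: List[Representation]


@dataclass
class Period:
    start_seconds: float
    adaptation_sets: List[AdaptationSet]


@dataclass
class MediaPresentation:
    periods: List[Period]


def pick_representation(adaptation_set: AdaptationSet,
                        available_kbps: float) -> Representation:
    """Pick the highest-bandwidth representation that fits the link.

    Falls back to the lowest-bandwidth one when nothing fits.
    """
    ordered = sorted(adaptation_set.representations,
                     key=lambda r: r.bandwidth_kbps)
    best = ordered[0]
    for rep in ordered:
        if rep.bandwidth_kbps <= available_kbps:
            best = rep
    return best


video = AdaptationSet("video", [
    Representation("1", 640, 480, 500),
    Representation("2", 640, 480, 250),
])
presentation = MediaPresentation([Period(0.0, [video])])
print(pick_representation(video, available_kbps=400).rep_id)
```

In this sketch, switching representations is just a new call to `pick_representation` with an updated bandwidth estimate, while changing languages or subtitles would mean selecting a different adaptation set within the same period.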

Other attributes of DASH include:
  • DASH is codec-independent, and will work with H.264, WebM and other codecs.
  • DASH supports both the ISO Base Media File Format (essentially the MP4 format) and MPEG-2 Transport Streams.
  • DASH does not specify a DRM method but supports all DRM techniques specified in ISO/IEC 23001-7: Common Encryption.
  • DASH supports trick modes for seeking, fast forward and rewind.
  • DASH supports advertising insertion.

In terms of availability, DASH should be completed and ready for deployment in the first half of 2012.

MPEG DASH Intellectual Property Issues
At this point, it’s unclear whether DASH will be encumbered by royalties, and where they might be applied. For example, the MPEG-2 video codec comes with royalty obligations for encoders, decoders, and users of the codec. Many of the participants who are contributing intellectual property to the effort, including Microsoft, Cisco, and Qualcomm, have indicated that they want a royalty-free solution. While these three companies account for the bulk of the IP contributed to the specification, not all contributors agree, so the royalty issue is unclear at this time.

Other issues include whether browser vendor Mozilla can integrate DASH into its Firefox browser if the underlying media that a DASH MPD references requires royalty-bearing components to play back. This is one of the key reasons that the company didn’t integrate H.264 playback into the Firefox browser in the past, along with the potential $5 million per year royalty obligation.

We asked Mozilla about their intentions regarding DASH, and they sent this statement from Chris Blizzard, Director of Web Platform:

“Mozilla has always been committed to implementing widely adopted royalty-free standards. If the underlying MPEG standards were royalty free we would implement DASH. However, MPEG DASH is currently built on top of MPEG Transport Streams, which are not royalty free. Therefore, we are unlikely to implement at this time.”

According to website NetMarketShare, as of November 18, 2011, Firefox enjoyed a 22.5% market share in the desktop browser market. Without support from Firefox, DASH obviously doesn’t represent a standard that will unify the approach to adaptive streaming in the HTML5 market.

In addition, as a codec-agnostic technology, DASH also does nothing to resolve the HTML5 codec issue, so even if it were fully adopted by all HTML5-compatible browsers, content producers would still have to encode in both H.264 and WebM for universal playback.

Obviously, this doesn’t preclude DASH from being integrated into plug-ins like Flash or Silverlight or being implemented in mobile or OTT devices, and playing a significant role in these markets. However, as things exist today, it’s hard to see DASH as the cure-all solution for the current lack of live, adaptive streaming, and DRM support in desktop HTML5 browsers.

And, in the absence of affirmative statements from Apple or Adobe that they will adopt the standard once finalized, it’s unclear how much immediate traction DASH will gain in the mobile and plug-in markets. Let’s see why.

MPEG DASH Competitive Issues
To a great degree, DASH levels the playing field among competitive players in the adaptive streaming space. For example, Apple’s HLS provides a distinct competitive advantage over other mobile platforms as it’s a widely adopted specification that allows all connected iDevices to play adaptive streams. That’s why Google decided to implement HLS in Android 3.0. Distributing video to Apple iOS devices has been relatively straightforward because of HLS, while the lack of a technology standard and the diversity of devices have made distributing video to Android, Blackberry, and other mobile markets very challenging.

If Apple adopts DASH and implements it on all existing connected iDevices, this competitive advantage disappears, and all DASH-enabled mobile devices are on a level playing field respecting video playback. To be clear, Apple representatives have been active in creating the specification and there is no indication that they won’t support it when it’s released. However, none of the Apple representatives that we contacted were able to comment on Apple’s intent, which is not unusual given that Apple seldom comments on unreleased products. Still, Apple is not known for its competitive graciousness, and adopting DASH would clearly make its products less competitive vis-à-vis other mobile platforms, at least in the short term.

On the flip side, content publishers want a distribution mechanism with flexible and complete DRM, which iDevices don’t currently provide. If enough content producers support DASH-enabled platforms, but not iDevices, that will obviously motivate Apple to support the spec. However, unless and until Apple supports DASH, it’s unlikely that producers without DRM concerns will stop producing HLS streams, which may lessen the attractiveness of supporting DASH.

To a lesser degree, the same principle holds true for Adobe since the Flash Player’s ubiquity on the desktop is a key competitive advantage over Microsoft’s Silverlight and even HTML5. Though Adobe participated in the standards work, they haven’t committed to supporting DASH in future versions of the Flash Player. Again, Adobe seldom comments on future products, so you can’t draw any conclusion from their silence.

DASH is an extraordinarily attractive technology for web producers, a single standard that should allow them to encode once, and then securely distribute to a universe of players, from mobile to OTT, and to the desktop via plug-ins or HTML5. In addition to not resolving the HTML5 codec issue, it’s also unclear whether publishers will be charged for the privilege of producing files using the DASH spec, which could be a significant negative.

Mozilla has already indicated that they probably won’t support the specification as currently written, and Apple and Adobe have not affirmed if or when they will support the technology. An optimist would assume that the value of DASH to the streaming media marketplace would compel all stakeholders to make their contributions royalty free, and convince Apple, Adobe and Mozilla to support the specification soon after its release. Until all this plays out, though, DASH may play a significant role in some markets, but won’t reach its full potential.

By Jan Ozer, Streaming Media

DPP to Release Metadata App

The Digital Production Partnership (DPP) will release a metadata application next year to help production companies comply with proposed file-based delivery requirements. The web-based app will allow production companies to enter a required set of metadata that is associated with a completed TV programme and wrap it into MXF files that are compliant with the cross-broadcaster group’s common standards.

“This will enable the creation of finished file-based TV programmes by independent producers for onward delivery to major UK broadcasters,” the DPP said.

The app, which was described as being compatible with common PC and laptop operating systems, is set to be rolled out during “spring/summer” next year.

From 2012 the BBC, ITV and Channel 4 will begin to take delivery of programmes as files on a selective basis. File-based delivery will be the broadcasters’ preferred delivery format by 2014. The common standard for the delivery of file-based programmes is set to be announced by January.

Source: Broadcast

Video Search Coming to Xbox Live

As part of Microsoft’s transition of the Xbox from simple game console to living room entertainment hub, the company announced the acquisition of yet another piece of the total solution to entertainment nirvana: video search technology. VideoSurf, whose technology "sees" frames inside videos to make discovering content fast, easy and accurate, will now be integrated into the Xbox.

Video content analytics is a keystone addition to the mix of services Microsoft plans to provide. It should do for video, and the ocean of voice and image data locked in its frames, what text-based search did for the written word online. With this technology, it’s now possible to do both visual and audio searches across web sites, as well as premium content providers like Bravo and HBO, to find specific videos in real time. VideoSurf also offers a mobile app.

You can see a cool VideoSurf video that covers the four critical components of the technology: identify, surf, watch, and share. For example, there is a social component to the technology that offers a personal profile and my search, and also allows users to "like" a show and post a specific video scene along with comments. That’s sure to get interaction flowing in the social space.


The video ID technology is at the cutting edge of next generation search, delving into the video in meaningful ways for viewers. Voice and image recognition software can be applied to video frames to identify specific people and objects.

The implications go beyond a richer viewing experience: commerce will use the technology to recognize and tag images, offering new and compelling ways to connect advertisers with customers interactively across a growing base of content.

One look at the Xbox content page tells the story, with key players lined up in the U.S. and worldwide that include ESPN, HuluPlus and Netflix. Microsoft claims up to 40 providers including Bravo, Cinemanow, HBOGo and many others will be available to the MS Live network "at some future point in time." Regionally, Microsoft announced content partnerships with BBC in the U.K.; Telefónica in Spain; Rogers On Demand in Canada; Televisa in Mexico; ZDF in Germany; and Mediaset in Italy.

The picture is becoming clearer as Microsoft continues to introduce new and compelling technology to improve its Xbox offering. Kinect gesture-driven input is one good example, along with Bing voice-enabled search, which extends the human input range beyond mouse and keyboard to the growing base of content the Xbox Live system is building for its 35M subscribers.

When the technologies are added up, Microsoft has a compelling offering to bring to the living room. This includes gesture (Kinect) and voice-activated input, video phone features (Skype) combined with the power of text-based search (Bing), and now video search (VideoSurf), all conceivably accessible from both the Xbox game console and, potentially, the Windows Mobile O/S in the future. And perhaps the best lesson for Apple and Google TV offerings: never count out Microsoft; they keep swinging until they get it right.

By Steve Sechrist, Display Daily


Broadcasters no longer have the monopoly on the delivery of A/V entertainment to the home. In the fiercely competitive world of media and entertainment, companies have to deliver more versions and formats, but without increasing their costs. A channel is now expected to have a Web presence, as well as mobile and tablet versions of its content.

Organizations like the EBU and Advanced Media Workflow Association (AMWA) are promoting the Service-Oriented Architecture (SOA) as a route to provide the interoperable media services that can serve the new business requirements.

For the seasoned video engineer, the world of SOA introduces terms and concepts that at first encounter seem foreign and more suited to the IT specialist. As video processing migrates to the file domain, there is no option but to become familiar with what at first sight may appear alien.

Broadcast systems have evolved around the imposed workflow of the serial processing steps of videotape operations. Over time, many processes have moved from dedicated hardware boxes with SDI in and out to software applications on a network. A typical broadcast operation is now a hybrid of SDI and IP connections.

In many cases, the workflow remains as the original tape-based flow. Over time, other applications like asset and workflow management are layered over the entire process chain. The system has grown by accident, not by design, and become a web of custom or proprietary interfaces linking the many applications.

Sure, it works; it was designed that way. But when the time comes to replace a component part, say the playout automation, the inflexibility of the system rapidly becomes apparent. The parts of the system are linked by a web of custom APIs, often restricted to a specific release of a specific software application. It is just not possible to swap out the automation for the latest product without attending to the web of interfaces.

To meet the demands for new services to the public, the broadcaster must add facilities for a mobile news service, a 3-D channel and interaction with a social media website. Along comes CES and some new consumer device to consume content. How do you add support for this new device? Will it mean more custom interfaces or more special workflow applications? The EBU and AMWA are developing a Framework for Interoperable Media Services (FIMS), which aims to provide a new technology platform that leverages current IT practices, like the use of the SOA, to provide business agility and to control costs.

The legacy system architectures suffer from many problems, and these scenarios serve to illustrate just two.

Scenario One: Ingest
Take the example of ingest. A VTR connects to a video noise reducer via HD-SDI. This then connects again via HD-SDI to an encoder card. The card encodes to a suitable codec for editing and also creates a low-resolution proxy.

At some point this is updated so that the encoder card ingests directly from the VTR and encodes to an uncompressed format. The noise reduction is performed in software. The uncompressed file is saved in a watch folder and picked up by transcoder software, which creates files in the wanted resolutions and codecs.

Imagine more control is needed over the clean up, and the file is passed to an operator who will apply craft skills to the clean-up and repair process. Any changes to the process flow involve wiring changes and software reconfiguration.

The system is just not flexible or agile enough for the constant changes needed to serve today's requirements.

Each evolution of the ingest system needs wiring and configuration changes

Scenario Two: System Upgrades
A broadcaster has a transcoder that is interfaced to a DAM. The broadcaster starts with Transcoder A and DAM A. The transcoder is replaced by a later model from a different vendor, Transcoder B. This requires a new interface to the DAM. The broadcaster migrates to a new DAM B, so the interface to Transcoder B must be rewritten.

The broadcaster liked the compression quality of Transcoder A and decides to move back, but must pay for another interface to the new DAM. Next, the vendor for Transcoder A has to redesign its product to move from a 32-bit to 64-bit operating system. In the process, the API is upgraded, and a new SDK is issued. The custom interface must be similarly upgraded.

Using two transcoder vendors and two DAMs has required the development of five custom interfaces over time. This adds up to costs for professional services and, for the manufacturer, the opportunity cost of not using software resources to develop new products. It is an unsustainable business model.

Each upgrade needs a new custom API

Integrating an SOA
The implementation of an SOA in a media business must take into account the special needs of the processes that manage A/V files. Off-the-shelf IT products do not cater for these additional requirements without extensive customization. The work to develop FIMS will create a set of standards for vendors to create compliant products and save unnecessary professional services.

This melding of IT with a media-savvy system means that the video engineer and IT specialist can collaborate and share their skills in the creation of products and systems that meet the new demands of the media and entertainment sector.

The linear nature of legacy production is implemented by serial processes acting on the content: ingest, editing, transcoding, distribution, playout and archive being typical. It may be that content is transcoded at ingest — AVC-Intra to DNxHD, for example — and then transcoded after editing to different delivery formats. This could be done by a transcoder in the edit bay and a transcoder in master control.

In a file-based environment, devices all sit on the media network. It makes sense to pool all the transcoders as a central resource and make them available as a transcode service. It could even be that cloud transcoding could be used. This offers much more efficient use of transcoder resources, but for it to work, the service should be reusable at any point in the workflow and should interoperate with all the necessary equipment in the organization, whatever make or model.

This calls for standardization so that a transcode service has a common interface, whatever make or model, and capabilities. A given transcoder may not support all codecs, but the interface can still be common. If demand increases, new transcoders can be added; the architecture scales easily.
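The idea of a common interface in front of a pool of transcoders can be sketched as follows. This is not the FIMS interface; the class names, the method names and the two vendor stand-ins are all hypothetical, and `transcode` simply returns a description string instead of doing any real work.

```python
from abc import ABC, abstractmethod
from typing import List, Set


class TranscodeService(ABC):
    """Hypothetical common interface that every transcoder exposes."""

    @abstractmethod
    def supported_codecs(self) -> Set[str]:
        ...

    @abstractmethod
    def transcode(self, source: str, target_codec: str) -> str:
        ...


class VendorATranscoder(TranscodeService):
    def supported_codecs(self) -> Set[str]:
        return {"h264", "dnxhd"}

    def transcode(self, source: str, target_codec: str) -> str:
        return f"{source} -> {target_codec} (vendor A)"


class VendorBTranscoder(TranscodeService):
    def supported_codecs(self) -> Set[str]:
        return {"webm"}

    def transcode(self, source: str, target_codec: str) -> str:
        return f"{source} -> {target_codec} (vendor B)"


class TranscodePool:
    """Central resource that routes a job to any capable transcoder."""

    def __init__(self) -> None:
        self._services: List[TranscodeService] = []

    def add(self, service: TranscodeService) -> None:
        # Scaling up is just registering another service.
        self._services.append(service)

    def submit(self, source: str, target_codec: str) -> str:
        for service in self._services:
            if target_codec in service.supported_codecs():
                return service.transcode(source, target_codec)
        raise LookupError(f"no transcoder supports {target_codec}")


pool = TranscodePool()
pool.add(VendorATranscoder())
pool.add(VendorBTranscoder())
print(pool.submit("clip.mxf", "webm"))
```

The caller never learns which vendor handled the job, which is the point: a transcoder can be swapped or added without touching the rest of the workflow.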

The move to an SOA involves a change of mind-set to view processes within the organization as services. Since most post houses have a rate card of services they offer, it is not a great leap to understand this concept. The services include ingest, editing, grading, dubbing, encoding, sound mixing, etc. Note that some services require extensive and lengthy use of creative staff; services are not only computer processing operations like transcoding.

A post facility can take a videotape and return a copy of the tape encoded on a Blu-ray disc. All the detail of the process is hidden. You don't need to know which tape deck was used or what software was used to burn the disc. It is all abstracted from the client. The facility performs a service and bills for that service; it is a business transaction.

In an SOA, this concept of abstracting, or encapsulating, the detail of the process is key. If the post house starts using different software, or a different VTR, you don't care as long as the disc is a faithful copy of the tape.

An SOA uses loose coupling for services, possibly a foreign concept for video engineers used to real-time systems. But many services are already loosely coupled. Captioning is one example. A low-resolution proxy is sent out to the captioning facility, and at some point in the future a caption file is returned. The process is managed at a business level by a scheduling department.

Service Adaptors
Many existing products need the user to control the service via a custom API using an SDK provided by the vendor. In an SOA, services are linked to the middleware via service adaptors. These provide the abstraction from the implementation of the service — how it works — to deliver the service at the business level.

The FIMS project is working to define standard interfaces for common media services. A system with FIMS components contains two broad service categories: workflow services able to realize a given business goal (media workflow services), and infrastructure services that are essential components of the media SOA system (media infrastructure services).

The first three workflow services to be developed are:
  • Transfer (moving files)
  • Transform (changing essence or metadata)
  • Capture (ingest stream to file)

It is envisaged further services will be added to the list as dictated by demand from vendors and users.

The first three SOA services for FIMS are: transfer, transform and capture

Building an SOA
In all but the smallest facility, some middleware is required to manage the services, scheduling resources and the workflow. To link services together to implement broadcast workflows, the SOA middleware orchestrates job requests, calling appropriate services to satisfy a request. There are many SOA middleware systems, but few are “media aware.” It is hoped that FIMS will make it easier for several manufacturers to offer compliant products.

One important term is “web services.” Web services are used for communications between the many components of the SOA implementation. SOA is just an architecture — a methodology; it is not a software application.

Web services have become a commodity, with companies like Amazon and Google offering storage, processing and many other services in the cloud. Web services are not exclusive to the Web, but can equally be used on the company LAN.

Web service standards have two interaction styles: SOAP/WSDL and RESTful. RESTful services operate only over HTTP. SOAP is agnostic to transport protocol, so it can use other protocols like TCP. SOAP was designed for distributed computing, whereas REST is a lightweight architectural style for point-to-point communications. Both can be used to bind a service provider with a service consumer. FIMS uses an object model described by XML schemas, which provides independence from the method used to bind services.

Media-Centric Issues
File-based media systems need to allow for certain issues not found in typical SOA implementations. One is the very large size of media files, terabytes in some cases. Media files cannot travel as attachments on a conventional enterprise service bus, the interconnection bus of an SOA. Instead a separate media service bus is used. This concept is common in broadcast facilities where a high-bandwidth media network carries files separately from the main network, which carries control messages, e-mails etc.

A typical SOA service, like converting a Word document to PDF, happens in seconds. Media processes can take hours or even days. Such operations must happen asynchronously from other processes. Operations such as transcoding must be scheduled to business rules to ensure resources are not used for a low-priority job, holding up a high-priority job.

A resource manager service must govern use of services according to business rules that reflect the needs of broadcast operations.
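One simple way to picture such a rule is a priority queue in which urgent jobs always leave first. The sketch below is an assumption-laden illustration, not part of FIMS: the `JobQueue` class, the job names and the convention that a lower number means a higher priority are all invented here.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class JobQueue:
    """Hypothetical scheduler: lower priority number runs first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        # The counter keeps jobs of equal priority in submission order.
        self._counter = itertools.count()

    def submit(self, name: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def next_job(self) -> Optional[str]:
        """Return the most urgent waiting job, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = JobQueue()
queue.submit("archive transcode", priority=5)
queue.submit("breaking news transcode", priority=1)
queue.submit("promo transcode", priority=5)
print(queue.next_job())
```

Because long-running media jobs are handled asynchronously, the queue only decides which job starts next; it does not interrupt a job that is already running.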

The media bus is a potential bottleneck to scaling of services, so SLA monitoring is vital to ensure efficient running of the media systems. The video engineer, used to non-blocking systems with dedicated circuits and crosspoint routers, finds it easy to keep track of bandwidth requirements. In a file-based environment, care has to be taken over the use of the media bus through continual monitoring of capacity and performance.

The AMWA and EBU have now joined with SMPTE to move forward standardization of the framework. After formalization by AMWA and EBU, the work will be submitted to SMPTE.

The standards work is not following a big-bang approach, but progressing in steps, with usable specifications released at regular intervals so that vendors and media companies can start deploying systems immediately, rather than waiting for a long, drawn-out standards process, which has been the case in the past. Businesses have to move now; there is no option to wait for an all-encompassing answer to everything in the media factory. By releasing the framework, and then key adaptors, further work will proceed as the demand arises from the community.

The move to an SOA, and to follow the FIMS route, promises many advantages to the broadcaster. The architecture is more flexible and scalable than legacy systems. The broadcaster can more easily outsource to external services or cloud provision. And finally, the system — through the dashboard — gives management a better view of key performance indicators of the business. The system agility and the better control gives management the ability to invest in new services and to improve the efficiency of existing operations.

By David Austerberry, Broadcast Engineering

Archive eXchange Format

The long-awaited AXF open format for long-term preservation and storage is designed to support interoperability among systems and ensure future access to valuable, file-based assets regardless of type or how they are stored.

Thanks to technology, we've now got more ways than ever to communicate with each other through audio and video. However, the same proliferation of technology that creates so much opportunity has also resulted in a multitude of formats and systems for storing digital media, and those formats and systems often are not compatible with one another. Here we are not talking about interoperability of the media files themselves (as has long been the dream of MXF), but rather the actual operating system, file system, storage technology and devices used to capture, store and protect these valuable media assets now and in the future.

This diversity and potential long-term incompatibility makes reliable and guaranteed access to these assets complicated, expensive and sometimes downright impossible. Solving the problem means establishing a common format for digital media storage that works not only with any existing system, but also systems that have yet to evolve — an open standard for the long-term storage and preservation of media assets.

Although this may seem unnecessary on the surface, there are many documented cases today where important files stored on dated technology using non-standardized methods have become inaccessible and are therefore lost forever. We will likely be able to recreate an MPEG-2 software decoder on whatever platforms exist 100 years from now, but are we certain we'll be able to find a system compatible with FAT32 to be able to recover the MPEG-2 content itself?

The answer to this daunting problem lies in the new Archive eXchange Format (AXF), an open format that supports interoperability among disparate content storage systems and ensures the content's long-term availability no matter how storage or file system technology evolves. AXF inherently supports interoperability among existing, discrete storage systems irrespective of the operating and file systems used and also future-proofs digital storage by abstracting the underlying technology so that content remains available no matter how these technologies evolve.

What is AXF?
At its most basic level, AXF is an IT-centric file container that can encapsulate any number of files, of any type, in a fully self-contained and self-describing package. The encapsulated package actually contains its own file system, which abstracts the underlying operating system, storage technology and original file system from the AXF object and its valuable payload. It's like a file system within a file that can store any type of data on any type of storage media.

The Embedded File System
This innovative embedded file system approach is AXF's defining attribute. It allows AXF to be both content- and storage-agnostic. In other words, because the AXF object itself contains the file system, it can exist on any generation of data tape, spinning disk, flash, optical media or other storage technology that exists today or might exist tomorrow.

Because of this neutrality, AXF certainly supports the modern generation of data tape technologies (LTO5, TS1140 and T10000C, for example) and because there is no dependency on the features of the storage technology itself, it supports all legacy storage formats as well.

What Makes AXF Better?
AXF offers many significant advantages over other formats and approaches such as Tape ARchive (TAR) and Linear Tape File System (LTFS) for long-term storage, protection and preservation.

AXF can scale without limit, which distinguishes it sharply from legacy container formats like TAR. Like AXF, TAR uses a file container approach that works on any file type of any individual or total file size with support for multiple operating systems. However, TAR's age and tape-based roots yield inevitable limitations. For example, it incorporates neither descriptive metadata support nor a central index for file payload information, which makes random access to files challenging and slow. In large TAR archives, the performance penalty is significant, effectively making the format unsuitable for any situation where random access to individual files is required, let alone random access to portions of the contained files as required by operations such as timecode-based, partial restore.

Certainly, TAR has evolved over the decades, but it has done so typically in divergent paths that lead away from its open-source origins. As a result, it is difficult, or impossible, to recover some TAR packages today, rendering them lost forever.

Also in contrast to TAR, AXF incorporates resiliency features that make it possible to recover object contents, descriptive metadata and media catalogs in a multitude of failure and corruption situations. AXF also incorporates comprehensive fixity and error-checking capabilities in the form of multiple per-file and per-structure checksums. TAR lacks these features that should be considered mandatory for modern systems.

The embedded file system enables AXF to translate between any generic set of files and logical block positions on any storage medium, whether the medium has its own file system or not. This essentially abstracts the underlying file system and storage technology and allows systems that comprehend AXF to ignore any of their complexities and limitations.

While AXF can work in harmony with LTFS, it also has advantages over it. LTFS relies on storage technology elements — such as partitioning and file marks on data tape — that hinder both its storage capabilities and its performance. Likewise, a format such as LTFS is ineffective for complex file collections with tens of thousands or even millions of related elements as it lacks any form of encapsulation and instead relies on simplistic file and path arrangements.

AXF can support any number and type of files in a single, encapsulated package, which inherently means these AXF objects can grow exponentially in size. With its inherent support for spanning objects across media (such as over multiple data tapes), AXF offers significant advantages over LTFS, which offers no spanning support — rendering it ineffective in large-scale archives typical in media operations.

For the preservationist community, AXF offers support for the core OAIS (Open Archival Information System) reference model with built-in features such as fixity (per-file checksums and per-structure checksums), provenance, context, reference, open metadata encapsulation and access control. This adherence to established industry practices is another significant benefit of AXF over LTFS.

Once content is stored in the system, the media itself can be transported directly to any other system that also comprehends AXF, offering the same “transport” capabilities as LTFS with the additional features highlighted above.

Front Porch Digital is currently working with SMPTE to standardize AXF and promote it as an industry-wide method for storage and long-term preservation of media assets. Further, the committee hopes its work will extend far outside of the media and entertainment space and into the broader IT community because of its wide-reaching applicability and unparalleled features.

These factors are key to AXF's ability to support large-scale archive and preservation systems as well as simple, standalone applications.

How Does AXF Work?
AXF is designed so that each AXF Object (or package) is comprised of three main components regardless of what technology is used to store them (spinning disk, flash media, data tape without a file system, data tape with a file system, etc.).

The structure of an AXF object includes a header, the payload and a footer

First, each AXF Object begins with an Object Header, a structure containing descriptive XML metadata such as the AXF Object's unique identifier (UUID and UMID), creation date, object provenance and file-tree information, including file permissions, paths, etc. Following the AXF Object Header is any number of optional AXF Generic Metadata packages. The AXF Generic Metadata Packages are self-contained, open metadata containers in which applications can include AXF Object-specific metadata. This metadata can be structured or unstructured, open or vendor-specific, binary or XML.

The second component of the Object construct is the File Payload — the actual byte data of the files encapsulated in the object. The payload consists of any number of triplets — File Data + File Padding + File Footer. File padding, which ensures alignment of all AXF Object elements on storage medium block boundaries, is key to the AXF specification. The File Footer structure contains the exact size of the preceding file, along with an optional file-level checksum designed to be processed on-the-fly by the application during restore operations with little or no overhead.

The final portion of an AXF Object is the Object Footer, which repeats the information contained in the Object Header and adds information captured during the Object's creation, including per-file checksums and precise file and structure block positions. The Object Footer is important to the resiliency of the AXF specification because it allows efficient re-indexing by foreign systems when the media content is not previously known, offering media transport between systems that follow AXF specification.
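The payload triplet described above (file data, padding to a block boundary, then a footer carrying the exact size and an optional checksum) can be sketched as follows. This is an illustration of the idea only, not the AXF on-media layout: the 4,096-byte block size, the zero-byte padding, the dictionary footer and the choice of SHA-256 are all assumptions, and `build_triplet` and `restore_file` are hypothetical names.

```python
import hashlib


def padding_needed(length, block_size):
    """Bytes of padding required to reach the next block boundary."""
    return (-length) % block_size


def build_triplet(data, block_size=4096):
    """Return (padded_data, footer) for one file; the layout is illustrative only."""
    pad = padding_needed(len(data), block_size)
    footer = {
        "size": len(data),                             # exact size of the preceding file
        "checksum": hashlib.sha256(data).hexdigest(),  # optional file-level checksum
    }
    return data + b"\x00" * pad, footer


def restore_file(padded, footer):
    """Strip the padding using the footer size and verify the checksum."""
    data = padded[:footer["size"]]
    if hashlib.sha256(data).hexdigest() != footer["checksum"]:
        raise ValueError("checksum mismatch")
    return data


padded, footer = build_triplet(b"x" * 5000, block_size=4096)
restored = restore_file(padded, footer)
```

Because the footer records the exact file size, a reader can discard the padding without any knowledge of the original file system.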

Because of this standardized approach to the Object construct, which abstracts the underlying complexities of the storage media itself, simple access to the content is ensured regardless of the evolution of technology now and into the future.

Special Structures for Use with Linear Data Tape
When used with linear data tape typical in large-scale archives today, an AXF implementation includes three additional structures to incorporate key, self-describing characteristics on the medium itself, ensuring full recoverability and transportability:
  • ISO/ANSI standard VOL1 volume label: The first structure, which appears on the medium, is an ISO/ANSI standard VOL1 volume label. This label identifies a tape volume and its owner. This is included for compatibility purposes with legacy applications to ensure they do not erroneously handle AXF formatted media and to signal applications that do understand AXF to proceed with accessing the objects contained on the medium.

  • Medium Identifier: The second structure is the Medium Identifier, which contains the AXF volume signature and other information about the storage medium itself. The implementation of the Medium Identifier differs slightly depending on whether the storage medium is linear or nonlinear, and whether it includes a file system or not, but the overall structures are fully compatible.

  • AXF Object Index: The third structure is the AXF Object Index, which is an optional structure that assists in the recoverability of AXF-formatted media. Information contained in this structure is sufficient to recover and rapidly reconstruct the entire catalog of AXF Objects on the storage medium. In a case where the application has not maintained the optional AXF Object Index structures, the contents of each AXF Object can still be reconstructed by simply processing each AXF Object Footer structure.

Who is the Ideal AXF User?
AXF was developed to meet a broad spectrum of user needs, from those accessing petabytes of data in a high-performance environment to those looking to simply encapsulate a few files and send them to a friend via email. AXF is completely scalable to accommodate an operation of any size or complexity. In all cases, AXF offers an abstraction layer that hides the complexities of the storage technology from the higher-level applications, while it also offers fundamental encapsulation, provenance, fixity, portability and preservation characteristics. In addition, the same self-describing AXF format can be used interchangeably on all current storage technologies, such as spinning disk, flash media and data tape from any manufacturer now and into the future.

The Bottom Line
AXF has the ability to support interoperability among systems, help ensure long-term accessibility to valued assets and keep up with evolving storage technologies. It offers profound present and future benefits for any enterprise that uses media — from heritage institutions, to schools, to broadcasters, to simple IT-based operations — and is well on its way to becoming the long-awaited, worldwide, open standard for file-based archiving, preservation and exchange.

More information on AXF is available at OpenAXF.

By Brian Campanotti, Broadcast Engineering

Video Compression Technology: HTML5

A media container is a “wrapper” that contains video, audio and data elements, and can function as a file entity or as an encapsulation method for a live stream. Because container formats are now starting to appear in broadcast situations, both OTA and online, it is useful to consider the various ways that compressed video (and audio) are carried therein, both by RF transmission and by the Internet.

Web Browsing and Broadcasting Crossing Over
The ubiquitous Web browser is a tool that users have come to rely on for accessing the Internet. Broadcasters already make use of this for their online presence, by authoring content and repurposing content specifically for Internet consumption. But browsing capability is something that will come to OTA broadcast as well, once features like non-real-time (NRT) content distribution become implemented. For example, by using the ATSC NRT specification, now under development, television receivers can be built that support different compression formats for cached video, including AVC video and MP3 audio, and different container file formats, such as the MP4 Multimedia Container Format. It is envisioned that these receivers will have the capability of acting as integrated live-and-cached content managers, and this will invariably involve support for different containers and codecs. For this reason, we need to understand how browsers and containers — two seemingly different technologies — are related in the way they handle content.

Several container formats currently provide encapsulation for video and audio, including MPEG Transport Stream, Microsoft Advanced Systems Format (ASF), Audio Video Interleave (AVI) and Apple QuickTime. While not a container format per se, the new HTML5 language for browsers nonetheless has the capability of “encapsulating” video and audio for presentation to a user. With the older HTML, there was no convention for playing video and audio on a webpage; most video and audio have been played by the use of plug-ins, which integrate the video with the browser. However, not all browsers support the same plug-ins. HTML5 changes that by specifying a standard way to include video and audio, with video and audio “elements.”

HTML5 is a new specification under development to replace the existing HTML used by Web browsers to present content since 1999. Among the key requirements of HTML5 are that it be device-independent and that it should reduce the need for external plug-ins. Some of the new features in HTML5 include functions for embedding and controlling video and audio, graphics and interactive documents. For example, a “canvas” element using JavaScript allows for dynamic, scriptable rendering of precise 2-D shapes (paths, boxes, circles, etc.) and bitmap images. Other content-specific elements provide more control over text and graphics formatting and placement, much like a word processor, and new form controls support the use of calendars, clocks, e-mail and searching. Most modern browsers already support some of these features.

The HTML5 Working Group includes AOL, Apple, Google, IBM, Microsoft, Mozilla, Nokia, Opera and many other vendors. This working group has garnered support for including multiple video codecs (and container formats) within the specification, such as OGG Theora, Google's VP8 and H.264. However, there is currently no default video codec defined for HTML5. Ideally, the working group thinks that a default video format should have good compression, good image quality and a low processor load when decoding; they would like it to be royalty-free as well.

Multiple Codecs Present Complex Choices
HTML5 thus presents a potential solution to manufacturers and content providers that want to avoid licensed codecs such as Adobe Flash (FLV), while preferring the partially license-free H.264 (i.e., for Internet Broadcast AVC Video), and fully license-free VP8, Theora and other codecs. Flash, which has become popular on the Internet, most often contains video encoded using H.264, Sorenson Spark or On2's VP6 compression. The licensing agent MPEG-LA does not charge royalties for H.264 video delivered to the Internet without charge, but companies that develop products and services that encode and decode H.264 video do pay royalties. Adobe nonetheless provides a free license to the Flash Player decoder.

HTML5 can be thought of as HTML plus Cascading Style Sheets (CSS) plus JavaScript. CSS is a language for describing the presentation of webpages, including colors, layout and fonts. This allows authors to adapt the presentation to different types of devices, such as large screens vs. small screens. Thus, content authored with HTML5 can serve as a “raw template,” and repurposing to different devices entails generating appropriate CSS for each device. (This is known to programmers as separating “structure” from “presentation.”) JavaScript is an implementation of ECMAScript, both of which are scripting languages that allow algorithms to be run on-the-fly in decoders. Because JavaScript code runs locally in a user's browser, the browser can respond to user input quickly, making interaction with an application highly responsive.

Websites often use some form of detection to determine if the user's browser is capable of rendering and using all of the features of the HTML language. Because there is no specific “flag” that indicates browser support of HTML5, JavaScript can be used to check the browser for its functionality and support of specific HTML features. When such a script runs, it can create a global object that is stored locally and can be referenced to determine the supported local features. This way, the content being downloaded can “adapt” itself to the capabilities of different browsers (and decoder hardware). Scripts are not always needed for detection, however. For example, HTML code can be written, without the use of JavaScript, that embeds video into a website using the HTML5 “video” element, falling back to Flash automatically.

HTML5 also provides better support for local offline storage, with two new objects for storing user-associated data on the client (the playback hardware/software): localStorage, which stores data with no time limit, and sessionStorage, which stores data for one session. In the past, personalization data was stored using cookies. However, cookies are not suitable for handling large amounts of data because they are sent to the server every time there is an information request (such as a browser refresh or link access), which makes the operation slow and inefficient. With HTML5, the stored object data is transferred only when a server or client application needs it. Thus, it is possible to store large amounts of data locally without affecting browsing performance. In order to control the exchange of data, especially between different websites, a website can only access data stored by itself. HTML5 uses JavaScript to store and access the data.

Years ago, content developers predicted the crossover of television and Internet. With standard codecs, container formats and specifications like HTML5, integration of the two media will soon be common.

By Aldo Cugnini, Broadcast Engineering

OTT Video Delivery

OTT video, or streaming media, is an evolving set of technologies that deliver multimedia content over the Internet and private networks. A number of online media platforms are dedicated to streaming media delivery, including YouTube, Brightcove, Vimeo, Metacafe, BBC and Hulu.

Streaming video delivery is growing dramatically. According to the comScore 2009 U.S. Digital Year in Review Video Metrix, Americans viewed a significantly higher number of videos in 2009 than in 2008 (up by 19 percent) because of both increased content consumption and the growing number of video ads delivered.

In January 2010, more than 170 million viewers watched videos online. The average online viewer consumed 187 videos in December 2009, up 95 percent over the previous year, and the average video duration grew from 3.2 to 4.1 minutes.

Hulu, for example, in that same month delivered more than 1 billion streams for a total of 97 million hours. According to comScore, the character of video viewing is changing as well, with more people watching longer content.

There is a growing effort by broadcasters to make regular TV content available online. For example, the BBC has developed the BBC iPlayer and the website to support replication of most BBC broadcast material. The service has been outstandingly successful: 79.3 million requests were serviced in October 2009. NBC coverage of the 2010 Winter Olympics included live and recently recorded content, complete with commercials.

Whenever there is the possibility of a large or dynamic viewer audience, a reliable CDN is required. CDNs were once used only to replicate website content around the world. Now, they have expanded dramatically to handle streaming media. Research and Markets estimated the value of CDN services for 2008 at $1.25 billion, up 32 percent from 2007.

Top CDNs include Akamai, Mirror Image Internet, Limelight Networks, CDNetworks and Level 3. Streaming media services must deal with content collected from disparate sources and distributed to a growing number of devices.

Technology Trends
The most common network protocol used to transport video over IP networks is the Real Time Streaming Protocol (RTSP). RTSP is a stateful protocol used to establish and control media sessions between a media server and client viewer. RTSP clients issue VCR-like commands to control media playback. The transmission of the audio/video stream itself is most often handled by the Real-time Transport Protocol (RTP), although some vendors have implemented their own transport protocol. RTSP and RTP are almost universally used to implement VOD features.
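To make the “VCR-like commands” concrete, the sketch below formats RTSP/1.0 request messages as plain text: a request line, a CSeq header, an optional Session header, and a blank line to finish. The `rtsp_request` helper, the example URL and the session identifier are assumptions for illustration. No network connection is made, and a real client would also have to send SETUP and parse the server's responses.

```python
def rtsp_request(method, url, cseq, session=None, extra_headers=None):
    """Format one RTSP/1.0 request as text (hypothetical helper, no I/O)."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    if session is not None:
        lines.append(f"Session: {session}")
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # Headers are separated by CRLF; an empty line ends the request.
    return "\r\n".join(lines) + "\r\n\r\n"


url = "rtsp://media.example.com/movie"  # placeholder address
play = rtsp_request("PLAY", url, cseq=3, session="12345678",
                    extra_headers={"Range": "npt=0-"})
pause = rtsp_request("PAUSE", url, cseq=4, session="12345678")
```

The Session header is what makes the protocol stateful: the server uses it to tie each PLAY or PAUSE to a session it set up earlier.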

Most video players, such as the Adobe Flash Player, use proprietary protocols that provide additional functionality and flexibility. Flash Player has an almost total presence on PCs and Macs, and is used to deliver more than 80 percent of online videos. The Adobe Flash Player is a lightweight client embedded in Web browsers. Adobe uses the Real Time Messaging Protocol (RTMP) to deliver streaming content, providing multiple independent channels, which are used to control and deliver content. RTMPT is an RTMP variant that encapsulates RTMP packets in HTTP.

First released in 2007, Microsoft's Silverlight player is growing in popularity within the player market. The Silverlight player uses HTTP as its top-level transport mechanism and for media streaming. Using HTTP as a single transport mechanism can result in significant internal cost reduction for end-to-end delivery. Silverlight includes Digital Rights Management (DRM) features similar to those available in Adobe Flash.

HTTP Live Streaming (HLS) is a media streaming specification developed by Apple that uses HTTP as the transport. Devices such as the iPhone, iPad and Apple-compatible platforms support this streaming technology. The “Live” in the name is misleading, as the technology works for both on-demand and live streaming. HLS supports streaming media that is segmented into smaller chunks of data, to improve delivery and user experience. An Extended M3U playlist file lists the media segments to download.
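As a rough illustration of the playlist idea, the function below writes a minimal Extended M3U media playlist for a list of segments. The segment file names and durations are made up, `build_media_playlist` is a hypothetical helper, and only a handful of basic tags are emitted. It is a sketch of the format, not a complete packager.

```python
import math


def build_media_playlist(segments, ended=True):
    """Build playlist text from a list of (uri, duration_seconds) tuples."""
    # The target duration must be at least as long as the longest segment.
    target = math.ceil(max(duration for _, duration in segments))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for uri, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    if ended:
        # Present for on-demand content; omitted while a live stream is still growing.
        lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


playlist = build_media_playlist(
    [("seg0.ts", 10.0), ("seg1.ts", 10.0), ("seg2.ts", 4.5)]
)
```

Leaving out the end-of-list tag (`ended=False`) is how the same format serves a live stream: the player reloads the playlist to discover new segments.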

Modern streaming media technologies adapt to changing network conditions, especially those related to mobile devices. As conditions degrade or improve, the player requests an alternate lower or higher bit rate media stream. Multiple flows are prebuilt or constructed at multiple bit rates and divided into chunks so that a player can seamlessly switch between flows.

The ability for a video player to adapt to varying network conditions is termed differently across players. In the Silverlight player, it is called Smooth Streaming; Adobe Flash 10.1 terms it HTTP Dynamic Streaming; and Apple iPhone's HLS refers to it as adaptive streaming.
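Whatever the vendor calls it, the core decision is similar: pick the highest bit rate that fits within the bandwidth the player is currently measuring. The sketch below assumes a hypothetical bit rate ladder and an arbitrary 80 percent safety margin. Real players use more elaborate heuristics, such as buffer level and switching hysteresis, which are not modeled here.

```python
def choose_bitrate(available_kbps, measured_kbps, safety=0.8):
    """Pick the highest rendition that fits within a fraction of measured bandwidth."""
    budget = measured_kbps * safety
    fitting = [rate for rate in available_kbps if rate <= budget]
    # If nothing fits, fall back to the lowest rendition rather than stalling.
    return max(fitting) if fitting else min(available_kbps)


ladder = [400, 800, 1500, 3000]  # hypothetical bit rates in kb/s
choices = [choose_bitrate(ladder, bandwidth) for bandwidth in (5000, 1200, 300)]
```

Because each flow is divided into chunks, the player can apply this decision at every chunk boundary.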

How is IPTV Delivered?
Delivery of video to the consumer has undergone rapid change in recent years and is guaranteed to continue to do so in the future. Cable TV networks deliver a large range of content, and the ability to provide user interactive features, including VOD.

Carrying the most promise for the future is delivery of video over multiservice IP networks. This is commonly referred to as IPTV. It is delivered to consumers as part of a triple-play service that also includes High-Speed Internet (HSI) and VoIP.

Video over IP Information Flow
The major components and data flow in IPTV networks consist of media and control flowing between content servers and home networks.

The two types of video services delivered are linear broadcast and VOD. Both have dramatically different characteristics that affect the networks that handle them. Broadcasts are regularly scheduled programs sent to large numbers of subscribers. They are sent efficiently over multicast IP routes.

VOD service delivery exhibits an entirely different behavior from linear broadcast service. Stored videos are sent to the subscriber on demand. Each subscriber receives his or her own video flow, which they can control with VCR-like controls.

The differences are responsible for the complexity of the delivery network. Broadcast TV over IP is primarily a one-way channel, using well-understood multicast protocols. The home network is responsible for multicast messages and image display. VOD adds another level of complexity. Requests for and control of video content are transmitted upstream from the subscriber to the service provider using RTSP. Video content is returned to the subscriber through RTP.

IPTV Delivery Challenges
Until differentiating services are developed for IP-based video-voice-data networks, IPTV services will continue to be compared with traditional TV, cable and satellite service. As such, the IP delivery network must remain transparent to customers.

Customers expect video quality and service availability to be on par with or better than their existing service before they make the switch. With multiple choices available to consumers, there is little tolerance for poor quality and operational problems. A poorly engineered network can lead to substantial customer churn.

To successfully deploy IPTV, the following end-user requirements must be addressed:
  • Video quality: subscribers' perception of quality must be the same as or better than that of the alternatives;
  • Minimal channel change delay: because instant response is expected;
  • Assured service delivery and availability for an always-on service.

IPTV Testing Requirements
Service providers must systematically test and verify network devices in each of the video transport architectures, including video content servers, core and edge routers, access devices, and customer premises equipment. Such testing provides an understanding of individual device performance and may determine how much impact each has on the overall system.

System-level tests that incorporate more than one demarcation point in the transport architecture are required. In this way, a clear understanding of how well the individual systems play with each other is determined.

Finally, the network must be tested end-to-end. Most standard routing and forwarding performance tests should be performed, looking at packet loss or latency under different load conditions.

Test Methodologies
All types of video testing, both OTT and IPTV, require testing through large-scale subscriber emulation. That is, large user communities must be simulated performing “normal” activities in order to exercise video components, subsystems and end-to-end delivery. “Normal” activity is directly related to the type of video delivery.

Broadcast IPTV delivery is highly dependent on multicast operation. To avoid sending an individual copy of each program to each user, every channel being viewed is multicast once to all users who wish to view it.

It is up to the STB and all routers between the source(s) and the subscriber(s) to join and leave multicast groups that correspond to a particular broadcast stream.
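On an IPv4 host, joining and leaving a multicast group is typically done with socket options, which in turn cause the operating system to signal group membership to the network. The sketch below shows how an emulated subscriber might change channels. The group addresses are placeholders, the helper names are hypothetical, and the mapping of one channel to one multicast group is an assumption based on the description above.

```python
import socket
import struct


def membership_request(group, interface="0.0.0.0"):
    """Pack the 8-byte IPv4 membership structure (group address + local interface)."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def join_group(sock, group):
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))


def leave_group(sock, group):
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP,
                    membership_request(group))


def change_channel(sock, old_group, new_group):
    """A channel change is a leave of the old group followed by a join of the new one."""
    leave_group(sock, old_group)
    join_group(sock, new_group)


# Building the request needs no network access; 239.1.1.1 is a placeholder group.
mreq = membership_request("239.1.1.1")
```

A test tool emulating thousands of subscribers would repeat `change_channel` on many UDP sockets, which is the “channel zapping” load described in the next paragraphs.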

Broadcast IPTV testing requires emulation of subscribers engaged in two types of behavior:
  • Requesting a channel (joining a multicast group), watching for a period of time and then requesting an alternate channel (leaving one group and joining another);
  • Rapidly changing channels, often called “channel zapping”.

During this type of testing, the critical measurements are:
  • Video quality, largely related to jitter and drop outs. Several types of quality of experience metrics, including VMOS and VQMon, produce values that are closely related to how viewers “feel” about their experience;
  • Channel change latency — that is, the time between channel change request and response.
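Channel change latency can be summarized from pairs of timestamps recorded by the emulated subscribers. The sketch below assumes each event is a (request time, response time) pair in seconds. The sample values are invented for illustration and the function names are hypothetical.

```python
import statistics


def channel_change_latencies(events):
    """Turn (request_time, response_time) pairs into latencies in seconds."""
    return [responded - requested for requested, responded in events]


def summarize(latencies):
    """Report the mean, median and worst-case latency."""
    ordered = sorted(latencies)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "worst": ordered[-1],
    }


events = [(0.0, 0.5), (10.0, 11.0), (20.0, 20.25)]  # invented sample data
summary = summarize(channel_change_latencies(events))
```

Reporting the worst case alongside the average matters here, because viewers notice the occasional slow channel change even when the mean looks acceptable.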

VOD stream delivery is point-to-point, as opposed to multicast. VOD users have VCR-like buttons at their disposal: play, pause, fast forward, rewind and stop. VOD testing requires emulation of subscribers engaged mostly in viewing and occasionally in VCR-like control activities.

During this type of testing, the critical measurements are:
  • Video quality, as described above;
  • Command latency — that is, response to VCR-like control activities.

OTT delivery is also point-to-point, and testing requires emulation of large audiences of users accessing a larger set of possible sources than with VOD. The same VCR-like controls are available, but because OTT content is generally short, they are used less often.

What differentiates OTT from VOD and makes it much more difficult to test is changing connection rates. OTT content is saved many times over at the source for delivery at many different connection rates — for example, high resolution for broadband connections and low resolution for mobile devices. OTT delivery must quickly and transparently switch between streams based on conditions dictated by the consumer.

OTT testing, therefore, must emulate frequent bandwidth changes from a large community of users accessing many possible streams. During this type of testing, the critical measurements are:
  • Video quality, as described above. Empirically, viewers' expectations of quality for OTT delivery are much lower than for broadcast TV;
  • Smooth presentation. As bandwidth availability changes, consumers must not be aware of the changeover of streams. That is, there should be no noticeable pauses.

By Dave Schneider, Broadcast Engineering