Microsoft Announces Support for MPEG-DASH in Microsoft Media Platform

Microsoft Media Platform will support MPEG-DASH, a recently ratified ISO/IEC standard for dynamic adaptive streaming over HTTP. Microsoft plans to support DASH and other open standards as part of an industry-wide initiative to establish reliable video delivery to Internet-connected devices and enable true interoperability between adaptive streaming technologies from different vendors.

Much like Smooth Streaming, DASH uses Extensible Markup Language (XML) to describe media presentations in a manifest file which references media streams stored in ISO Base Media File Format. Combined with the standard HTTP protocol and existing Web content delivery networks, the DASH standard enables a better video experience for end users by automatically adapting to varying client and network conditions during playback.
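The adaptation described above can be illustrated with a minimal sketch. It assumes a heavily trimmed, hypothetical manifest (the `MPD` string below is invented for illustration) and a simple "highest bitrate that fits a safety margin" rule; real DASH clients use their own, more elaborate heuristics. The function names and the `safety` parameter are assumptions, not part of any Microsoft or MPEG API.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed manifest used only for illustration.
MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="low" bandwidth="500000"/>
      <Representation id="mid" bandwidth="1500000"/>
      <Representation id="high" bandwidth="4000000"/>
    </AdaptationSet>
  </Period>
</MPD>"""

NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}


def list_representations(mpd_text):
    """Return (id, bandwidth in bits/s) pairs, sorted by ascending bandwidth."""
    root = ET.fromstring(mpd_text)
    reps = [
        (rep.get("id"), int(rep.get("bandwidth")))
        for rep in root.iterfind(".//dash:Representation", NS)
    ]
    return sorted(reps, key=lambda r: r[1])


def pick_representation(reps, measured_bps, safety=0.8):
    """Pick the highest-bandwidth representation that fits the budget.

    Falls back to the lowest representation when nothing fits.
    The 0.8 safety factor is an arbitrary assumption.
    """
    budget = measured_bps * safety
    best = reps[0]
    for rep in reps:
        if rep[1] <= budget:
            best = rep
    return best[0]


if __name__ == "__main__":
    reps = list_representations(MPD)
    for throughput in (100_000, 2_000_000, 10_000_000):
        print(throughput, "->", pick_representation(reps, throughput))
```

A client would re-run a selection like this as its throughput estimate changes, requesting subsequent media segments from whichever representation is chosen.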

Taking advantage of similarities between Smooth Streaming and DASH, Windows Azure Media Services will add support for DASH Live Profile later this year so that both Smooth Streaming and DASH devices can access the same live and on-demand video presentations using either manifest format. This will enable a smooth transition to DASH for millions of devices and services currently using Smooth Streaming.

In addition to server-side support, Microsoft will also add support for DASH to all its Smooth Streaming client development kits. The first step will be to enable DASH support in the Smooth Streaming Client for Silverlight, followed by support in Smooth Streaming Client SDKs for Windows 8, iOS, Xbox, Windows Phone and Smooth Streaming Client Porting Kit for embedded devices.

Microsoft is also contributing to W3C efforts to standardize adaptive streaming APIs in HTML5 so that DASH Web applications may also be written in HTML5 and ECMAScript (JavaScript) in the future without requiring browser plug-ins such as Silverlight and Flash to enable advanced streaming media scenarios.

Microsoft has contributed to the development of the DECE UltraViolet video format, which enables download and adaptive streaming of premium movie and TV content, and to various international broadcast standards and consortia. The goal is a common protected video format, based on the DECE Common File Format, MPEG Common Encryption, and MPEG-DASH specifications, that all adaptive streaming services and devices will support, giving consumers the same reliable interoperability they expect from broadcast TV and DVD.

What Microsoft services will have DASH support?
Windows Azure Media Services will provide encoding, encryption, and streaming support for Application Profiles based on DASH “ISO Base Media Live Profile” this year. Both DASH manifests and Smooth Streaming manifests will be generated to allow the same media to be streamed by DASH clients and Smooth Streaming clients. The primary media format will conform to the PIFF 1.3 specification in addition to Live Profile, will include several features and constraints compatible with the DECE Common File Format, and may optionally include MPEG Common Encryption with PlayReady DRM support. Windows Azure Media Services will also be capable of live transformation to multiple streaming formats, including MPEG-2 Transport Streams for use with DASH M2TS Simple Profile manifests or M3U8 playlists.

What Microsoft client technologies will have DASH support?
Microsoft plans to add MPEG-DASH support to all client development kits that currently support Smooth Streaming. These are: Smooth Streaming Client for Silverlight; Smooth Streaming Client for Windows Phone; Smooth Streaming Client SDK for Windows 8 Metro-style applications; Xbox LIVE Application Development Kit; Smooth Streaming SDK for iOS Devices with PlayReady; and Smooth Streaming Client Porting Kit.

Is Microsoft discontinuing Smooth Streaming?
No. Microsoft will continue to invest in Smooth Streaming as an established technology and brand while ensuring its Smooth Streaming services, clients, tools and workflows are DASH compatible. The Smooth Streaming file format (PIFF 1.3) is already compatible with the DASH specification (ISO Base Media Live Profile), so customers and partners who are investing in the creation of Smooth Streaming content today will have a clear path to making that content deliverable to DASH clients in the future.

What is Common Encryption?
Common Encryption is an MPEG standard using AES-128 media encryption that enables a single protected ISO Base Media file or adaptive streaming presentation to be used with any DRM system supported by a device and the publisher. The standard is designated ISO/IEC 23001-7 “Information technology – MPEG systems technologies – Part 7: Common encryption in ISO base media file format files”. Prior to this standard, a different set of files was required for each different DRM type, and interchange of files between authorized devices was generally not possible because of different DRMs.

What is Common File Format?
Common File Format (CFF) is a DECE video specification titled “Common File Format & Media Formats Specification” used for content download. It specifies video files based on fragmented ISO Base Media files (MPEG-4 Part 12), optionally using Common Encryption, containing AVC video, AAC audio, SMPTE Timed Text and Graphics subtitles, metadata, and several optional audio formats. All parameters required for interoperability are sufficiently specified to allow independently implemented encoders, publishers, delivery services, and devices to reliably interchange and play the same files. Different “media profiles” are specified for high definition, standard definition, and “portable” definition devices.

The CFF requirement to use short movie fragments makes these files and compatible decoders forward compatible with DASH adaptive streaming using movie fragments as DASH Media Segments. DECE is currently in the process of specifying “Common Streaming Format” and considering DASH Application Profiles.

What about HTML5 playback?
The current working draft of HTML5 does not include specific support for either adaptive streaming or DRM protection. It is possible to indicate a playlist or manifest file as the source of the <video> tag, but a publisher would have no control over the behavior and presentation that each device or browser would execute in response to that manifest. There are no standard APIs to integrate the presentation decisions made by the platform with a presentation application running in the browser.

However, there is work underway in W3C to add both adaptive streaming and content protection APIs so that a script application will be able to run in any HTML5 browser to perform DRM license acquisition and DASH adaptive streaming under the control of the script application. This will allow the script application to control adaptive heuristics, authorization, load balancing, performance reporting, targeted ad insertion, and interactive presentation and navigation of one or more adaptive presentations. These script APIs will make HTML5/JavaScript DASH applications a viable alternative to Silverlight and Flash across the full range of devices … in the future.

AMWA Releases MXF Commercial Delivery Specification

The Advanced Media Workflow Association (AMWA) has released a new MXF Commercial Delivery specification, AS-12. The constrained version of MXF has been developed to enable more efficient handling of commercials through the many transactional and media processing operations between conception and air.

As broadcasters look to serve commercials to long tail delivery platforms as well as their primary channels, controlling costs is all-important. Many versions may exist of the same commercial, adding confusion to the traffic operations. Versions may be sourced from different distribution routes, arriving with different wrappers and codecs, as well as different aspect ratios.

MXF Commercial Delivery aims to solve two problems: unique identification, and defining a master spot for the creation of long-tail versions.

MXF Commercial Delivery unambiguously identifies the spot through the Ad-ID unique identifier carried in a “digital slate”. Current practice is to identify commercials by the visual slate preceding the commercial. Although a human operator can read this by playing the commercial, it does not lend itself to use by automated systems.

With MXF Commercial Delivery, the advertisement identification metadata is carried as a Descriptive Metadata track and serves as a digital slate. The digital slate can be used to reconcile the video and audio components of the commercial with the traffic instruction, thus preventing expensive mistakes. The Ad-ID unique identifier ensures that what the advertiser ordered gets to air.

MXF Commercial Delivery, AS-12, is an addition to AS-03, MXF for Delivery. AS-03 defines MXF files optimized for program delivery, and intended for direct playout via a video server. Used together, the specifications allow the agency to supply broadcasters with a master commercial, along with information like closed captions and AFD that the broadcaster can use to create the lower resolution versions appropriate to their long-tail delivery platforms.

The AS-12 metadata or digital slate can be created early in the production process to uniquely identify the commercial. AS-12 carries fields to identify the advertiser, the agency, brand and product, as well as the title. Through the use of the guaranteed unique Ad-ID, the rekeying of identifiers that typically happens today is avoided. House codes used by agencies or broadcasters are replaced with the Ad-ID, avoiding many of the issues of misidentified commercials that have been commonplace.

The Commercial Delivery specification is sponsored by AMWA principal member Ad-ID, a joint venture of the American Association of Advertising Agencies (4A's) and the Association of National Advertisers (ANA). It establishes a baseline for improved operations, administration, and measurement of advertising assets across the myriad current and emerging delivery platforms. When fully deployed, it will result in substantial financial gains and improvements in productivity that will flow back to all participants within the supply chain.

Source: Advanced Media Workflow Association

MPEG Transport Stream

The MPEG-2 standard is defined by ISO/IEC 13818 as "the generic coding of moving pictures and associated audio information." It combines lossy video compression and lossy audio compression to comply with bandwidth requirements. The basic structure of all MPEG compression systems is asymmetric because the encoder is always more sophisticated than the decoder.

MPEG encoders are always algorithmic. The better ones are also adaptive, using a feedback path. MPEG decoders are not adaptive and perform a fixed function. This works well for applications like broadcasting, where the number of expensive complex encoders is few and the number of simple inexpensive decoders is enormous.

The MPEG standard provides little information about how an encoder processes video and audio or how it operates. Rather, MPEG-2 specifies how a decoder interprets metadata in a bit stream. The metadata tells the decoder the rate at which the video was encoded, defines the audio coding, and identifies channels and other vital stream information.

A decoder that successfully deciphers MPEG streams is called compliant. The beauty of MPEG is that it allows different encoder designs to evolve simultaneously. Generic low-cost and proprietary high-performance encoders and encoding schemes all work because they are all designed to communicate with the compliant decoder base.

Stream Structures
An MPEG-2 stream can be either an Elementary Stream (ES), a Packetized Elementary Stream (PES) or a Transport Stream (TS). ESs and PESs originate as, and are stored as, files. Individual ESs are essentially endless because an ES is as long as the program itself.

Starting with analog video and audio content, individual ESs are created by applying MPEG-2 compression algorithms to the source content in the MPEG-2 encoder. This process is typically called ingest. The encoder creates an individual compressed ES for each audio and video stream. An optimally functioning encoder will appear transparent when its output is decoded in a set-top box and displayed on a professional video monitor.

A good ES depends on several factors, beginning with the quality of the original source material, and the care used in monitoring and controlling audio and video variables when material is ingested. The better the baseband signal, the better the quality of the digital file. Also influencing ES quality is the encoded stream bit rate and how well the encoder applies its MPEG-2 compression algorithms within the allowable bit rate.

MPEG-2 has two main compression components: intraframe spatial compression and interframe motion compression. Encoders use a variety of techniques, some proprietary, to maintain the maximum allowed bit rate while at the same time allocating bits to both compression components. This balancing act can sometimes be unsuccessful. It is a tradeoff between allocating bits for detail in a single frame and bits to represent frame to frame motion changes.

Researchers are still investigating what constitutes a good picture. Presently, there is no direct correlation between the data in the ES and subjective picture quality. For now, the best way of checking encoding quality is with the human eye, after decoding.

The Packetized ES
Each ES is broken into variable-length packets. The result is a PES containing a header and payload bytes. The header includes information about the encoding process required by the MPEG decoder to decompress the ES.

Each individual ES results in an individual PES. At this point, audio and video information still resides in separate PESs. The PES is primarily a logical construct and is not actually intended to be used for interchange, transport and interoperability. The PES also serves as a common conversion point between TSs and Program Streams (PSs).

Both the TS and PS are formed by packetizing PES files. During the formation of the TS, additional packets, containing tables needed to demultiplex the TS, are inserted. These tables are collectively called PSI and will be addressed in detail later.

Some packets contain timing information for their associated program, called the program clock reference (PCR). The PCR is inserted into one of the optional header fields of the TS packet. Recovery of the PCR allows the decoder to synchronize its clock to the rate of the original encoder clock.
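As a minimal sketch of PCR recovery, the code below reads the PCR from the adaptation field of a single TS packet. It assumes the standard layout (a 33-bit base counted at 90 kHz, 6 reserved bits, and a 9-bit extension counted at 27 MHz) and does no clock smoothing or jitter handling. The function and helper names are illustrative assumptions, and `make_pcr_packet` exists only to fabricate test input.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
PCR_FLAG = 0x10  # bit in the adaptation field flags byte


def extract_pcr(packet: bytes):
    """Return the PCR in 27 MHz ticks, or None if the packet carries no PCR."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte TS packet starting with 0x47")
    adaptation_field_control = (packet[3] >> 4) & 0x3
    if not adaptation_field_control & 0x2:
        return None  # no adaptation field in this packet
    adaptation_field_length = packet[4]
    # Need 1 flags byte plus 6 PCR bytes, and the PCR flag must be set.
    if adaptation_field_length < 7 or not packet[5] & PCR_FLAG:
        return None
    raw = int.from_bytes(packet[6:12], "big")  # 48 bits in total
    base = raw >> 15         # top 33 bits, 90 kHz units
    extension = raw & 0x1FF  # bottom 9 bits, 27 MHz units
    return base * 300 + extension


def make_pcr_packet(pid: int, base: int, extension: int = 0) -> bytes:
    """Hypothetical test helper: adaptation-field-only packet carrying a PCR."""
    raw = (base << 15) | (0x3F << 9) | extension
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x20])
    adaptation = bytes([183, PCR_FLAG]) + raw.to_bytes(6, "big")
    stuffing = b"\xff" * (TS_PACKET_SIZE - len(header) - len(adaptation))
    return header + adaptation + stuffing


if __name__ == "__main__":
    ticks = extract_pcr(make_pcr_packet(0x100, base=90_000))
    print(ticks / 27_000_000, "seconds")
```

A decoder would compare successive PCR values with their arrival times to steer its local 27 MHz clock; that control loop is outside the scope of this sketch.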

Null packets, containing a dummy payload, may also be inserted to fill the intervals between information-bearing packets.

TS packets are fixed in length at 188 bytes with a minimum 4-byte header and a maximum 184-byte payload. Key fields in the minimum 4-byte header are the sync byte and the Packet ID (PID). The sync byte's function is indicated by its name: it is a fixed 8-bit value (0x47) that marks the beginning of a TS packet.
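The 4-byte header can be unpacked with a few bit operations. The following is a minimal sketch, assuming the standard MPEG-2 TS header layout; the `TsHeader` type and `parse_ts_header` function are illustrative names, not part of any particular library.

```python
from typing import NamedTuple

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


class TsHeader(NamedTuple):
    transport_error: bool
    payload_unit_start: bool
    pid: int                       # 13-bit packet identifier
    adaptation_field_control: int  # 1 = payload only, 2 = adaptation only, 3 = both
    continuity_counter: int        # 4-bit counter


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the minimum 4-byte header of one TS packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("TS packets are exactly 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    return TsHeader(
        transport_error=bool(packet[1] & 0x80),
        payload_unit_start=bool(packet[1] & 0x40),
        pid=((packet[1] & 0x1F) << 8) | packet[2],
        adaptation_field_control=(packet[3] >> 4) & 0x3,
        continuity_counter=packet[3] & 0x0F,
    )


if __name__ == "__main__":
    # A null packet: PID 0x1FFF with a dummy payload.
    null_packet = bytes([SYNC_BYTE, 0x1F, 0xFF, 0x10]) + bytes(184)
    print(parse_ts_header(null_packet))
```

Because every packet is the same length, a demultiplexer can regain alignment after an error by searching for 0x47 at 188-byte intervals.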

The PID
The PID is a unique address identifier. Every video and audio stream as well as each PSI table needs a unique PID. The PID value is provisioned in the MPEG multiplexing equipment. Certain PID values are reserved or specified by organizations such as the Digital Video Broadcasting Group (DVB) and the Advanced Television Systems Committee (ATSC) for electronic program guides.

In order to reconstruct a program from all its video, audio and table components, it is necessary to ensure that the PID assignment is done correctly and that there is consistency between PSI table contents and the associated video and audio streams. This is one of the more critical points in an MPEG-2 stream.

There are four other important fields in the TS header. One is the continuity counter. It is a 4-bit field that counts from zero through 15 and then wraps, separately for each PID. It’s used to determine whether packets have been lost or repeated. Second is the discontinuity indicator. It signals a discontinuity in the time base (PCR) and the continuity counter, which allows the decoder to handle such discontinuities. Third is the random access indicator. It indicates that the next PES packet in the PID stream contains a video-sequence header or the first byte of an audio frame. Fourth is the splice countdown. It indicates the number of packets with the same PID remaining before the splice point, where a new PES packet begins.
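A continuity check can be sketched as follows. This is a simplified illustration: it tracks the 4-bit counter per PID and reports any jump, but it does not honor the discontinuity indicator and it treats a legally duplicated packet as an error. The function names are assumptions, and `make_packet` exists only to build test input.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
NULL_PID = 0x1FFF


def check_continuity(packets):
    """Return a list of (packet index, pid, expected counter, actual counter)."""
    last_counter = {}
    errors = []
    for index, packet in enumerate(packets):
        pid = ((packet[1] & 0x1F) << 8) | packet[2]
        has_payload = bool(packet[3] & 0x10)
        counter = packet[3] & 0x0F
        if pid == NULL_PID:
            continue  # the counter is undefined for null packets
        # The counter only advances on packets that carry a payload.
        if pid in last_counter and has_payload:
            expected = (last_counter[pid] + 1) & 0x0F
            if counter != expected:
                errors.append((index, pid, expected, counter))
        last_counter[pid] = counter
    return errors


def make_packet(pid: int, counter: int) -> bytes:
    """Hypothetical test helper: payload-only packet with a zeroed payload."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10 | (counter & 0x0F)])
    return header + bytes(TS_PACKET_SIZE - len(header))


if __name__ == "__main__":
    stream = [make_packet(0x100, c) for c in (0, 1, 2, 4)]  # counter 3 is missing
    print(check_continuity(stream))
```

Keeping one counter per PID matters because packets from different streams are interleaved in the multiplex.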

PSI
PSI is part of the TS. It is the set of tables, carried in the additional packets inserted when the TS is formed, that is required for demultiplexing and sorting out which PIDs belong to which programs.

To identify which audio and video PIDs contain the content of a particular program, a Program Map Table (PMT) must be decoded. Each program requires its own PMT with a unique PID value.

In order to determine which PID contains the desired program's PMT, the Program Association Table (PAT) must be decoded. The PAT is the master PSI table with PID value always equal to zero (PID = 0). If the PAT cannot be found and decoded in the TS, then no programs can be found, decompressed, or viewed.

For a set-top box or ATSC tuner to successfully perform the program recovery and decompression process, the PSI tables must be sent periodically and with a fast enough repetition rate that they don’t delay channel-surfing viewers. Thus, checking the PSI tables for correct syntax and repetition rate is a vital part of MPEG testing.
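The PAT lookup can be sketched as below. The code assumes it is handed one complete PAT section (starting at the table_id byte, after the pointer field has been skipped) and maps each program number to its PMT PID. It does not verify the CRC_32 or reassemble sections split across packets. `EXAMPLE_PAT` is fabricated test data with a dummy CRC, and the names are illustrative.

```python
def parse_pat(section: bytes) -> dict:
    """Map program_number -> PMT PID from a single PAT section."""
    if section[0] != 0x00:
        raise ValueError("not a PAT (table_id must be 0x00)")
    # section_length counts the bytes that follow it, including the CRC_32.
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    end_of_loop = 3 + section_length - 4  # stop before the 4-byte CRC_32
    programs = {}
    # The program loop starts after the 8-byte section header.
    for offset in range(8, end_of_loop, 4):
        program_number = (section[offset] << 8) | section[offset + 1]
        pid = ((section[offset + 2] & 0x1F) << 8) | section[offset + 3]
        if program_number != 0:  # program 0 points at the network PID instead
            programs[program_number] = pid
    return programs


# Fabricated example: program 1 -> PID 0x100, program 2 -> PID 0x200.
EXAMPLE_PAT = bytes([
    0x00,                    # table_id
    0xB0, 0x11,              # section_syntax_indicator, section_length = 17
    0x00, 0x01,              # transport_stream_id
    0xC1,                    # version 0, current_next_indicator = 1
    0x00, 0x00,              # section_number, last_section_number
    0x00, 0x01, 0xE1, 0x00,  # program 1 -> PMT PID 0x100
    0x00, 0x02, 0xE2, 0x00,  # program 2 -> PMT PID 0x200
    0x00, 0x00, 0x00, 0x00,  # dummy CRC_32 (not checked in this sketch)
])

if __name__ == "__main__":
    for program, pmt_pid in parse_pat(EXAMPLE_PAT).items():
        print(f"program {program}: PMT on PID 0x{pmt_pid:04X}")
```

With the PMT PIDs in hand, a demultiplexer would next decode each PMT to find the audio and video PIDs of the chosen program.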

Testing PSI involves verifying the accuracy and consistency of PSI contents. As programs change or multiplexer provisioning is modified, some problems may occur. One problem is an unreferenced PID: packets with a given PID value are present in the TS, but that PID is not referenced in any table.

The converse problem is a missing PID: a PSI table refers to a PID value, but no packets with that value are present in the TS.
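Both checks reduce to set differences between the PIDs seen in the stream and the PIDs named in the PSI tables. The sketch below assumes those two collections have already been gathered; the function name and the default list of PIDs exempt from referencing (PAT, CAT and null packets) are assumptions that a real test set would extend with DVB or ATSC reserved values.

```python
def check_pid_consistency(observed_pids, referenced_pids,
                          exempt_pids=(0x0000, 0x0001, 0x1FFF)):
    """Compare PIDs seen in the TS against PIDs referenced by PSI tables.

    exempt_pids lists PIDs that legitimately appear without being referenced:
    by default the PAT (0x0000), the CAT (0x0001) and null packets (0x1FFF).
    """
    observed = set(observed_pids)
    referenced = set(referenced_pids)
    return {
        # Present in the stream but not named in any table.
        "unreferenced": sorted(observed - referenced - set(exempt_pids)),
        # Named in a table but never seen in the stream.
        "missing": sorted(referenced - observed),
    }


if __name__ == "__main__":
    seen = [0x0000, 0x0100, 0x0101, 0x0102, 0x0300, 0x1FFF]
    in_tables = [0x0100, 0x0101, 0x0102, 0x0103]
    report = check_pid_consistency(seen, in_tables)
    print("unreferenced:", [hex(p) for p in report["unreferenced"]])
    print("missing:", [hex(p) for p in report["missing"]])
```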

Another useful PSI test is a check of program content. Just because there are no unreferenced or missing PIDs indicated does not mean that the viewer is receiving the correct program. There may also be a mismatch of the audio content from one program being delivered with the video content from another program. Because MPEG allows multiple audio channels for multiple languages, an air-check can ensure that viewers are receiving the correct language.

It is possible to use a set-top box and television to do the air check, but a better way would be to use an MPEG test set that incorporates all the PSI table checks plus a built-in decompressor with picture and audio display. This would allow you to correlate PSI contents and actual program content as well as allow a quick visual and aural check of ES.

By Ned Soseman, Broadcast Engineering