3D CineCast: Adaptive Streaming

How to Create ABR Content with FFmpeg in One Pass

An interesting article

How to Encode Video for HLS Delivery

If you need to deliver to mobile devices and via OTT platforms, you need to deliver HTTP Live Streaming (HLS). Apple provides plenty of advice for compressionists, but here are some tips and tricks for encoding and testing your HLS files.

By Jan Ozer, StreamingMedia.com

Content Replacement: New Protocols Enable Flexibility in Simulcast Services

Until recently, simulcast streaming to connected devices was performed using protocols like RTMP, RTSP and MMS. In 2009, when the iPhone 3GS was launched, iOS 3.0 included a new streaming protocol called HTTP Live Streaming (HLS), part of a new class of video delivery protocol.

HLS differed from its predecessors by relying only upon HTTP to carry video and flow control data to the device. It made the protocol far more firewall-friendly and easier to scale, as it required no specialist streaming server technology distributed throughout the Internet to deliver streams to end users. The regular HTTP caching proxies that serve as the backbone of all Content Delivery Networks (CDNs) would suffice.

Apple was not alone in making this paradigm switch. Microsoft and Adobe also introduced their own protocols, Smooth Streaming and HDS, respectively. Today, work is ongoing to standardize these approaches into a single unified protocol, under a framework known as MPEG-DASH.

What is significant about all these is that they separate the control aspects of the protocol from the video data. They share the general concept that video data is encoded into chunks and placed onto an origin server or a CDN. To start a streaming session, client devices load a manifest file from that server that tells them what chunks to load and in what order. The infrastructure that serves the manifest can be completely separate from the infrastructure that serves the chunks.

The separation of these concerns provides a basis for dynamic content replacement, as it is possible to dynamically manipulate the manifest file to point the client device at an alternative sequence of video chunks that have been pre-encoded and placed on the CDN. The ability to swap chunks out in this way relies on the encoding workflow generating video chunks whose boundaries match possible replacement events.
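
The manifest manipulation described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the function names, the example.com URLs and the (duration, URI) representation are assumptions made for the example, and the output is only a simplified HLS-style media playlist. A real service would also have to signal timestamp discontinuities and handle the sliding window of a live manifest, which is left out here.

```python
def replace_chunks(segments, start, count, replacement):
    """Swap `count` chunks beginning at index `start` for `replacement`.

    `segments` and `replacement` are lists of (duration_seconds, uri) tuples.
    The video chunks themselves are never touched; only the list changes.
    """
    if start < 0 or count < 0 or start + count > len(segments):
        raise ValueError("replacement range falls outside the manifest")
    return segments[:start] + replacement + segments[start + count:]


def render_media_playlist(segments, target_duration):
    """Render the chunk list as a simplified HLS-style media playlist."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target_duration}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for duration, uri in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    return "\n".join(lines) + "\n"


# Hypothetical live chunks and pre-encoded replacement chunks on a CDN.
live = [(10.0, f"https://cdn.example.com/live/{i}.ts") for i in range(5)]
ads = [
    (10.0, "https://cdn.example.com/ads/a0.ts"),
    (10.0, "https://cdn.example.com/ads/a1.ts"),
]

# Point this particular client at the replacement chunks for chunks 2 and 3.
personalized = replace_chunks(live, 2, 2, ads)
playlist_text = render_media_playlist(personalized, 10)
```

Because only the text of the manifest changes, the same set of chunks on the CDN can serve every viewer, while each viewer can receive a different playlist.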

Stream Conditioning and ESAM
Multi-screen encoding workflows must encode the video and package it for delivery in the protocols that devices require. Stream conditioning for dynamic content replacement means ensuring that the encoding workflow knows when the events at which replacement could occur will happen, and that the video is processed correctly around them. It is important to emphasize that the replacement does not happen at this point: it is done closer to the end user.

When the encoder is informed about a splice point, it starts a new group of pictures (GOP), and when this GOP is encountered downstream by the packager, a new video chunk is created, as shown in Figure 1. Broadcasters should be wary of how their encoder and packager handle edge cases, such as when a splice point comes just before or after where a natural GOP and video chunk boundary would have been, so that extremely small video chunks and GOPs are avoided.
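
The edge case described above can be illustrated with a short Python sketch. It is one possible policy, assumed purely for illustration: natural chunk boundaries fall at multiples of a nominal chunk duration, every splice point forces a boundary, and any natural boundary that would leave a chunk shorter than a minimum duration next to a splice point (or at the very end of the content) is dropped. The function name and parameters are hypothetical, and two splice points that sit very close to each other are not handled.

```python
def chunk_boundaries(duration, nominal, splice_points, min_chunk):
    """Return the sorted times (in seconds) at which new chunks start.

    Splice points always become boundaries. A natural boundary (a multiple
    of `nominal`) is kept only if it is at least `min_chunk` seconds away
    from every splice point and from the end of the content.
    """
    splices = sorted(t for t in splice_points if 0 < t < duration)

    # Natural boundaries computed by multiplication to avoid float drift.
    natural = []
    k = 1
    while k * nominal < duration:
        natural.append(k * nominal)
        k += 1

    kept = [
        n for n in natural
        if duration - n >= min_chunk
        and all(abs(n - s) >= min_chunk for s in splices)
    ]
    return sorted(set(kept + splices))


# Splice points just before 10 s and just after 20 s: the natural
# boundaries at 10 s and 20 s are dropped instead of producing
# chunks of 0.5 s and 0.4 s.
near_splices = chunk_boundaries(30, 10, [9.5, 20.4], 2)

# A splice point well away from the natural boundaries leaves them alone.
far_splice = chunk_boundaries(30, 10, [15], 2)
```

Whether an encoder drops, moves or keeps the natural boundary in these cases is implementation-specific, which is why broadcasters are advised to check the behavior of their chosen products.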

Splice points can be signaled to the encoding workflow in-band or out-of-band. More and more multi-screen encoders can handle SCTE-35 messages within an input MPEG transport stream to determine splice points. Most multi-screen encoders that support SCTE-35 handling also have either a proprietary HTTP-based API or support SCTE-104 for out-of-band splice-point signaling.

There has been a clear need to standardize stream conditioning workflows to allow interoperability between systems deployed for that purpose.

The Event Signaling and Management (ESAM) API specification — a new specification that emerged from the CableLabs Alternate Content working group — describes the relationship and messages that can be passed between a Placement Opportunity Information Server (POIS) and an encoder or packager. The POIS is responsible for identifying and managing the splice points, using the ESAM API supported by the encoder and packager to direct their operations.

The specification defines how both encoder and packager should converse with the POIS, but not how the POIS operates or how it decides when the splice points should appear. This is considered implementation-specific, but the POIS could, for example, use live data from the broadcast automation system to instruct the encoder and packager. ESAM also permits hybrid workflows where the splice points are signaled in-band with SCTE-35 and then decorated with additional properties (including their duration) by the POIS from out-of-band data sources.

The ESAM specification is, relatively speaking, brand new, but it is gathering support from encoder and packager vendors. Broadcasters building multi-screen encoding workflows today, even if dynamic content replacement is not required initially, should ensure that an upgrade path to ESAM is on their chosen vendor’s roadmap to future-proof the workflow.

Today, video takes many paths to end-users’ devices. The overview above shows some of the technology that allows broadcasters to tailor their video content to their users. One method is content replacement.

User-centric Ad Insertion at the Edge
On the output side of the encoding workflow, video chunks are placed onto the CDN, and a cloud-based service responsible for performing dynamic content replacement receives the manifest file. This means that the actual replacement of content — whether it is ad insertion or content occlusion for rights purposes — is performed, in network topology terms, close to the client device.

The mechanism, as described earlier, relies on the relatively lightweight manipulation of the part of the video delivery protocol that tells the client where to fetch the video chunks. This can be performed efficiently on such a scale as to permit decisions to be made for each individual user accessing the live stream.

By performing the content replacement in the network, the client simply follows the video segments laid out to it by the content replacement service, and the transitions between live and replacement content are completely seamless. That makes it a broadcast experience — a seamless succession of content, advertising, promos and so on, with no freezes, blacks or buffering — but with the potential for user-centric addressability.

Of course, content replacement policy and user tracking depend on integrating this content replacement service with the broadcaster’s choice of ad servers and rights management servers. SCTE-130 defines a series of interfaces, including the necessary interface between a content replacement service and an Ad Decision Service (ADS). In the Web world, Video Ad Serving Template (VAST) and Video Player Ad Interface Definition (VPAID) have emerged as generally analogous specifications.

The ability to tailor content down to the individual, by replacing material in stream, in simulcast, while retaining the broadcast-quality experience of seamless content, is a totally new concept. The commercial ecosystem that generates the need for focused ad targeting must now catch up with the technology that supports it.

David Springall, Broadcast Engineering

Bridgetech Introduces Digital Media Monitoring on the iPhone

Bridge Technologies has launched PocketProbe, an iPhone app that enables objective analysis of real network performance of streaming media, in a simple to use, easy to understand tool that technical staff can carry anywhere.

Available now from Apple’s App Store, PocketProbe extends both the existing capabilities of digital media monitoring systems built from Bridge Technologies hardware probes, and the monitoring software environment.

Already providing the most comprehensive end-to-end monitoring and analysis capability, with a range of fixed and portable probes, the system now extends right into the engineer’s pocket.

PocketProbe contains the same OTT Engine found in the company’s VB1, VB2 and 10G VB3 series digital media monitoring probes, enabling confidence validation and analysis of HTTP variable bit-rate streams from any location.

PocketProbe is available in two versions. The free application can validate five HLS streams in round-robin mode, provide analysis and manifest consistency alarms, play back media in the various profile bit-rates, and graphically display the actual chunk download patterns and bit-rates. The full version also offers the ability to validate HDS and Smooth Streaming manifest files and store twenty-five streams with all profiles.

PocketProbe is easy to use, with a fully automatic setup: once the stream URL is input, the app finds all related profiles and validates their consistency. Because PocketProbe uses exactly the same metrics as the hardware probes, service engineers and operational staff can use it to test real-world behavior post-cloud with various operators.

Accurate status of bit-rates used and profile changes is displayed in realtime, giving instant understanding of provider delivery capability. Together with hardware probes used pre-cloud, the post-cloud location of the PocketProbe enables excellent correlative understanding of CDN and provider abilities.

Source: TVB Europe

How YouTube is Bringing Adaptive Streaming to Mobile, TVs

Have you ever played with the settings of a YouTube video to make it look better? YouTube Mobile and TV engineering head Andy Berkheimer would like you to stop doing that.

Berkheimer headed a project last year that brought adaptive bitrate streaming to the YouTube desktop player, enabling the player to automatically switch between different video quality settings based on your internet connection speed, among other factors.

Now he is bringing the same technology to mobile devices and TVs. “We are making it work just as it should,” Berkheimer told me during an interview this week.

From 240p to 4K
That may sound simple, but optimizing video playback has been a long journey for the Google-owned video site. Berkheimer joined YouTube six years ago, when there was just one default video quality — 320×240, also known as 240p. “That was really, really grainy video,” recalled Berkheimer.

His team used Google’s cloud infrastructure to allow for additional codecs, bringing HD and eventually even 4K to the site. But with higher bitrates, buffering also became more of a problem.

The solution? Adaptive bitrate streaming, which is industry-speak for switching the quality of a video in midstream, without the need to re-buffer and start over. YouTube started switching from progressive downloads to adaptive bitrate streaming in its desktop player a year ago, and completed the process late last year.

The new player keeps a close eye on the speed and health of your internet connection, explained Berkheimer: “It’s continuously monitoring the bandwidth and the throughput it is seeing,” he said, adding that it also keeps tabs on the size of your player.

Are you watching a video in full screen? Then you can expect YouTube to send you more bits, as long as your connection is fast enough.

YouTube’s Take on Adaptive Streaming
Adaptive streaming isn’t new: Companies like Netflix and Hulu have used the technology for some time to optimize their streaming experience. But YouTube had some unique challenges to solve when it rolled out its own implementation.

For example, Netflix often starts with a lower-bitrate stream and then slowly scales up, which is why it can take a minute or so before full HD quality sets in.

That approach doesn’t really work for YouTube videos that only last a minute or two. YouTube tends to be more aggressive in sending out higher-quality video, and then scales down the video if necessary, Berkheimer explained. The site also makes use of the fact that you often watch more than one YouTube video in a row, and optimizes your bit rate across an entire session.

The results of these efforts have been encouraging. YouTube has seen buffering reduced by 20 percent since it launched adaptive streaming for its desktop player. That’s why the company is now taking the technology to TVs and mobile devices.

Next Up: Mobile and TVs
Of course, TVs require a lot more HD video, and buffering becomes even more obvious when you compare it to the nonstop experience of a traditional broadcast. Berkheimer told me that YouTube is working with the majority of the TV industry to bring adaptive streaming to TV sets, and that virtually all models introduced at CES this year already support the technology. The company is also working to bring adaptive streaming of YouTube videos to game consoles.

Mobile, on the other hand, comes with different challenges, as people move in and out of the reach of cell towers while they get their video fix on public transport.

And then there is this: “One of the biggest challenges we have is the global nature of YouTube,” said Berkheimer. Average mobile internet speeds are much slower in India and Brazil than in the U.S. and Europe, but videos still have to play without long and tiresome buffering. Broadband in Canada on the other hand is fast, but tightly rationed, with major ISPs charging their customers extra if they go over their caps.

That’s also one reason that those settings that allow you to manually change the bitrate of a YouTube video haven’t disappeared from the player yet — even though Berkheimer would very much like them gone. He told me that there have been some passionate discussions within the company about these manual settings. The result? For now, they’re staying.

But Berkheimer and his team are still working hard so that you can completely ignore them. “The most rewarding thing is that users don’t have to think about it,” he said.

By Janko Roettgers, GigaOM

Two Worlds Collide: Smooth Streaming Meets Flash Player

Microsoft today announced that it is launching a preview version of a Smooth Streaming plugin for the Open Source Media Framework (OSMF) player. Developers can use Smooth Streaming capabilities in any OSMF-compliant player, as well as Adobe's own Strobe player.

"We are pleased to announce that Windows Azure Media Services team released a preview of Microsoft Smooth Streaming plugin for OSMF," wrote Cenk Dingiloglu, a program manager on the Windows Azure Media Services team, in a Microsoft IIS blog posting. He also provided a link, for developers who want to integrate the plugin, to a set of documents and licensing requirements.

In a series of meetings last Thursday on the Microsoft campus in Redmond, Washington, the Windows Azure Media Services team laid out their strategy on a number of fronts, including the extension of Smooth Streaming client software development kits (SDKs) to embedded devices, iOS devices, and player frameworks.

During one of those Microsoft-sponsored meetings, hosted by Microsoft senior technical evangelist Alex Zambelli, Dingiloglu and Mike Downey discussed the most recent addition of OSMF support, noting that Smooth Streaming and OSMF share similarities when it comes to codecs and the use of fragmented MP4 (fMP4) files.

"Support for the same audio and video codecs, H.264 and AAC, respectively," said Dingiloglu, "provides the opportunity to use fMP4, leveraging the best of both the OSMF framework and the Smooth Streaming Client SDK."

The Smooth Streaming plugin will provide some key features of Smooth Streaming, such as on-demand functionality (play, pause, seek, stop), but will also use OSMF built-in API hooks to support two key features: multiple audio language switching and maximum playback quality selection.

OSMF supports late binding, based on its use of fMP4, allowing multiple languages to be accessible to the end user without requiring all possible languages' audio tracks to be multiplexed together into a single Transport Stream, the way that iOS devices require.

Support for OSMF and the Strobe player also gives Microsoft a way onto the Android platform, making it possible for Smooth Streaming content to reach Android-powered smartphones and tablets.

"You can build rich media experiences for Adobe Flash Player endpoints using the same back-end infrastructure you use today to target Smooth Streaming playback to other devices like Win8 store apps, browser and so on," Dingiloglu wrote in the IIS blog post.

Microsoft isn't claiming the new OSMF plugin is ready for prime time quite yet, but I was able to see a working version of Smooth Streaming within an OSMF player during last week's visit.

In fact, one of the more impressive demonstrations was that of a playlist/manifest file that contained both Adobe .f4v files and Microsoft .ism files. The OSMF player seamlessly switched between the two fMP4 file formats, allowing content owners to intermix content from either format for playback.

"As this is a preview release, you're likely to hit issues, have feature requests, or want to provide general feedback," wrote Dingiloglu. "We want to hear it all! Please use the Smooth Streaming plugin for OSMF forum thread to let us know what's working, what isn't, and how we can improve your Smooth Streaming development experience for OSMF applications."

All of this raises the question of how Smooth Streaming relates to MPEG-DASH, the ratified dynamic adaptive streaming standard. Like Adobe, which noted it will continue to develop its own HTTP Dynamic Streaming (HDS) flavor of HTTP-delivered adaptive bitrate streaming, Microsoft sees a benefit in continuing to push the envelope with Smooth Streaming.

The company made it clear that it fully supports DASH, and yet it sees Smooth Streaming as a test bed in which it can continue to innovate for major events like the Olympic Games, which served as a catalyst - over the past three Games - for a number of innovations that now find their way into both Windows Azure Media Services and DASH.

The Smooth Streaming plugin requires browsers supporting Flash Player 10.2 or higher and also requires OSMF 2.0. Microsoft provides licensing details for the Smooth Streaming plugin for interested developers.

By Tim Siglin, StreamingMedia

Content Preparation for Adaptive-Bit-Rate Video

Today’s media landscape is radically more diverse than just a few years ago. The delivery of consistently acceptable image and sound quality is taken for granted by viewers, despite uncertain or fluctuating bandwidth. Adaptive-Bit-Rate (ABR) streaming technology makes this possible.

What is ABR Streaming?
ABR streaming is a delivery technology designed to provide consistent, high-quality viewing in situations where bandwidth may fluctuate, and where viewers may be on a wide range of devices.

Prior to ABR streaming, Web or mobile video delivery was typically done by encoding a single downloadable file or stream at a fixed bit rate and frame size. Viewers could buffer some of the video, and then simultaneously download and play it back. This delivery model was similar to cable transmission, where a single bit rate is transmitted over a reliable medium.

Unfortunately, transmission mediums for Web and mobile devices are unreliable, and bandwidths vary. During fixed-rate video playback, viewers with low bandwidth suffer from excessive buffering (delaying playback). To compensate, providers have tended to encode at lower bit rates, punishing viewers with high bandwidth. Even then, any fluctuations in bandwidth can cause buffering delays.

To solve this problem, ABR streaming content is encoded into multiple layers, each potentially a different bit rate, frame size and/or frame rate. These layers are combined into a single package that represents the original content. ABR players switch between layers depending upon the device and available bandwidth, to ensure consistent high-quality playback.

For example, a single ABR package might include six layers, each encoded at progressively higher bit rates. As a viewer watches content on his/her mobile phone during a train ride, the player will adaptively switch between low bit rates and high bit rates, depending upon the connectivity of the device.
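
To make the idea of a six-layer package concrete, the sketch below renders an HLS-style master playlist that advertises six layers to a player. The bit rates, frame sizes and URIs are hypothetical values chosen for illustration, not recommendations, and the function name is an assumption of this example. Other ABR technologies describe their layers in different manifest formats.

```python
def render_master_playlist(layers):
    """Render an HLS-style master playlist with one variant per layer.

    Each layer is a dict with "bandwidth" (bits per second), "width",
    "height" and "uri" keys. Layers are listed from lowest to highest
    bit rate.
    """
    lines = ["#EXTM3U"]
    for layer in sorted(layers, key=lambda item: item["bandwidth"]):
        lines.append(
            "#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},"
            "RESOLUTION={width}x{height}".format(**layer)
        )
        lines.append(layer["uri"])
    return "\n".join(lines) + "\n"


# Six hypothetical layers; all values are illustrative only.
LAYERS = [
    {"bandwidth": 200_000, "width": 416, "height": 234, "uri": "layer1/index.m3u8"},
    {"bandwidth": 400_000, "width": 480, "height": 270, "uri": "layer2/index.m3u8"},
    {"bandwidth": 800_000, "width": 640, "height": 360, "uri": "layer3/index.m3u8"},
    {"bandwidth": 1_500_000, "width": 960, "height": 540, "uri": "layer4/index.m3u8"},
    {"bandwidth": 3_000_000, "width": 1280, "height": 720, "uri": "layer5/index.m3u8"},
    {"bandwidth": 6_000_000, "width": 1920, "height": 1080, "uri": "layer6/index.m3u8"},
]

master_text = render_master_playlist(LAYERS)
```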

How Does it Work?
Most ABR streaming technologies use a standard Web protocol, HTTP, to send video. This offers advantages over specialized streaming protocols such as RTSP or RTP, as HTTP-based delivery works over ordinary Internet connections and can take advantage of edge technologies designed to cache HTTP requests.

During playback, video and audio are delivered via HTTP in small fragments, each representing some small amount of video, typically between 2 and 10 seconds in length. Each content package includes multiple layers, and each layer may include many fragments. For example, an hour-long movie may have 12 layers, each with a thousand fragments. The player is provided with a package manifest file outlining which layers are available and the location of the fragments for each layer.

During playback, the player requests and downloads a fragment from a layer. While the fragment is played, the connection speed is monitored, and the player may opt to switch layers, either increasing or decreasing the video bit rate based upon the connection speed. Players may also choose layers with different frame sizes or frame rates to optimize the visual experience for the device. This adaptive behavior is what ensures consistent playback regardless of connection speed or device.
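
The switching decision can be sketched as follows. This is a deliberately simple heuristic, assumed for illustration only: real players use their own (often proprietary) logic, typically smoothing measurements over several fragments and taking buffer level, frame size and device capability into account. The function names, the layer bit rates and the 0.8 safety factor are all hypothetical.

```python
def throughput_bps(fragment_bytes, download_seconds):
    """Estimate connection speed from the last fragment download."""
    return fragment_bytes * 8 / download_seconds


def pick_layer(bitrates_bps, measured_bps, safety=0.8):
    """Pick the highest layer that fits within a fraction of the throughput.

    Falls back to the lowest layer when even that one does not fit.
    """
    budget = measured_bps * safety
    candidates = [b for b in sorted(bitrates_bps) if b <= budget]
    return candidates[-1] if candidates else min(bitrates_bps)


# Hypothetical layer bit rates in bits per second.
LAYER_BITRATES = [400_000, 800_000, 1_500_000, 3_000_000]

# A 500,000-byte fragment that took 2 seconds implies about 2 Mb/s,
# so with the 0.8 safety factor the 1.5 Mb/s layer is chosen.
measured = throughput_bps(500_000, 2.0)
next_layer = pick_layer(LAYER_BITRATES, measured)

# On a very poor connection the player drops to the lowest layer.
fallback_layer = pick_layer(LAYER_BITRATES, 300_000)
```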

There are several different ABR streaming technologies available: Apple HTTP Live Streaming (HLS), Adobe HTTP Dynamic Streaming (HDS), Microsoft Smooth Streaming (MSS), and more recently MPEG Dynamic Adaptive Streaming over HTTP (MPEG-DASH). Each technology requires a complete ecosystem. The content must be prepared correctly, and the correct player must be used. All of the technologies work fundamentally in the same manner, using HTTP for content delivery in fragments.

Where these technologies differ is largely related to the structure of the underlying packages. For example, HLS for older versions of iOS requires a separate file for each video fragment. In contrast, most other packages store fragments for a layer in a single file, allowing the player to download fragments using HTTP byte range requests, which download a small part of a larger file.
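
A byte range request is an ordinary HTTP request with a Range header. The sketch below assumes, purely for illustration, that the player already knows the size in bytes of each fragment in a layer's file and that the fragments are stored back to back from the start of the file. Real packages carry an index and header data that a player must read first. The URL and function names are hypothetical, and the request is only constructed here, not sent.

```python
from itertools import accumulate
import urllib.request


def fragment_ranges(fragment_sizes):
    """Turn a list of fragment sizes into inclusive (first, last) byte ranges."""
    ends = list(accumulate(fragment_sizes))
    starts = [0] + ends[:-1]
    return [(start, end - 1) for start, end in zip(starts, ends)]


def build_range_request(url, first_byte, last_byte):
    """Build (but do not send) an HTTP request for one fragment of a file."""
    return urllib.request.Request(
        url, headers={"Range": f"bytes={first_byte}-{last_byte}"}
    )


# Three hypothetical fragments of 100, 250 and 50 bytes in one layer file.
ranges = fragment_ranges([100, 250, 50])

# Request only the second fragment of the layer file.
request = build_range_request(
    "https://cdn.example.com/movie/layer3.mp4", *ranges[1]
)
```

A server that supports range requests answers with only the requested bytes, so an HTTP cache can still serve individual fragments even though the layer is stored as one file.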

Other differences in ABR technology relate to the viewer experience. Apple HLS, for example, provides for a dedicated key frame layer, allowing users to scrub through the video quickly. Other packages allow an audio-only stream with a poster image for extreme low-bit-rate situations.

Preparing Content
Preparing ABR content takes several steps. First, the desired packaging and layer structures need to be identified. Next, content must be encoded, checked for quality, packaged, encrypted and delivered.

ABR production workflow

Choosing Packaging and Profiles
Packaging choice is generally driven by what devices must be supported. Not every device supports players for every type of ABR streaming technology. As a result, one should catalogue both the devices and the players that will be supported. The necessary packaging will naturally become apparent as a result.

The selection of optimal bit rates, frame sizes and frame rates will vary depending upon device types, connection types and encoding technology. Apple and Adobe provide excellent starting points with suggested profiles suitable for their ecosystems. However, practically speaking, the entire catalogue of devices, expected network connections and network costs must be considered when designing layers.

With these considerations, layer design is a balancing act between frame size, bit rate and quality. However, the actual encoding technology used may have the biggest effect upon quality. For example, one study performed by the MSU Graphics & Media Lab showed that the use of x264 encoding technology reduced the necessary bit rate by as much as 50 percent compared to other H.264 encoding technologies at the same quality level. As a result, it is recommended that layers be designed while performing actual encoding tests with the final encoding technology.

Most packages, however, generally contain between 16 and 24 layers. Part of layer design will require a reduction in the number of layers. It is best to select a few common native display frame sizes (such as 1080p) and then encode multiple bit rates to those frame sizes. Doing so will avoid unnecessary performance degradation on players that use software scaling (particularly important for Adobe HDS).

Encoding, Packaging, Delivery and DRM
Each layer will require that a complete H.264 stream be encoded. With 16 to 24 layers, encoding an ABR package can easily require 20 times the processing power needed for a single H.264 stream. Fortunately, highly parallelized multirate H.264 encoding technology exists that re-uses information across the different streams. When combined with GPU acceleration, today’s encoding systems can offer 10 or 20 times the speed of CPU-based systems.

When preparing for multiple devices, an important aspect of encoding is transmuxing, the ability to re-use encoded H.264 streams across multiple package types. This prevents having to re-encode the same bit rates simply to package the video differently.

With on-demand content, it is important to perform QC checks on the different video streams. QC may be performed visually or by using automated tools that measure quality across all of the streams.

On-demand content often requires user authentication and protection prior to playback, which requires Digital Rights Management (DRM). When using DRM, the video must be encrypted during packaging, typically using AES 128-bit encryption. DRM systems typically have subtle requirements for how the encryption is performed by the encoder or packager, and it is important to validate that the two are compatible.

Finally, content delivery will be performed, either as a compressed TAR file or in the native package form. Where possible, it is recommended that the entire production process (ingest, encoding, transmuxing, packaging, quality control, encryption and delivery) be combined into a single automated workflow. Manual steps will significantly slow production time and may result in errors. It is also recommended that the ABR production process be combined with non-ABR production into a single automated system. This reduces system maintenance costs, offers a single view into the overall content production for all distribution channels and allows workflow efficiencies such as unified metadata preparation and content preprocessing.

Conclusion
Preparing video for ABR streaming generally requires research up front to choose technologies and encoding profiles, and a well-integrated, accelerated encoding approach to ensure workflow efficiency. With today’s tools, it is possible to fully automate the ABR content production workflow with full integration into existing content preparation and delivery workflows.

By John Pallett, Broadcast Engineering

Internet TV Systems and Coding

Today's TV viewers want more content from an increasing number of sources, and that means that Internet delivery is a growing phenomenon. With hybrid technologies emerging, it is reasonable to expect that television broadcast will increasingly use the Internet to expand throughput beyond that afforded by a single RF channel. But there are limitations to the Internet that must be understood in order to capitalize on this commodity, and some of those constraints are being overcome by new technologies.

Streaming Can Now Provide a High Quality of Service
In general, Internet TV is a means to provide streamed video content to a PC, STB or Internet-connected TV, by means of an Internet connection. Internet Protocol Television, or IPTV, refers to a special case where a full-time TV subscriber connection is established by means of a dedicated line (and channel) to the telephone system central office. It is envisioned, however, that many Internet TV viewers will get their content through their Internet connection, and as such, receive OTT video service that shares bandwidth with other Internet traffic.

This sharing of bandwidth creates a QoS challenge for Internet TV service: While a terrestrial channel has a fixed bandwidth (i.e., 19.39 Mb/s in the U.S.), an Internet TV service must share the bandwidth, both locally (e.g., within a viewer's household) as well as regionally (e.g., with other subscribers). This means the bandwidth available to a receiver can vary continuously over a wide range, and different subscribers may have different levels of guaranteed service, as well. Lowering the video bit rate to the least common denominator would result in poor video quality to everyone; to deal with this, several technologies are available.

Progressive Download vs. Streaming
The simplest way to deliver video over the Internet is to use progressive download, sometimes called “HTTP streaming.” This is simply a bulk download of a video file to the viewer's terminal (i.e., Internet-connected TV, STB, PC, etc.). A temporary copy of the file is stored on the user's device, typically on a hard drive, and playback can start after a sufficient amount of the file has been downloaded. This means that content will always incur a considerable delay before it is available to be viewed, which makes a live service rather difficult to implement. However, because the files are downloaded using TCP, there can be a nearly 100 percent assurance that every single bit was transferred correctly.

True streaming, on the other hand, opens up a handshaking connection between the server and client using a set of Internet protocols to deliver streams, such as Real Time Streaming Protocol (RTSP), Real Time Messaging Protocol (RTMP) and Microsoft Media Server (MMS). A streaming connection delivers a video stream with minimal buffering, allowing a nearly real-time presentation of the source content. In this respect, streaming has an advantage over progressive download, as continuous delivery is the goal, but the associated downside is that corrupted or missing packets are not detected. The consequence is that audio and video can have ongoing glitches when network congestion is experienced.

Adaptive Bit Rate Streaming
To solve the QoS issue, Adaptive Bit Rate (ABR) streaming has been developed. ABR allows each device to determine the quality of its connection and then use that metric to select the best-coded stream from a number of different quality streams. At the server end, a series of encoders encode a set of multiple streams at different bit rates, and these streams are then sliced up into segments or “chunks.” An ABR client in the viewer terminal detects the incoming stream bandwidth on the fly and uses this, along with a model based on the device's CPU capability, to select a segment among the various streams.

A special manifest file precedes the first segment, providing the client with a list of URLs from which each segment can be accessed. As each segment is received, the client progresses to the next segment in that stream, or it can jump to a parallel segment in one of the other streams if the channel bandwidth changes because of congestion, etc. In principle, a handful of streams will provide enough granularity so that the viewer does not detect a change in picture quality.

Note that ABR provides high transmission bandwidth efficiency when a unicast transmission (i.e., one-to-one) is used, but it can also work well with multicast and broadcast scenarios depending on how well the Internet infrastructure distributes bandwidth to users. ABR has the potential to deliver an audio/visual experience that we have come to expect from linear transmission: low delay, fast start time and a consistent experience across viewers.

Several manufacturers have developed different solutions for ABR streaming. Adobe HTTP Dynamic Streaming (HDS) uses a format called F4F to deliver Flash videos over RTMP and HTTP. Apple HTTP Live Streaming (HLS) was developed for the iPhone and iPad, and is implemented using HTTP, H.264 and MPEG-2 Transport Streams, with a manifest file called M3U8. Microsoft Internet Information Services (IIS) Smooth Streaming is used within Silverlight on Windows Phone 7 and incorporates fragmented MP4 (fMP4) encapsulation, again with H.264 for video compression.

With these different enterprise systems, an interoperability problem exists because of proprietary protocols and manifest structures. Multiple ABR systems mean that different devices must either pick and choose which systems to support, leading to service-constrained devices, or must include all at increased cost. This situation has motivated companies and experts around the world to propose a single, standard ABR system.

DASH-ing to the Rescue: a Universal ABR System
MPEG-DASH (Dynamic Adaptive Streaming over HTTP) is a newly standardized method for defining Stream Segments and Manifest Files for the purpose of ABR streaming. The specification (ISO/IEC 23009-1) defines a Media Presentation Description (MPD) that formalizes the stream manifest, which includes Segment timing, URLs and media characteristics such as video resolution and bit rates. While compatible Segments can contain any media data, with arbitrary compression, two types of containers are exemplified in the standard: MPEG-4 file format and MPEG-2 Transport Stream.

MPEG-DASH defines a standard set of Media Presentation Description and Segment Formats that enable adaptive bit rate IP video streaming.

In going to a standard system, MPEG-DASH is quickly deployable with the existing Internet infrastructure, using widely deployed standard HTTP servers/caches for scalable delivery. Generic encoders can be reused, with additional descriptive metadata for better client functionality, and legacy manifest files can be converted easily to MPD format, as well as sent in parallel for backward compatibility with low overhead. In addition, existing content and production equipment supporting legacy ABR streaming systems can be used for MPEG-DASH by means of a set of standard Profiles. Apple HLS content can be used with the DASH M2TS Main profile, and Microsoft IIS Smooth Streaming Content is suitable for DASH ISO-BMFF (Base Media File Format) Live profile.

Vendors are now proposing integrated workflow and delivery systems supporting ABR with multiple source formats, protocols and on multiple devices. While encoding latency can be an issue for live streams, MPEG-DASH includes a profile optimized for live encoding that can achieve a latency of a few seconds by encoding and immediate delivery of short Segments.

In addition to delivery of any multimedia content, MPEG-DASH supports a broad range of use cases, including live, VOD, time shifting (nPVR), ad insertion and dynamic update of program. MPEG-DASH also solves the problem of content repurposing to multiple devices with widely ranging capabilities. In principle, an MPEG-DASH-controlled stream can be targeted simultaneously to both large and small screens, as well as fixed and mobile.

Internet Quickly Becoming Viable for Long-Form Content
The once-exclusive realm of RF transmission as providing the highest quality content consumption experience is being challenged by streaming services. But new technologies and business models are providing broadcasters with the tools to compete with new service entities, and that's where content distribution is headed.

By Aldo Cugnini, Broadcast Engineering

HTTP ABR Streaming

Cisco Systems' Visual Networking Index (VNI) predicts that more than 50 percent of all global Internet traffic will be attributed to video by the end of 2012. It also confirms that, in addition to television screens, video delivery to cell phone and computer screens will be increasingly common. Globally, Internet video traffic is projected to be 58 percent of all consumer Internet traffic in 2015, up from 40 percent in 2010. At that time, three trillion minutes of video content are projected to cross the Internet each month, up from 664 billion in 2010, and 16 percent of consumer Internet video traffic will be TV video. There is no doubt that if you are in the business of transmitting video, you will likely be using IP in the near future.

Delivering acceptable video quality over IP to TV viewers and other devices has led to a still-evolving delivery infrastructure. The required network scale has higher packet loss and error rates than smaller managed networks. Adaptive Bit Rate (ABR) delivery protocols like Apple's HLS and Microsoft's Silverlight, among others, help address these issues. These protocols use HTTP over TCP to mitigate data loss by dynamically adapting bit rates to adjust to networks that can provide only unpredictable instantaneous bandwidths.

Using a CDN to distribute the content to a range of servers located close to the viewers is another key feature to successful deployments to avoid the congestion and bottlenecks of centralized servers. Yet, despite more complex protocols to handle a range of transport issues, high-quality performance is not guaranteed. Cost-effective operations and a good viewer experience depend on good monitoring observability and targeted performance metrics for rapid problem identification, location and resolution.

ABR Protocols
ABR video delivery mechanisms over IP that enable this rapidly growing Internet video market are effective, but complex. Not only do they require the usual video compression encoders to achieve practical bit rates, but they also require a host of other devices and infrastructures, including segmenting servers, origin servers, a CDN and a last-mile delivery network.

ABR protocols help deliver a quality video experience to viewers by overcoming common IP data network performance issues such as packet arrival jitter, high loss rates, unpredictable bandwidth and security firewall issues. HTTP delivery solves most firewall issues as it is almost universally unblocked since it is also used for web browsing. HTTP, which uses TCP, assures loss-free payload delivery as well. While predictable instantaneous bandwidth levels are a challenge in unmanaged networks, by using variable encoding rates and these protocols, the viewer's client device can dynamically select the best stream bit rate for the instantaneously available bandwidth.

Apple's HTTP Live Streaming (HLS) is an example of a protocol that successfully navigates the challenges of unmanaged networks to transfer multimedia streams using HTTP. To play a stream, an HLS client first obtains the playlist file, which contains an ordered URI list of media files to be played. It then successively obtains each of the media files in the playlist. Each media file is, typically, a 10-second segment of the desired multimedia stream. A playlist file is simply a plain text file containing the locations of one or more media files that together make up the desired program.

The media file is a segment, or “chunk,” of the overall presentation. For HLS, it is always formatted as an ISO 13818 MPEG-2 TS or an MPEG-2 audio elementary stream. The content server divides the media stream into media files of approximately equal durations at packet and key frame boundaries to support effective decoding of individual media files. The server creates a URI for each media file that allows clients to obtain the file and creates the playlist file that lists the URIs in play order.
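To make the playlist mechanics concrete, here is a minimal Python sketch that reads a media playlist and returns the segment URIs in play order. The sample playlist, the example.com base URL and the function name are illustrative assumptions; only the `#EXTINF` and `#EXT-X-ENDLIST` tags are handled, and a real client would need to deal with many more.

```python
from urllib.parse import urljoin

# Hypothetical playlist with two 10-second segments.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10,
segment0.ts
#EXTINF:10,
segment1.ts
#EXT-X-ENDLIST
"""


def parse_media_playlist(text, base_url):
    """Return ([(duration_seconds, absolute_uri), ...], ended) for a media playlist."""
    segments, pending, ended = [], None, False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXTINF:"):
            # Format is "#EXTINF:<duration>,<optional title>".
            pending = float(line[len("#EXTINF:"):].split(",", 1)[0])
        elif line == "#EXT-X-ENDLIST":
            ended = True
        elif line and not line.startswith("#") and pending is not None:
            # A non-tag line after #EXTINF is the URI of that media file.
            segments.append((pending, urljoin(base_url, line)))
            pending = None
    return segments, ended


if __name__ == "__main__":
    print(parse_media_playlist(SAMPLE_PLAYLIST, "http://example.com/stream/index.m3u8"))
```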

On the transmitting end, the adaptive encoder creates segments with fixed duration at different bit rates and an index file that acts as a playlist to indicate the sequence of the segments. On the receiving end, the adaptive protocol buffers video segments in the correct sequence, selecting the best quality possible for the bit rate available at each interval before playing them seamlessly.

Multiple playlist files are used to provide different encodings of the same presentation. A variant playlist file that lists each variant stream allows clients to dynamically switch between encodings. Each variant stream presents the same content, and each variant playlist file has the same target duration. If the playlist file obtained by the client is a variant playlist, the client can choose the media files from the variants as needed based on its own criteria, such as how much network bandwidth is currently available. The client will attempt to load media files in advance of when they will be required for uninterrupted playback to compensate for temporary variations in latency. The client must periodically reload the playlist file to get the newest available media file list, unless it receives a tag marking the end of the available media.
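The variant playlist can be sketched in the same way. The Python example below extracts the `BANDWIDTH` attribute and URI of each variant stream; the sample playlist contents and function name are illustrative assumptions, and the client's "own criteria" is reduced here to a single bandwidth comparison.

```python
import re

# Matches KEY=value pairs, where a value may be a quoted string containing commas.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')

# Hypothetical variant playlist listing three encodings of the same content.
SAMPLE_VARIANTS = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=1280000,CODECS="avc1.42e00a,mp4a.40.2"
mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=200000
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2560000
high/index.m3u8
"""


def parse_variant_playlist(text):
    """Return a list of (bandwidth_bps, uri) tuples sorted by ascending bandwidth."""
    variants, attrs = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            pairs = ATTR_RE.findall(line.split(":", 1)[1])
            attrs = {key: value.strip('"') for key, value in pairs}
        elif line and not line.startswith("#") and attrs is not None:
            variants.append((int(attrs["BANDWIDTH"]), line))
            attrs = None
    return sorted(variants)


if __name__ == "__main__":
    variants = parse_variant_playlist(SAMPLE_VARIANTS)
    # Pick the richest variant that fits 1.5 Mbps, else the lowest one.
    fitting = [v for v in variants if v[0] <= 1_500_000]
    print(fitting[-1] if fitting else variants[0])
```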

CDN Operation
Using HTTP client-driven streaming protocols like HLS effectively supports adaptive bit rates, handles high network error rates and firewall issues, and supports both on-demand and live streaming. However, with millions of clients establishing individual protocol sessions to receive video, scalability must be considered. Further challenging the system design are sudden spikes in requests from “flash crowds” or “SlashDot effects” that may be caused by current events where a sudden, unexpected demand overwhelms servers, and content becomes temporarily unavailable.

The CDN is a collection of network elements that replicates content to multiple servers to transparently deliver content to users. The elements are designed to maximize the use of bandwidth and network resources to provide scalable accessibility and maintain acceptable QoS. Particular content can be replicated as users request it or can be copied before requests are made by pushing the content to distributed servers closer to where it is anticipated users will be requesting it.

In either case, the viewer receives the content from a local server, relieving congestion on the origin server and minimizing the transmission bandwidth required across wide areas. Caching and/or replica servers located close to the viewer are also known as edge servers or surrogates. To realize the desired efficiencies, client requests must be transparently redirected to the optimal nearby edge server.

Content distribution and management strategies are critical in a CDN for efficient delivery and high-quality performance. The type and frequency of viewer requests and their location must dynamically drive the directory services that transparently steer the viewer to the optimum edge server, as well as the replication service, to assure that the requested content is available at that edge server for a timely response to the viewer. A simplistic approach is to replicate all content from the origin server to all surrogate servers, but this solution is not efficient or reasonable given the increase in the size of available content. Even though the cost of storage is decreasing, available edge server storage space is not assured. Updating this scale of copies is also unmanageable.

Practically, a combination of predicted and measured content popularity and on-demand algorithms is used for replication. Organizing and locating edge server clusters to maintain optimum content availability relies on policies and protocols to load balance the servers. Random, round robin or various weighted server selection policies, along with selections based on number of current connections, number of packets served, and/or server CPU utilization, health and capacity are all utilized and varied based on load persistence considerations.
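Three of the selection policies named above can be sketched in a few lines of Python. The `EdgeServer` class, its fields and the server names are hypothetical; a production CDN would combine such policies with health checks and persistence rules that are not shown.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class EdgeServer:
    """Hypothetical edge server record used only for this illustration."""
    name: str
    weight: int = 1
    connections: int = 0


def round_robin(servers):
    """Return an endless iterator that cycles through the servers in order."""
    return itertools.cycle(servers)


def weighted_random(servers, rng=random):
    """Pick one server at random, in proportion to its weight."""
    return rng.choices(servers, weights=[s.weight for s in servers], k=1)[0]


def least_connections(servers):
    """Pick the server currently handling the fewest connections."""
    return min(servers, key=lambda s: s.connections)


if __name__ == "__main__":
    pool = [EdgeServer("edge-a", 1, 40), EdgeServer("edge-b", 3, 12), EdgeServer("edge-c", 1, 25)]
    rr = round_robin(pool)
    print([next(rr).name for _ in range(4)])
    print(weighted_random(pool).name)
    print(least_connections(pool).name)
```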

Quality Assurance
Cost-effective operations rely on good monitoring observability and performance metrics for rapid problem identification and resolution. QoS performance monitoring metrics provide needed information about stream delivery quality, key information about the types of impairments and their causes, as well as warnings of impending impairments for ABR streaming networks. Combined with end-to-end monitoring, QoS monitoring in production networks tracks the delivery quality of the flows and also supports other applications such as system commissioning and tuning.

In adaptive streaming environments, QoS should be monitored post caching server and at the client. This chart shows how the VeriStream metric characterizes instantaneous network delivery quality on a 1-5 scale:

1 - Severe underrun: Interval between segments and the file transfer time are slower than the drain rate.
2 - Underrun: Segment interval is slower than the drain rate, but file transfer time is faster than the drain rate.
3 - Warning: Interval between segments and the file transfer time are marginal.
4 - Growing buffer: Interval between segments and the file transfer are faster than the drain rate.
5 - Balanced system: Interval between segments is balanced, and the file transfer is faster than the drain rate.
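The five levels can be approximated with a small classifier. This Python sketch is only one reading of the list above: the article does not give the actual VeriStream calculation, so the ratios, the 10 percent margin used for "marginal" and "balanced", and the function name are all assumptions.

```python
def delivery_score(segment_duration, arrival_interval, transfer_time, margin=0.1):
    """Map segment timing onto the 1-5 scale described in the text.

    segment_duration: playback time of one segment (the drain time), in seconds.
    arrival_interval: time between the arrival of successive segments.
    transfer_time: time taken to transfer one segment file.
    margin: assumed tolerance for treating the interval as balanced.
    """
    interval_ratio = arrival_interval / segment_duration
    transfer_ratio = transfer_time / segment_duration
    if interval_ratio > 1 + margin:
        # Segments arrive more slowly than the player drains them.
        return 1 if transfer_ratio > 1 else 2
    if interval_ratio < 1 - margin:
        # Segments arrive faster than they drain, so the buffer grows
        # as long as each transfer also beats the drain rate.
        return 4 if transfer_ratio < 1 else 3
    # The arrival interval roughly matches the drain rate.
    return 5 if transfer_ratio < 1 - margin else 3


if __name__ == "__main__":
    for timing in [(10, 12, 11), (10, 12, 5), (10, 10, 9.5), (10, 8, 4), (10, 10, 4)]:
        print(timing, "->", delivery_score(*timing))
```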

Such metrics are intended to analyze streams susceptible to IP network device and client/server impairments. For adaptive streaming environments, it is also important to monitor QoS at the client end point, which can be used to assess the dynamic performance of network and system delivery. QoS metrics for ABR must continuously analyze the dynamic delivery of stream segments.

Comprehensive monitoring in real time at strategic network locations for rapid problem detection and fault isolation can be combined with control plane and content quality monitoring for optimum system management.

Summary
Leaving the well-managed network domain of provider IP networks requires new adaptive bit-rate protocols that are rapidly proving their effectiveness. A comprehensive, end-to-end monitoring strategy gives content and service providers the stream observability and fault-isolating capabilities needed for timely and efficient adaptive bit-rate network delivery deployments.

By James Welch, Broadcast Engineering

Netflix Sees Cost Savings in MPEG DASH Adoption

"The biggest advantage to us of a standard like MPEG DASH is that everything can be encoded one way and encapsulated one way, and stored on our CDN servers just once. That's a benefit both in terms of saving our CDN costs from a storage perspective and a benefit because you have greater cache efficiency," said Mark Watson, senior engineer for Netflix.

Watson made his comments in a red carpet interview at the recent Streaming Media West conference in Los Angeles, shortly before taking part in a panel on the MPEG DASH specification. MPEG DASH would be a great help to Netflix, he said, because then it could avoid saving several different copies of its entire movie and TV show library.

While there are several different profiles defined in MPEG DASH, Netflix will use the on-demand profile, Watson said, because all of its online content is on-demand. Between the two types of stream segments defined -- MPEG-2 Transport Streams and fragmented MP4 files -- Netflix sides with fragmented MP4. It works well for adaptive streaming and is simpler, he offered.

Netflix, Watson said, contracts with multiple CDNs and allows the client devices to determine which works best for them at any time. The company is also sensitive to the amount of traffic it's putting across networks.

By Troy Dreier, Streaming Media

The MPEG-DASH Standard for Multimedia Streaming Over the Internet

A white paper by Microsoft.

Watching Video Over the Web

Two interesting white papers by Cisco: part 1 and part 2.

Transcoding Strategies for Adaptive Streaming

An interesting white paper by ARRIS.

What is MPEG DASH?

MPEG DASH (Dynamic Adaptive Streaming over HTTP) is a developing ISO Standard (ISO/IEC 23009-1) that should be finalized by early 2012. As the name suggests, DASH is a standard for adaptive streaming over HTTP that has the potential to replace existing proprietary technologies like Microsoft Smooth Streaming, Adobe HTTP Dynamic Streaming (HDS), and Apple HTTP Live Streaming (HLS). A unified standard would be a boon to content publishers, who could produce one set of files that play on all DASH-compatible devices.

The DASH working group has industry support from a range of companies, with contributors including critical stakeholders like Apple, Adobe, Microsoft, Netflix, Qualcomm, and many others. However, while Microsoft has indicated that it will likely support the standard as soon as it’s finalized, Adobe and Apple have not given the same guidance, and until DASH is supported by these two major players, it will gain little traction in the market.

A more serious problem is that MPEG DASH doesn’t resolve the HTML5 codec issue. That is, DASH is codec agnostic, which means that it can be implemented in either H.264 or WebM. Since neither codec is universally supported by all HTML5 browsers, this may mean that DASH users will have to create multiple streams using multiple codecs, jacking up encoding, storage, and administrative costs.

Finally, at this point, it remains unclear whether DASH usage will be royalty-free. This may impact adoption by many potential users, including Mozilla, which has already commented that it’s “unlikely to implement” DASH unless and until it’s completely royalty-free. With Firefox currently sitting at around 22% of market share, this certainly dims DASH’s impact in the HTML5 market.

Introduction to MPEG DASH
Adaptive streaming involves producing several instances of a live or on-demand source file and making them available to various clients depending upon their delivery bandwidth and CPU processing power. By monitoring CPU utilization and/or buffer status, adaptive streaming technologies can change streams when necessary to ensure continuous playback or to improve the experience.

One key difference between adaptive streaming technologies is the streaming protocol utilized. For example, Adobe’s RTMP-based Dynamic Streaming uses Adobe’s proprietary Real Time Messaging Protocol (RTMP), which requires a streaming server and a near-continuous connection between the server and player. Requiring a streaming server can increase implementation cost, while RTMP-based packets can be blocked by firewalls.

A near-continuous connection means that RTMP can’t take advantage of caching on plain-vanilla servers like those used for Hypertext Transfer Protocol (HTTP) delivery, the delivery protocol used by Apple’s HTTP Live Streaming (HLS), Microsoft’s Smooth Streaming, and Adobe’s HTTP-based Dynamic Streaming (HDS). All three of these delivery solutions use standard HTTP web servers to deliver streaming content, obviating the need for a streaming server.

In addition, HTTP packets are firewall friendly and can utilize HTTP caching mechanisms on the web. This latter capability should both decrease total bandwidth costs associated with delivering the video, since more data can be served from web-based caches rather than the origin server, and improve quality of service, since cached data is generally closer to the viewer and more easily retrievable.

While most of the video streamed over the web today is still delivered via RTMP, an increasing number of companies will convert to HTTP delivery over time.

All HTTP-based adaptive streaming technologies use a combination of encoded media files and manifest files that identify alternative streams and their respective URLs. The respective players monitor buffer status (HLS) and CPU utilization (Smooth Streaming and HTTP Dynamic Streaming) and change streams as necessary, locating the alternate stream from the URLs specified in the manifest file.
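A buffer-driven switching rule of the kind attributed to HLS players above might look like the following Python sketch. The 5-second and 15-second thresholds and the one-step-at-a-time policy are assumptions for illustration, not behavior documented for any particular player.

```python
def next_stream_index(current, stream_count, buffer_seconds, low=5.0, high=15.0):
    """Return the index of the stream to use for the next segment.

    Streams are assumed to be ordered from lowest (index 0) to highest bit rate.
    """
    if buffer_seconds < low:
        # Buffer is nearly empty: step down to protect continuous playback.
        return max(current - 1, 0)
    if buffer_seconds > high:
        # Buffer is comfortable: step up to improve quality.
        return min(current + 1, stream_count - 1)
    return current


if __name__ == "__main__":
    for buffered in (2.0, 10.0, 20.0):
        print(buffered, "->", next_stream_index(1, 4, buffered))
```

Once the index changes, the player looks up the URL of the alternate stream in the manifest file, as described above.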

HLS uses MPEG-2 Transport Stream (M2TS) segments, stored as thousands of tiny M2TS files, while Smooth Streaming and HDS use time-code to find the necessary fragment of the appropriate MP4 elementary streams.

DASH is an attempt to combine the best features of all HTTP-based adaptive streaming technologies into a standard that can be utilized from mobile to OTT devices.

MPEG DASH Technology Overview
As mentioned, all HTTP-based adaptive streaming technologies have two components: the encoded A/V streams themselves and manifest files that identify the streams for the player and contain their URL addresses. For DASH, the actual A/V streams are called the Media Presentation, while the manifest file is called the Media Presentation Description.

As you can see in Figure 1, the Media Presentation is a collection of structured audio/video content that incorporates periods, adaptation sets, representations, and segments.

Figure 1. The Media Presentation Data Model

The Media Presentation defines the video sequence with one or more consecutive periods that break up the video from start to finish. Each period contains multiple adaptation sets that contain the content that comprises the audio/video experience. This content can be muxed, in which case there might be one adaptation set, or represented in elementary streams, as shown in Figure 1, enabling features like multiple language support for audio.

Each adaptation set contains multiple representations, each a single stream in the adaptive streaming experience. In the figure, Representation 1 is 640x480@500Kbps, while Representation 2 is 640x480@250Kbps.

Each representation is divided into media segments, essentially the chunks of data that all HTTP-based adaptive streaming technologies use. Data chunks can be presented in discrete files, as in HLS, or as byte ranges in a single media file. Presentation in a single file helps improve file administration and caching efficiency as compared to chunked technologies that can create hundreds of thousands of files for a single audio/video event.
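When segments are stored as byte ranges in one file, the player fetches each one with an HTTP `Range` request. This Python sketch turns a list of segment sizes into the corresponding headers; the sizes are made up, and in practice the offsets would come from the manifest or a segment index rather than being computed this way.

```python
def range_headers(segment_sizes):
    """Return one HTTP Range header dict per segment of a single media file."""
    headers, start = [], 0
    for size in segment_sizes:
        end = start + size - 1  # HTTP byte ranges are inclusive at both ends.
        headers.append({"Range": f"bytes={start}-{end}"})
        start = end + 1
    return headers


if __name__ == "__main__":
    # Hypothetical segment sizes in bytes.
    for header in range_headers([1000, 500, 750]):
        print(header)
```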

The DASH manifest file, called the Media Presentation Description, is an XML file that identifies the various content components and the location of all alternative streams. This enables the DASH player to identify and start playback of the initial segments, switch between representations as necessary to adapt to changing CPU and buffer status, and change adaptation sets to respond to user input, like enabling/disabling subtitles or changing languages.
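To show how a player might walk the period, adaptation set and representation hierarchy, here is a minimal Python sketch using the standard library XML parser. The sample MPD is a hand-written, heavily simplified assumption (it mirrors the two 640x480 representations mentioned above and omits segment information entirely), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# XML namespace used by MPD documents.
NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Simplified, illustrative MPD: one period, a video and an audio adaptation set.
SAMPLE_MPD = """
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static" mediaPresentationDuration="PT30S">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="1" bandwidth="500000" width="640" height="480"/>
      <Representation id="2" bandwidth="250000" width="640" height="480"/>
    </AdaptationSet>
    <AdaptationSet mimeType="audio/mp4" lang="en">
      <Representation id="3" bandwidth="64000"/>
    </AdaptationSet>
  </Period>
</MPD>
"""


def list_representations(mpd_xml):
    """Return (period_index, mime_type, representation_id, bandwidth_bps) tuples."""
    root = ET.fromstring(mpd_xml.strip())
    found = []
    for index, period in enumerate(root.findall("mpd:Period", NS)):
        for adaptation_set in period.findall("mpd:AdaptationSet", NS):
            mime_type = adaptation_set.get("mimeType")
            for rep in adaptation_set.findall("mpd:Representation", NS):
                found.append((index, mime_type, rep.get("id"), int(rep.get("bandwidth"))))
    return found


if __name__ == "__main__":
    for entry in list_representations(SAMPLE_MPD):
        print(entry)
```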

Other attributes of DASH include:

DASH is codec-independent, and will work with H.264, WebM and other codecs.
DASH supports both the ISO Base Media File Format (essentially the MP4 format) and MPEG-2 Transport Streams.
DASH does not specify a DRM method but supports all DRM techniques specified in ISO/IEC 23001-7: Common Encryption.
DASH supports trick modes for seeking, fast forwards and rewind.
DASH supports advertising insertion.

In terms of availability, DASH should be completed and ready for deployment in the first half of 2012.

MPEG DASH Intellectual Property Issues
At this point, it’s unclear whether DASH will be encumbered by royalties, and where they might be applied. For example, the MPEG-2 video codec comes with royalty obligations for encoders, decoders, and users of the codec. Many of the participants who are contributing intellectual property to the effort—including Microsoft, Cisco, and Qualcomm—have indicated that they want a royalty-free solution. While these three companies comprise the significant bulk of the IP contributed to the specification, not all contributors agree, so the royalty issue is unclear at this time.

Other issues include whether browser vendor Mozilla can integrate DASH into its Firefox browser if the underlying media that a DASH MPD references uses royalty-bearing components to play back. This is one of the key reasons that the company didn’t integrate H.264 playback into the Firefox browser in the past, along with the potential $5 million per year royalty obligation.

We asked Mozilla about their intentions regarding DASH, and they sent this statement from Chris Blizzard, Director of Web Platform:

“Mozilla has always been committed to implementing widely adopted royalty-free standards. If the underlying MPEG standards were royalty free we would implement DASH. However, MPEG DASH is currently built on top of MPEG Transport Streams, which are not royalty free. Therefore, we are unlikely to implement at this time.”

According to website NetMarketShare, as of November 18, 2011, Firefox enjoyed a 22.5% market share in the desktop browser market. Without support from Firefox, DASH obviously doesn’t represent a standard that will unify the approach to adaptive streaming in the HTML5 market.

In addition, as a codec-agnostic technology, DASH also does nothing to resolve the HTML5 codec issue, so even if it was fully adopted by all HTML5-compatible browsers, content producers would still have to encode in both H.264 and WebM for universal playback.

Obviously, this doesn’t preclude DASH from being integrated into plug-ins like Flash or Silverlight or being implemented in mobile or OTT devices, and playing a significant role in these markets. However, as things exist today, it’s hard to see DASH as the cure-all solution for the current lack of live, adaptive streaming, and DRM support in desktop HTML5 browsers.

And, in the absence of affirmative statements from Apple or Adobe that they will adopt the standard once finalized, it’s unclear how much immediate traction DASH will gain in the mobile and plug-in markets. Let’s see why.

MPEG DASH Competitive Issues
To a great degree, DASH levels the playing field among competitive players in the adaptive streaming space. For example, Apple’s HLS provides a distinct competitive advantage over other mobile platforms as it’s a widely adopted specification that allows all connected iDevices to play adaptive streams. That’s why Google decided to implement HLS in Android 3.0. Distributing video to Apple iOS devices has been relatively straightforward because of HLS, while the lack of a technology standard and the diversity of devices has made distributing video to Android, Blackberry, and other mobile markets very challenging.

If Apple adopts DASH and implements it on all existing connected iDevices, this competitive advantage disappears, and all DASH-enabled mobile devices are on a level playing field respecting video playback. To be clear, Apple representatives have been active in creating the specification and there is no indication that they won’t support it when it’s released. However, none of the Apple representatives that we contacted were able to comment on Apple’s intent, which is not unusual given that Apple seldom comments on unreleased products. Still, Apple is not known for its competitive graciousness, and adopting DASH would clearly make their products less competitive vis-à-vis other mobile platforms, at least in the short term.

On the flip side, content publishers want a distribution mechanism with flexible and complete DRM, which iDevices don’t currently provide. If enough content producers support DASH-enabled platforms, but not iDevices, that will obviously motivate Apple to support the spec. However, unless and until Apple supports DASH, it’s unlikely that producers without DRM concerns will stop producing HLS streams, which may lessen the attractiveness of supporting DASH.

To a lesser degree, the same principle holds true for Adobe since the Flash Player’s ubiquity on the desktop is a key competitive advantage over Microsoft’s Silverlight and even HTML5. Though Adobe participated in the standards work, they haven’t committed to supporting DASH in future versions of the Flash Player. Again, Adobe seldom comments on future products, so you can’t draw any conclusion from their silence.

Conclusion
DASH is an extraordinarily attractive technology for web producers, a single standard that should allow them to encode once, and then securely distribute to a universe of players, from mobile to OTT, and to the desktop via plug-ins or HTML5. In addition to not resolving the HTML5 codec issue, it’s also unclear whether publishers will be charged for the privilege of producing files using the DASH spec, which could be a significant negative.

Mozilla has already indicated that they probably won’t support the specification as currently written, and Apple and Adobe have not affirmed if or when they will support the technology. An optimist would assume that the value of DASH to the streaming media marketplace would compel all stakeholders to make their contributions royalty free, and convince Apple, Adobe and Mozilla to support the specification soon after its release. Until all this plays out, though, DASH may play a significant role in some markets, but won’t reach its full potential.

By Jan Ozer, Streaming Media

MP4 File Fragmentation for Broadcast, Mobile and Web Delivery

Consistent multi-platform audio and video content delivery presents an ongoing challenge for broadcasters. Explosive smartphone and tablet growth on varying operating systems (Android, Apple iOS, or Windows Phone) threatens to create a user-experience divide between users on mobile devices, at the desktop or in the living room.

Broadcasters must address multi-platform consumption demands without compromising content security or network efficiencies. Many broadcasters are assessing efficiency of transport protocols used for content delivery, to see how they stack up for web and mobile delivery. Some legacy solutions, such as MPEG-2 Transport Stream (M2TS), lack basic web delivery functions.

What key information do broadcasters and network operators need to know as they look for more efficient approaches to media delivery? This white paper explores fragmented MP4 files (fMP4) and considers whether the fMP4 format can replace legacy file formats.

Along the way, we’ll explore four key areas that impact both broadcasters and network operators:

Format benefits of fMP4
Network benefits of fMP4
Movement toward fMP4 standardization
Platforms supporting fMP4

By Timothy Siglin, Transitions, Inc.

What is HLS (HTTP Live Streaming)?

HTTP Live Streaming (or HLS) is an adaptive streaming protocol created by Apple to communicate with iOS and Apple TV devices and Macs running OS X Snow Leopard or later. HLS can distribute both live and on-demand files and is the sole technology available for adaptively streaming to Apple devices, which are an increasingly important target segment for streaming publishers.

HLS is widely supported in streaming servers from vendors like Adobe, Microsoft, RealNetworks, and Wowza, as well as real time transmuxing functions in distribution platforms like those from Akamai. The popularity of iOS devices and this distribution-related technology support has also led to increased support on the player side, most notably from Google in Android 3.0.

In the Apple App Store, if you produce an app that delivers video longer than ten minutes or greater than 5MB of data, you must use HTTP Live Streaming, and provide at least one stream at 64Kbps or lower bandwidth. Any streaming publisher targeting iOS devices via a website or app should know the basics of HLS and how it’s implemented.

How HLS Works
At a high level, HLS works like all adaptive streaming technologies; you create multiple files for distribution to the player, which can adaptively change streams to optimize the playback experience. As an HTTP-based technology, no streaming server is required, so all the switching logic resides on the player.

To distribute to HLS clients, you encode the source into multiple files at different data rates and divide them into short chunks, usually between 5-10 seconds long. These are loaded onto an HTTP server along with a text-based manifest file with a .M3U8 extension that directs the player to additional manifest files for each of the encoded streams.

The player monitors changing bandwidth conditions. If these dictate a stream change, the player checks the original manifest file for the location of additional streams, and then the stream-specific manifest file for the URL of the next chunk of video data. Stream switching is generally seamless to the viewer.

HLS uses multiple encoded files with index files directing the player to different streams and chunks of audio/video data within those streams.

HLS File Preparation
HLS currently supports H.264 video using the Baseline profile up to Level 3.0 for iPhone and iPod Touch clients and the Main profile Level 3.1 for the iPad 1 and 2. Audio can be HE-AAC or AAC-LC up to 48 kHz, stereo. The individual manifest files detail the profile used during encoding so the player will only select and retrieve compatible streams. This allows producers to create a single set of HLS files that will serve iPhone/iPod touch devices with Baseline streams and iPads with streams encoded using the Main profile.

Though encoded using the H.264 video codec and AAC audio codec, audio/video streams must be segmented into chunks in an MPEG-2 Transport Stream with a .ts extension. All files are then uploaded to an HTTP server for deployment. In a live scenario, the .ts chunks are continuously added and the .M3U8 manifest files continually updated with the locations of alternative streams and file chunks.
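To make the stream-specific manifest more concrete, here is a minimal Python sketch that writes an on-demand playlist listing .ts chunks in playback order. The tags are standard HLS playlist tags; the function name, the `segment` file prefix, and the chunk durations are assumptions made for illustration, not values from Apple's tools.

```python
import math
from typing import List


def build_media_playlist(durations: List[float], prefix: str = "segment") -> str:
    """Return an on-demand media playlist for chunks of the given durations."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",  # version 3 permits fractional chunk durations
        # The target duration must be at least as long as the longest chunk.
        f"#EXT-X-TARGETDURATION:{math.ceil(max(durations))}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for index, seconds in enumerate(durations):
        lines.append(f"#EXTINF:{seconds:.3f},")
        lines.append(f"{prefix}{index}.ts")
    # ENDLIST tells the player that no further chunks will be appended.
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


playlist = build_media_playlist([10.0, 10.0, 4.5])
print(playlist)
```

In a live scenario the closing `#EXT-X-ENDLIST` line would be left off and the file rewritten as new chunks arrive.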

Before producing files for HLS, you should read through Apple’s Tech Note TN2224 which contains detailed recommended configurations (resolution, data rate, keyframe interval) for distributing both 4:3 and 16:9 video to all compatible iDevice and Apple TV players.

Content Protection and Closed Captions in HLS
HLS doesn’t natively support digital rights management (DRM), though you can encrypt the data and provide key access using HTTPS authentication. Several third-party DRM solutions are becoming available, including those from AuthenTec, SecureMedia, and Widevine.

HLS can support closed captions included in the MPEG-2 Transport Stream.

Deploying HLS Streams
Delivery via HTTP has several advantages; no streaming server is required and the audio/video chunks should leverage HTTP caching servers located in the premises of internet service providers, cellular providers, and other organizations, which should improve video quality for viewers served from these caches. HTTP content should also pass through most firewalls.

Apple recommends using the HTML5 video tag for deploying HLS video on a website.

On the Playback Side
On computers and iPad devices, the Safari browser can play HLS streams within a web page, with Safari launching a full-screen media player on iPhones and iPod touch devices. Starting with version 2, all Apple TV devices include an HTTP Live Streaming client.

Producing HLS
As discussed, the HLS experience has two components: a set of chunked files in .ts format and the required manifest files. In an on-demand environment, you can encode the alternative files using any standalone H.264 encoding tool, with the latest version of Sorenson Squeeze offering a multiple file HLS encoding template. More recently, Telestream updated Episode to include command line HLS multiple file creation. Cloud encoding services like those provided by Encoding.com can also typically produce HLS-compatible files.

Once you have the encoded streams, you can use Apple tools to create the chunked files and playlists:

Media Stream Segmenter - Inputs an MPEG-2 Transport Stream and produces chunked .ts files and index files. It can also encrypt the media and produce encryption keys.
Media File Segmenter - Inputs H.264 files and produces chunked .ts files and index files. It can also encrypt the media and produce encryption keys.
Variant Playlist Creator - Compiles the individual index files created by the Media Stream or Media File Segmenter into a master .M3U8 file that identifies the alternate streams.
Metadata Tag Generator - Creates ID3 metadata tags that can either be written to a file or inserted into outgoing stream segments.
Media Stream Validator - Examines index files, stream alternates, and chunked .ts files to validate HLS compatibility.

For live HLS distribution, you need an encoding tool that can encode the files into H.264 format, create the MPEG-2 Transport Stream chunks and create and update the manifest files. When Apple first announced HLS in 2009, only two live encoders were available; one each from Inlet (now Cisco) and Envivio. Now most vendors of encoding hardware also offer live HLS-compatible products, including Digital Rapids, Elemental Technologies, Haivision, Seawell Networks, and ViewCast.

Real Time Transmuxing
The other approach to live or on-demand streaming to HLS-compatible players is via transmuxing, which is offered by multiple streaming server vendors and CDNs. Specifically, these servers input an H.264-stream originally compatible with Flash or Silverlight (or other formats) and then dynamically re-wrap the file into the required MPEG-2 Transport Stream chunks and create the required manifest files.

Server-based implementations include:

Adobe Flash Media Server 4.5
Wowza Media Server
Microsoft IIS Media Services
RealNetworks Helix Universal Server

Akamai also offers “in the network” repackaging of H.264 input files for HLS deployment.

In these applications, any live encoding tool that can deliver multiple streams of input to the server, like the Adobe Flash Live Media Encoder, Haivision, Microsoft Expression Encoder Pro, or Telestream Wirecast, can serve as the encoding front end for multiple-platform adaptive distribution including HLS.

Not surprisingly given the level of technology support, many of the larger online video platforms are now starting to support HLS distribution, including Brightcove, Kaltura, and Ooyala.

Conclusion
The iOS platform is a critical target for virtually all streaming publishers, and HLS can deliver the best possible experience to that platform, and others that support HLS playback. Fortunately, the streaming industry has embraced HLS with tools and technologies that make this very simple and affordable.

HLS Resources
One of the reasons that HLS has been so successful is that Apple has created multiple documents that comprehensively address the creation and deployment of HLS files.

You can also watch this video tutorial.

By Jan Ozer, StreamingMedia

Improving QoE for IP Video Services

The boom in OTT and TV Anywhere services is underlined by rapid growth in IP video transmission at all stages of the content lifecycle, and this is expanding greatly the scope and demand for Quality Assurance (QA) products. Even leading proponents of OTT services still admit there is some way to go to provide acceptable Quality of Experience (QoE) for high-definition premium content over unmanaged networks in particular.

“One of the main obstacles to OTT is the lack of a great user experience,” says Helge Høibraaten, CEO of Vimond Media Solutions, a spin-off of Norwegian commercial TV station TV 2, which is commercialising its OTT broadcast platform internationally.

Speaking at a conference during the recent IBC exhibition in Amsterdam, Høibraaten indicated that an OTT platform was defined by the quality it delivers and must meet the needs of all devices including tablets, PCs and smartphones. Vimond itself has only just extended its applications suite to Apple iOS devices (iPad and iPhone), Android and Windows phones, in addition to Windows desktop PCs which it already supported. The message for vendors of OTT platforms, and for the services that run on them, is that they should only embrace new device types when acceptable quality can be guaranteed.

The definition of acceptable quality is admittedly rather subjective. It is certain, though, that IP networks are creating new challenges for providers of QA video products. These vendors have been extending their portfolios to tackle video delivery over both managed and unmanaged IP networks, with various announcements made at IBC.

While unmanaged networks including the Internet pose the greatest challenge, even managed IP networks require careful handling to avoid packet loss and latency resulting from congestion within the infrastructure. This can happen because unlike traditional broadcast networks, IP infrastructures do not have fixed end-to-end paths and have no pre-determined transmission times for each IP packet. It is possible for more packets to enter the network than can be delivered within an acceptable time frame, leading to congestion and either dropped packets, delays, or both. Either of these can cause loss of quality on receiving devices.

The remedy is to apply traffic shaping, which involves holding up IP packets that are less critical or which can afford a little delay in order to preserve capacity for the most important packets. This can be performed at the point of entry to the network or within the network by routers themselves or other dedicated devices, and the key with managed networks is that operators can control the traffic shaping process better. Potentially, packet loss can be eliminated and latency kept within acceptable limits, according to Per Lindgren, VP Business Development and Co-Founder of Net Insight, the Swedish-owned vendor of the Nimbra IP media transport platform. Net Insight tackles the managed IP quality issue by breaking the network down into separate segments and applying QoE mechanisms including traffic shaping to each.

The first step is to ensure that the routers themselves do not create problems under congestion by dropping packets as they pass through, so Net Insight has applied traffic shaping at this level to ensure this does not happen. “By traffic shaping even inside our MSRs (Media Switch Routers), we can traffic shape down until we ensure we do not lose any packets there,” says Lindgren.

The next step is to address the links through the core network between the routers and ensure that the QoS needs of each individual service are met. “Traditionally telcos have not been treating media traffic as a special service,” says Lindgren. “So we propose building service aware media networks. MSRs aggregate traffic so that the core network (provided by a telco) only handles aggregated flows rather than individual services. Our MSRs then handle the different protection needs of each service, and can add QoS enhanced links inside a media service network rather than just at the edges.”

In this way, by addressing both the routers and links between them separately as part of a coordinated traffic management approach, the network can achieve much higher levels of quality. Even then, though, the possibility of packet loss or delay cannot be discounted, and so the third element of Net Insight’s QA strategy is to monitor every link. “We can do continuous real-time monitoring of traffic between MSRs and see any packet loss sent between one MSR and another,” Lindgren explains. “That makes it much easier to troubleshoot.”

Within unmanaged IP networks, on the other hand, it is impossible for broadcasters or operators to do either traffic shaping or performance monitoring since they do not own the infrastructure. This is an increasing issue with the growth of cloud-based services where the infrastructure is normally owned and managed by a third-party with video delivered over some Content Distribution Network (CDN). In that case there is an apparent black hole between the cloud and the end user, making it difficult for a content provider to know what quality the customer is getting.

Another Swedish vendor specialising in distributed video delivery, Edgeware, has tackled this problem with its Convoy VDN, which is software operating within the company’s Distributed Video Delivery Network (D-VDN) platform. Announced at IBC, this operates by combining the receiving device’s capability with the QoS known to be provided by the delivery infrastructure, according to Edgeware’s Chief Marketing Officer Duncan Potter.

The point is that CDNs usually operate via adaptive streaming protocols to improve network efficiency and performance, breaking video up into multiple small file chunks that can take different routes before being reassembled at the destination. The network detects each user’s CPU capacity and bandwidth continuously and adjusts the quality of the stream in real-time to ensure that QoE is always as good as it can be at that point in time. But breaking up video into chunks does make it hard to monitor what is going on within the CDN, and this is the problem Edgeware has addressed with Convoy VDN. “As we are a network device we can see what is going through,” said Potter. “We work out what is sent, collect statistics via a central reporting engine, and that is integrated with the higher level CDN management system.”

Such measures may help ensure optimum quality when a service is working normally but do not cater for major outages within the infrastructure. While IP networks are becoming more reliable, there is rising dependence in an increasingly global content market on external communication links that may be unreliable. This is a particular problem for the growing number of niche and ethnic services that have a global audience distributed across numerous, often small, communities around the world.

Such ethnic services can be lucrative, with high profit margins for operators because consumers are prepared to pay a premium or a separate subscription to receive them, but the total revenue in a given region is usually relatively small. This means operators cannot afford to spend too much capital on protecting against failure of the service in a region beyond their control, according to Danny Wilson, CEO of TV performance monitoring vendor Pixelmetrix. “Typically if an operator imports content from, say, India, they are vulnerable to loss of signal from Delhi,” he points out.

Pixelmetrix is tackling this with software announced at IBC that enables its DVStor recording and playback platform to perform disaster recovery and start playing out the content in the event of an outage. “We are recording what is going on at a downlink coming in from overseas and have integrated this with our test and measurement devices,” says Wilson. “Then if there is any interruption, the sensor detects that input signal is lost, and this DVStor solution can then provide back-up recovery on a real-time basis.”

This, in effect, is a cloud-based disaster recovery service and could be incorporated within IP-based delivery infrastructures. It highlights the growing scope of Quality Assurance, bringing together elements of disaster recovery, troubleshooting and performance monitoring within an overall QoE package.

By Philip Hunter, Videonet

Multi-screen Video Processing

Multi-screen TV is approaching a tipping point now as the Pay TV pioneers look to expand their offers to cover more channels as well as more devices, and more service providers launch TV Everywhere packages. One of the important tasks for many operators walking around IBC this year is to work out how they can scale their multi-screen services beyond a sub-set of the channels they offer on the set-top box. Ultimately consumers will expect all their channels on all screens, of course.

Ericsson Harnesses Hardware and Software to Support More Channels
Ericsson is using IBC to highlight the scalability issue and has two new products that it believes will help operators expand their offers. These are the Ericsson SPR1200 Multiscreen Stream Processor, a true hardware approach to multi-screen compression, and the Ericsson NPR1200 Multiscreen Network Processor, a dense software-based adaptive streaming segmentation and encryption processor, designed to track dynamic updates in adaptive streaming formats and DRM systems associated with the needs of delivery to different types of devices.

The combined solution enables high quality and cost-effective processing of hundreds of channels into thousands of adaptive streaming profiles, the company says. It claims the SPR1200 and NPR1200 represent the most powerful and flexible solution for the growing multi-screen market.

Ericsson’s ConsumerLab research shows that 93% of consumers still watch linear TV and will continue to do so. “The expectation by consumers for multi-screen TV is that all of their content choices available in the home on the large screen will also be available on every screen,” it adds.

RGB has Multi-platform Headend for Large and Mid-sized Deployments
Meanwhile, RGB Networks claims that the combination of its Video Multiprocessing Gateway (VMG) (a carrier-class platform for multi-screen video delivery) and its adaptive streaming solution, the TransAct Packager, provides the most scalable solution available for deployment of advanced IP video services to any device, enabling operators to go straight from trial to deployment.

The company recently added a new member to the VMG product family, in the form of the VMG-8, which it says is ideal for small to medium-sized deployments or deployments at the edge. This product inherits the field-proven transcoding, transrating, ad insertion and other advanced video processing capabilities of the VMG family and packages them in a new 7RU high carrier-grade chassis. The VMG-8 holds up to eight modules and provides a compact alternative to RGB’s larger VMG-14.

In its fully redundant configuration the VMG-8 can be configured with three video transcoder modules, one audio transcoder module and a single controller module for transcoding programmes to over 140 streams for delivery to any IP-enabled device. In this redundant configuration, each module type has a back-up which can take over operation should the primary fail. Complementing its module redundancy, the VMG-8’s reliability is further enhanced with back-up power supplies and cooling fans which automatically take over if a primary unit fails.

Like the company’s VMG-14, the VMG-8 also benefits from recent enhancements to the TCM transcoder module, enabling transcoding of up to 60 SD or HD inputs and 240 adaptive bitrate outputs per VMG-8 chassis. The VMG-14 can now support up to 132 SD or HD inputs and 528 outputs per chassis.

Harmonic Supports Live and File-based Multi-screen Delivery
Harmonic is also focusing on the needs of content distributors and creators as they deliver more of their content to more screens. The company recently announced the ProMedia family of software solutions for optimizing live and file-based multi-screen video production and processing. The ProMedia family performs a broad range of functions, including transcoding, packaging and origination to enable high-quality video creation and delivery of live streaming, live-to-VOD, and VOD services to TVs, PCs, tablets, smartphones and other IP-connected devices. ProMedia is also considered an ideal solution for content creation in file-based workflows such as tapeless production environments.

The ProMedia family provides a suite of software products that can be deployed individually or as an end-to-end video processing solution, offering great flexibility. This solution is also integrated with leading DRM systems, asset management systems and content distribution networks, in addition to other Harmonic products including encoders, receivers, playout servers, and storage.

The ProMedia family leverages Harmonic’s strong H.264 video codec expertise and is based on the same intellectual property behind Harmonic's Electra encoders. The family includes ProMedia Live for real-time video processing and transcoding, featuring enhanced H.264 video codec technology developed by Harmonic and optimized for creating high-quality Internet video streams.

Another important product in the family is ProMedia Package, a carrier-grade adaptive streaming preparation system for secure, high-value Internet video services. ProMedia Package supports numerous HTTP streaming protocol standards and is capable of packaging in multiple output formats from a single video source, enabling a more scalable, distributed architecture.

Envivio Helps Move Content Package and Delivery to the Edge
Envivio has introduced a number of notable new products for multi-screen TV. These are the Halo Network Media Processor (NMP), 4Caster C4 Gen III multi-screen encoder, and the Envivio Genesis universal mezzanine output format.

Halo NMP enables operators to shift their content packaging and delivery processing to the edges of their existing video distribution infrastructure. “Moving these operations makes it possible to add support for delivering high quality, protected video to new devices without altering the headend,” Envivio declares. “Halo NMP complements existing broadcast infrastructure and simplifies distribution to the latest smartphones, tablets, connected TVs and PCs.”

Halo lets operators take advantage of the Genesis universal output format to control the bandwidth demands multi-screen TV makes on backbone networks. Genesis merges the bitrates and resolutions needed to deliver adaptive streams for major standards and technologies into a single, efficient output format. Envivio claims the result is a reduction of as much as 50% in the bandwidth demands multi-screen TV makes on backbone resources.

Video headends powered by the Envivio 4Caster C4 family of encoders provide support for the full spectrum of IPTV, Internet TV, mobile TV, cable, satellite and terrestrial applications. They enable operators to support the growing variety of formats needed to deliver video to any device at any time, including simultaneous video delivery from a single encoder to digital set-top boxes, connected TVs, PCs and Macs, as well as tablets and mobile screens.

Imagine Communications Supports 1,000 Multi-profile Transcoders
Imagine Communications will showcase its ICE Streaming System for streaming live multi-format video to multiple tablets. ICE is a new network-side transcoding platform that allows multi-screen service providers to deliver what it claims is uncompromised video quality across multiple devices with unmatched compression efficiency. The ICE Streaming System supports up to 1,000 stream-aligned, multi-profile transcodes from a single carrier-class blade system platform.

The ICE Streaming System is based on Imagine's widely deployed ICE Video Platform and combines picture quality, scalability and full support for integrated fragmentation, encryption and HTTP streaming.

ISILON and ATEME Partner to Boost Media Processing Performance
On the eve of IBC, ATEME announced that it has partnered with ISILON to support high performance content processing for video delivery to multiple screens. This results from the combination of the ISILON IQ Series NAS storage and ATEME’s TITAN File transcoding platform. The TITAN video processing speed is enhanced by ultra-fast storage. Meanwhile more content titles can be stored thanks to the superior compression efficiency of TITAN.

The companies say the partnership dramatically simplifies the operational challenges of multi-screen transcoding workflows. “Installed in a matter of hours, the solution scales out linearly with the expansion of the content catalogue, the migration to HD, or as new output formats are added to support more viewing devices. It takes only minutes to add transcoding blades or storage capacity: there is no need for re-design and no downtime.”

The combination of ISILON IQ NAS Storage and the ATEME TITAN transcoder is proven and delivers content for more than 40 million pay TV subscribers worldwide already. The partnership, announced in late August, will make it easy for many more service operators to access the solution as they move from tape to file based workflows or enhance their VOD offerings.

By John Moulding, Videonet

Multi-Screen IP Video Delivery

Online and mobile viewing of widely-available, high-quality video content including TV programming, movies, sports events, and news is now poised to go mainstream. Driven by the recent availability of low-cost, high-resolution desktop/laptop/tablet PCs, smart phones, set-top boxes and now Ethernet-enabled TV sets, consumers have rapidly moved through the ‘novelty’ phase of acceptance into expectation that any media should be available essentially on any device over any network connection. Whether regarded as a disruption for cable TV, telco or satellite TV providers, or an opportunity for service providers to extend TV services onto the web for on-demand, time-shifted and place-shifted programming environments – often referred to as ‘three screen delivery’ or ‘TV Anywhere’ – this new video delivery model is here to stay.

While tremendous advancements in core and last mile bandwidth have been achieved in the last decade around the world – primarily driven by web-based data consumption – video traffic represents a quantum leap in bandwidth requirements. Coupled with the fact that the Internet at large is not a managed quality-of-service environment, this requires that new methods of video transport be considered to provide, across any device and network, the quality of video experience that we have come to expect from managed TV-delivery networks.

The evolution of video delivery transport has led to a new set of de facto standard adaptive delivery protocols from Apple, Microsoft, and Adobe that are now positioned for broad adoption. Consequently, networks must now be equipped with servers that can take high-quality video content from its source, whether live or file-based, and ‘package’ it for transport to devices ready to accept these new delivery protocols.

Video Delivery Background

The Era of Stateful Protocols

For many years, stateful protocols including Real Time Streaming Protocol (RTSP), Adobe’s Real Time Messaging Protocol (RTMP), and Real Networks' RTSP over Real Data Transport (RDT) protocol were utilized to stream video content to desktop and mobile clients. Stateful protocols require that from the time a client connects to a streaming server until the time it disconnects, the server tracks client state. If the client needs to perform any video session control commands like start, stop, pause or fast-forward it must do so by communicating state information back to the streaming server.

Once a session between the client and the server has been established, the server sends media as a stream of small packets typically representing a few milliseconds of video. These packets can be transmitted over UDP or TCP. TCP overcomes firewall blocking of UDP packets, but may also incur increased latency as packets are sent, and resent if not acknowledged, until received at the far end.

These protocols served the market well, particularly during the era when desktop and mobile viewing was limited in frequency, quality, and duration, by small screen and window sizes and resolutions, and by the constrained processor, memory, and storage capabilities of mobile devices.

However, the above experience factors have all changed dramatically in the last few years. And that has exposed a number of stateful protocol implementation weaknesses:

Stateful media protocols have difficulty getting through firewalls and routers.
Stateful media protocols require special proxies/caches.
Stateful media protocols cannot react quickly or gracefully to rapidly fluctuating network conditions.
Stateful media client server implementations are vendor-specific, and thus require the purchase of vendor-specific servers and licensing arrangements – which are also more expensive to operate and maintain.

The Era of the Stateless Protocol – HTTP Progressive Download

A newer type of media delivery is HTTP progressive download. Progressive download (as opposed to ‘traditional’ file download) pulls a file from a web server using HTTP and allows the video file to start playing before the entire file has been downloaded. Most media players including Adobe Flash, Windows Media Player, Apple Quicktime, etc., support progressive download. Further, most video hosting websites use progressive download extensively, if not exclusively.

HTTP progressive download differs from traditional file download in one important respect. Traditional files have audio and video data separated in the file. At the end of the file, a record of the location and structure of the audio and video tracks (track data) is provided. Progressively downloadable files have track data at the beginning of the file and interleave the audio and video data. A player downloading a traditional file must wait until the end of the file is reached in order to understand track data. A player downloading a progressively downloadable file gets track data immediately and can, therefore, play back audio/video as it is received.
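The distinction can be checked in code. The sketch below assumes the file is an MP4 (ISO base media) container, which the text above does not specify: in that format the track data lives in a `moov` box and the audio/video data in an `mdat` box, so a file is progressively downloadable when `moov` comes first. The function names are hypothetical, and the test data is a synthetic stand-in for a real file.

```python
import io
import struct
from typing import BinaryIO, List


def top_level_boxes(stream: BinaryIO) -> List[str]:
    """Return the four-character types of the top-level boxes, in file order."""
    types = []
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of file (or a truncated header)
        size, box_type = struct.unpack(">I4s", header)
        types.append(box_type.decode("latin-1"))
        if size == 0:
            break  # a size of 0 means the box runs to the end of the file
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            extended = stream.read(8)
            if len(extended) < 8:
                break
            size = struct.unpack(">Q", extended)[0]
            header_len = 16
        if size < header_len:
            break  # malformed box, stop rather than loop forever
        stream.seek(size - header_len, io.SEEK_CUR)  # skip the payload
    return types


def is_progressive(stream: BinaryIO) -> bool:
    """True if the track data (moov) precedes the media data (mdat)."""
    types = top_level_boxes(stream)
    if "moov" not in types or "mdat" not in types:
        return False
    return types.index("moov") < types.index("mdat")


def _box(box_type: bytes, payload: bytes) -> bytes:
    """Build a synthetic box for the demonstration below."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


fast_start = _box(b"ftyp", b"isom") + _box(b"moov", b"x" * 10) + _box(b"mdat", b"y" * 20)
traditional = _box(b"ftyp", b"isom") + _box(b"mdat", b"y" * 20) + _box(b"moov", b"x" * 10)
print(is_progressive(io.BytesIO(fast_start)))
print(is_progressive(io.BytesIO(traditional)))
```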

Unfortunately, it isn’t possible to efficiently store the audio and video, and create progressive download files from live streams. Audio and video track data needs to be 1) computed after the entire file is created and, then 2) written to the front of the file. Thus, it isn’t possible to deliver a live stream using progressive download, because the track data can never be available until after the entire file has been created.

Even so, HTTP Progressive Download greatly improves upon its stateful protocol predecessors as a result of the following:

No issue getting through firewalls and routers as HTTP traffic is passed through Port 80 unfettered.
Utilizes the same web download infrastructure utilized by CDNs and hosting providers to provide web data content – making it much easier and less expensive to deliver rich media content.
Takes advantage of newer desktop and mobile clients’ formidable processing, memory and storage capabilities to start video playback quickly, maintain flow, and preserve a high-quality experience.

The Modern Era – Adaptive HTTP Streaming

Adaptive HTTP Streaming takes HTTP video delivery several steps further. In this case, the source video, whether a file or a live stream, is encoded into segments – sometimes referred to as "chunks" – using a desired delivery format, which includes a container, video codec, audio codec, encryption protocol, etc. Segments typically represent two to ten seconds of video. Each segment is sliced at video Group of Pictures (GOP) boundaries beginning with a key frame, giving the segment complete independence from previous and successive segments. Encoded segments are subsequently hosted on a regular HTTP web server.
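The slicing rule can be sketched as follows. Given the timestamps of the key frames in a source, this hypothetical helper closes a segment at the first key frame that falls at or after a target length, so every segment begins on a key frame. The two-second key frame spacing and six-second target are illustrative values only, and a real packager would also handle audio alignment and container details.

```python
from typing import List, Tuple


def segment_boundaries(
    keyframes: List[float], duration: float, target: float
) -> List[Tuple[float, float]]:
    """Split a source into (start, end) segments that each begin on a key frame.

    keyframes: ascending key frame timestamps in seconds, starting at 0.
    duration:  total length of the source in seconds.
    target:    minimum desired segment length in seconds.
    """
    segments = []
    start = keyframes[0]
    for timestamp in keyframes[1:]:
        # Cut only at a key frame, and only once the segment is long enough.
        if timestamp - start >= target:
            segments.append((start, timestamp))
            start = timestamp
    segments.append((start, duration))  # the final segment may be shorter
    return segments


# Key frame every 2 seconds in a 20-second source, 6-second target.
keyframes = [float(t) for t in range(0, 20, 2)]
segments = segment_boundaries(keyframes, duration=20.0, target=6.0)
print(segments)  # [(0.0, 6.0), (6.0, 12.0), (12.0, 18.0), (18.0, 20.0)]
```

Because each segment ends exactly where the next begins, the segments cover the source with no gaps.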

Clients request segments from the web server, downloading them via HTTP. As the segments are downloaded to the client, the client plays back the segments in the order received. Since the segments are sliced along GOP boundaries with no gaps between, video playback is seamless – even though it is actually just a file download via a series of HTTP GET requests.

Adaptive delivery enables a client to ‘adapt’ to fluctuating network conditions by selecting video file segments encoded to different bit rates. As an example, suppose a video file had been encoded to 11 different bit rates from 500 Kbps to 1 Mbps in 50 Kbps increments, i.e., 500 Kbps, 550 Kbps, 600 Kbps, etc. The client then observes the effective bandwidth throughout the playback period by evaluating its buffer fill/depletion rate. If a higher quality stream is available, and network bandwidth appears able to support it, the client will switch to the higher-quality bit rate segment. If a lower quality stream is available, and network bandwidth appears too limited to support the currently used bit rate segment flow, the client will switch to the lower quality bit rate segment flow. The client can choose between segments encoded at different bit rates every few seconds.
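The example ladder and the switching decision can be sketched in a few lines of Python. This is a simplification: it selects a bit rate from a measured throughput figure rather than from buffer fill/depletion rate, and the `safety` margin of 0.8 and the function name are assumptions, not part of any particular player.

```python
from typing import List

# The example ladder from the text: 500 Kbps to 1 Mbps in 50 Kbps steps.
ladder_kbps = list(range(500, 1001, 50))


def choose_bitrate(ladder: List[int], measured_kbps: float, safety: float = 0.8) -> int:
    """Pick the highest bit rate that fits within a fraction of measured bandwidth.

    Falls back to the lowest bit rate when nothing fits.
    """
    budget = measured_kbps * safety
    affordable = [rate for rate in ladder if rate <= budget]
    return max(affordable) if affordable else min(ladder)


print(len(ladder_kbps))                   # 11
print(choose_bitrate(ladder_kbps, 2000))  # 1000
print(choose_bitrate(ladder_kbps, 900))   # 700
print(choose_bitrate(ladder_kbps, 400))   # 500
```

Calling the function again before each segment request gives the every-few-seconds switching behavior described above.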

This delivery model works for both live- and file-based content. In either case, a manifest file is provided to the client, which defines the parameters of each segment. In the case of an on-demand file request, the manifest is sent at the beginning of the session. In the case of a live feed, updated ‘rolling window’ manifest files are sent as new segments are created.
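A ‘rolling window’ manifest for a live feed might be maintained along these lines. The sketch assumes HLS-style playlist syntax, in which a media sequence number tells the client how many segments have already dropped out of the window; the class name, the window size, and the file names are hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class LivePlaylist:
    """Keeps only the most recent `window` segments of a live feed."""

    def __init__(self, window: int, target_duration: int) -> None:
        self.window = window
        self.target_duration = target_duration
        self.segments: Deque[Tuple[str, float]] = deque()
        self.media_sequence = 0  # sequence number of the first listed segment

    def add_segment(self, uri: str, seconds: float) -> None:
        self.segments.append((uri, seconds))
        if len(self.segments) > self.window:
            # The oldest segment leaves the window, so the sequence advances.
            self.segments.popleft()
            self.media_sequence += 1

    def render(self) -> str:
        lines = [
            "#EXTM3U",
            "#EXT-X-VERSION:3",
            f"#EXT-X-TARGETDURATION:{self.target_duration}",
            f"#EXT-X-MEDIA-SEQUENCE:{self.media_sequence}",
        ]
        for uri, seconds in self.segments:
            lines.append(f"#EXTINF:{seconds:.3f},")
            lines.append(uri)
        # No end-of-list tag: the feed is still live.
        return "\n".join(lines) + "\n"


live = LivePlaylist(window=3, target_duration=10)
for index in range(5):
    live.add_segment(f"seg{index}.ts", 10.0)
print(live.render())
```

After five segments with a window of three, the rendered manifest lists only the last three and reports a media sequence of 2.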

Since the web server can typically send data as fast as its network connection will allow, the client can evaluate its buffer conditions and make forward-looking decisions on whether future segment requests should be at a higher or lower bit rate to avoid buffer overrun or starvation. Each client will make this decision based on trying to select the highest possible bit rate for maximum quality of playback experience, but not so great that it starves its own buffer of the next needed segments.

A number of advantages accrue with this delivery protocol approach:

Lower infrastructure costs for content providers, because specialty streaming servers are replaced by the generic HTTP caches/proxies already in place for HTTP data serving.
Content delivery is dynamically adapted to the weakest link in the end-to-end delivery chain, including highly varying last-mile conditions.
Subscribers no longer need to statically select a bit rate on their own, as the client can now perform that function dynamically and automatically.
Subscribers enjoy fast start-up and seek times as playback control functions can be initiated via the lowest bit rate and subsequently ratcheted up to a higher bit rate.
Annoying user experience shortcomings including long initial buffer time, disconnects, and playback start/stop are virtually eliminated.
Client can control bit rate switching – with no intelligence in the server – taking into account CPU load, available bandwidth, resolution, codec, and other local conditions.
Simplified ad insertion accomplished by file substitution.

Encoding/Transcoding

The transcoder (or encoder, if the input is not already compressed) is responsible for ingesting the content, encoding to all necessary outputs, and preparing each output for advertising readiness and delivery to the packager for segmentation. The transcoder must perform the following functions for multi-screen adaptive delivery – and at high concurrency, in real time and with high video quality output.

Video Transcoding

Transcode the output video to a progressive format, which requires the transcoder to support input de-interlacing.
Transcode the input to each required output profile – where a given profile will have its own resolution and bit rate parameters – including scaling to resolutions suitable for each client device. Because the quality of experience of the client depends on having a number of different profiles, it is necessary to encode a significant number of output profiles for each input. Deployments may use anywhere from 4 to 16 output profiles per input. The table below shows a typical use case for the different output profiles:
GOP-align each output profile such that client playback (shifting between different bit rate ‘chunks’ created for each profile) is continuous and smooth.
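The GOP alignment requirement can be checked with a small sketch like the one below. It assumes fixed GOP lengths expressed in frames and a common frame rate across profiles; the function names and the profile names in the usage lines are hypothetical.

```python
def gop_starts(total_frames, gop_frames):
    """Frame numbers at which a new GOP (and so a possible switch point) begins."""
    return set(range(0, total_frames, gop_frames))


def is_gop_aligned(profile_gops, chunk_frames, total_frames):
    """True if every chunk boundary falls on a GOP start in every profile.

    `profile_gops` maps a profile name to its fixed GOP length in frames.
    """
    boundaries = set(range(0, total_frames, chunk_frames))
    return all(
        boundaries <= gop_starts(total_frames, gop)
        for gop in profile_gops.values()
    )


if __name__ == "__main__":
    # 2-second chunks at 30 frames per second are 60 frames long.
    print(is_gop_aligned({"high": 60, "low": 30}, chunk_frames=60, total_frames=600))
    print(is_gop_aligned({"high": 60, "low": 50}, chunk_frames=60, total_frames=600))
```

In the second call the 50-frame GOP does not start on the 60-frame chunk boundary, so a client switching to that profile at a chunk boundary would not find a clean entry point.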

Audio Transcoding

Transcode audio into AAC – the codec used by adaptive delivery protocols from Apple, Microsoft and Adobe.

Ad Insertion

Add IDR frames at ad insertion points, so that the video is ready for SCTE 35 ad insertion. It is also potentially possible to align chunk boundaries with ad insertion points so that ad insertion can be done via chunk-substitution rather than traditional stream splicing.
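When chunk boundaries coincide with the ad insertion points, chunk substitution reduces to editing a list of chunk URLs. The following is a minimal sketch under that assumption; the function name and the file names are hypothetical, and the ad is assumed to replace program chunks of the same count.

```python
def substitute_ad(chunks, ad_chunks, start_index):
    """Return a new chunk list with an ad replacing the same number of program chunks.

    `chunks` and `ad_chunks` are lists of chunk URLs. `start_index` is the
    chunk boundary that coincides with the ad insertion point.
    """
    end_index = start_index + len(ad_chunks)
    if start_index < 0 or end_index > len(chunks):
        raise ValueError("ad break does not fit inside the program")
    return chunks[:start_index] + ad_chunks + chunks[end_index:]


if __name__ == "__main__":
    program = [f"program{n}.ts" for n in range(6)]
    ad = ["ad0.ts", "ad1.ts"]
    print(substitute_ad(program, ad, start_index=2))
```

Because each chunk starts with an IDR frame, the client can decode the substituted chunks without reference to the program chunks around them.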

Ingest Fault Tolerance

The transcoding system needs to allow two different transcoders that ingest the same input to create identically IDR-aligned output – contributing to strong fault tolerance. This can be used to create a redundant backup of encoded content in such a way that any failure of the primary transcoder is seamlessly backed up by the secondary transcoder.
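One way two independent transcoders can arrive at identical IDR placement is to derive chunk boundaries purely from the input timestamps rather than from the moment each transcoder started. The sketch below assumes both units see the same presentation timestamps on a 90 kHz clock (as carried in MPEG-2 transport streams) and ignores timestamp wrap-around; the function names and the 2-second chunk length are illustrative choices.

```python
def chunk_index(pts, chunk_seconds=2, clock_hz=90_000):
    """Chunk number that a frame belongs to, derived only from its timestamp."""
    return pts // (chunk_seconds * clock_hz)


def needs_idr(pts, previous_pts, chunk_seconds=2):
    """True when this frame is the first one in a new chunk and so must be an IDR frame."""
    return chunk_index(pts, chunk_seconds) != chunk_index(previous_pts, chunk_seconds)


if __name__ == "__main__":
    frame_ticks = 3003  # one frame at 29.97 frames per second on a 90 kHz clock
    timestamps = [n * frame_ticks for n in range(200)]
    idr_frames = [
        current
        for previous, current in zip(timestamps, timestamps[1:])
        if needs_idr(current, previous)
    ]
    print(idr_frames)
```

A backup transcoder that joins the same input later applies the same rule to the same timestamps and therefore places its IDR frames on the same pictures.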

Packaging

To realize the benefits of HTTP adaptive streaming, a ‘packager’ function – sometimes referred to as a ‘segmenter’, ‘fragmenter’ or ‘encapsulator’ – must take each encoded video output from the transcoder and ‘package’ the video for each delivery protocol. To perform this function, the packager must be able to:

Ingest

Ingest live streams or files, depending on whether the work flow is live or on-demand.

Segmentation

Segment chunks according to the proprietary delivery protocols specified by Microsoft Smooth Streaming, Apple HTTP Live Streaming (HLS), and Adobe HTTP Dynamic Streaming.

Encryption

Encrypt segments on a per delivery protocol basis (in a format compatible with each delivery protocol) as they are packaged, enabling content rights to be managed on an individual session basis. For HLS, this is file-based AES-128 encryption. For Smooth Streaming, it is also AES-128, but with PlayReady compatible signaling. Adobe HTTP Dynamic Streaming uses Adobe Flash Access for encryption.
Integrate with third party key management systems to retrieve necessary encryption information.

Note: Third party key management servers manage and distribute the keys to clients. If the client is authorized, it can retrieve decryption keys from a location designated in the manifest file. Alternatively, depending on the protocol used, key location can be specified within each segment. Either way, the client is responsible for retrieving decryption keys, which are normally served after the client request is authenticated. Once the keys are received, the client is able to decrypt the video and display it.
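For HLS, the key location is given to the client by an EXT-X-KEY line in the manifest. The sketch below builds such a line and shows the initialization vector a client falls back to when the manifest does not state one (the segment's media sequence number as a 16-byte big-endian value). The function names and the key server URL are hypothetical, and the actual AES-128 encryption step is left out.

```python
def default_iv(media_sequence_number):
    """IV used for HLS AES-128 when the playlist gives no explicit IV:
    the segment's media sequence number as a 16-byte big-endian integer."""
    return media_sequence_number.to_bytes(16, "big")


def key_tag(key_uri, iv=None):
    """Build the EXT-X-KEY playlist line that tells the client where to fetch the key."""
    tag = f'#EXT-X-KEY:METHOD=AES-128,URI="{key_uri}"'
    if iv is not None:
        tag += f",IV=0x{iv.hex()}"
    return tag


if __name__ == "__main__":
    # Hypothetical key server URL; access to it would be authenticated in practice.
    print(key_tag("https://keys.example.com/channel1/key"))
    print(key_tag("https://keys.example.com/channel1/key", iv=default_iv(7)))
```

The key itself never appears in the manifest, which is why the segments can sit in a shared CDN cache while key delivery stays under the control of the key management server.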

Delivery

The final step in this process is the actual delivery of segments to end clients – the aforementioned desktop/laptop/tablet PCs, smart phones, IP-based set-top boxes and now Internet-enabled television sets. Optimal delivery network design must take into consideration the content type, device type, delivery protocol and DRM options involved.

Live vs. File Delivery

In the case of live delivery, it is possible to serve segments directly from the packager when the number of clients is relatively small. However, the typical use case involves feeding the segments to a CDN, either via a reverse proxy ‘pull’ or via a ‘push’ mechanism, such as HTTP POST. The CDN is then responsible for delivering the chunks and playlist files to clients.

The same delivery model can also be utilized in video-on-demand (VOD), but VOD also offers the alternative of delivering directly from the packager, even to a large number of users. However, with VOD delivery, it is sometimes desirable to distribute one file (or a small number of files) that contains all the chunks together, referred to as an aggregate format for the content. Distributing one file allows service providers to easily preposition content in the CDN without having to distribute and manage thousands of individual chunks per piece of content. When a client makes a request, the aggregate file is segmented ‘on the fly’ for that client, using the client’s requested format. The tradeoff is that while CDN and file management are simpler, more packagers are required – ‘centralized’ packagers that create and aggregate the chunks and ‘distributed, edge-located’ packagers that segment the aggregate format (on demand) into the actual chunks delivered to clients.

Output Profile Selection

The optimal number of profiles, bit rates and resolutions is very service-specific. However, there are a number of generally applicable guidelines. First, what are the end devices and what is the last-mile network? The end devices drive the output resolutions. It is desirable to have one or two profiles to service the high-quality video service for the device, and these would be encoded at the full resolution of the target device. For PCs, that’s typically 720p30.

Looking at the delivery network, for mobile distribution, it is typical to use very low bandwidth profiles. Even 3G mobile networks, which have relatively high peak bandwidths of several hundred kbps, may fall back to much lower sustained bandwidths, and it is the sustained bandwidth that matters for video streaming. WiFi networks have higher capacity, but also suffer from potential degradation depending on the distance to the base station or the composition of walls between the transmitter and receiver. DSL distribution to PCs also varies widely in bandwidth capacity. And almost all last-mile networks suffer bandwidth reduction caused by aggregation, for example at a cable node or at the DSLAM. The table below suggests the number of output profiles in different scenarios:

Protocol Selection

Which of Apple HLS, Microsoft Silverlight Smooth Streaming or Adobe HTTP Dynamic Streaming is the optimal choice for a service provider? Each protocol has its own appeal, and so service providers must carefully consider the following in making a delivery protocol selection:

Adobe has a huge installed client base on PCs. For service operators that want to serve PCs and do not want to distribute a client, this is a big benefit. The availability of Adobe’s server infrastructure, including backwards compatibility with RTMP and Adobe Access, may also be appealing to service operators.
Apple HLS uses MPEG-2 transport stream files as chunks. The existing infrastructure for testing and analyzing TS files makes this protocol easy to deploy and debug. It also allows for the type of signaling that TS streams already carry, such as SCTE 35 cues for ad insertion points, multiple audio streams, EBIF data, etc.
Microsoft Smooth Streaming has a very convenient aggregate format and provides an excellent user experience that can adapt to changes in bandwidth rapidly, as it makes use of short chunks and doesn’t require repeated downloads of a playlist. Smooth Streaming is also an obvious choice when content owners require the use of PlayReady DRM.

Redundancy and Failover

Transcoder redundancy is typically managed using an N:M redundancy scheme in which a management system loads one of M standby transcoders with the configuration of a failed transcoder in a pool of N transcoders. The packager component can be managed similarly, but it can also be managed in a 1:1 scheme in which time-outs at the CDN root trigger a failover to the secondary packager. Avoiding outages in these scenarios involves making sure that the primary and backup packagers are synchronized, that is, they create identical chunks.
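The N:M scheme can be pictured with a small sketch of what the management system does on a failure. The class name `TranscoderPool`, the unit names and the configuration strings are hypothetical; a real management system would also handle health monitoring and repair.

```python
class TranscoderPool:
    """N active transcoders backed by M standbys (N:M redundancy)."""

    def __init__(self, active_configs, standby_names):
        self.active = dict(active_configs)   # transcoder name -> configuration
        self.standby = list(standby_names)   # idle units with no configuration

    def handle_failure(self, failed_name):
        """Move the failed unit's configuration onto the next free standby."""
        if not self.standby:
            raise RuntimeError("no standby transcoder available")
        config = self.active.pop(failed_name)
        replacement = self.standby.pop(0)
        self.active[replacement] = config
        return replacement


if __name__ == "__main__":
    pool = TranscoderPool({"t1": "channel-A", "t2": "channel-B"}, ["s1"])
    print(pool.handle_failure("t1"), pool.active)
```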

DRM Integration

DRM integration remains challenging. Broadly, there are two approaches:

The first uses unique encryption keys for every client stream. In this case, CDN caching provides no value. Every user view is a unicast connection back to the center of the network; network load is high; but the content is as secure as possible.
The second approach uses shared keys for all content, but keys are only distributed to authenticated clients. CDN caching can then lead to significant bandwidth savings, but key management still requires a unique connection to the core for each client. Fortunately, these connections are far lower bandwidth than the video streams.

Different DRM vendors provide different solutions, and interoperability between vendors for client authentication doesn’t exist:

Adobe uses Adobe Access to restrict access to streams, giving a unified, single vendor work flow.
Apple HLS provides a description of the encryption mechanism, but leaves client authentication as an implementation decision.
Microsoft’s PlayReady is a middle ground. Client authentication is well specified, but the interface between the key management server and the packager component is not. This means some integration is typically required to create a fully deployed system.

Apple HTTP Live Streaming (HLS)

HTTP Live Streaming (HLS) allows you to stream live and on-demand video and audio to an iPhone, iPad, or iPod Touch. HLS is similar to Smooth Streaming in Microsoft’s Silverlight platform architecture, and can be thought of as a successor to both RTSP and HTTP Progressive Download (HTTP PD), although both of those video options serve a purpose and likely won’t be going away anytime soon.

HLS was originally unveiled by Apple with the introduction of iPhone OS 3.0 in mid-2009. Prior to iPhone OS 3.0, no streaming protocols were supported natively on the iPhone, leaving developers to wonder what Apple had in mind for native streaming support. Apple proposed HLS as a standard to the IETF, and the draft is now in its sixth iteration.

As an adaptive streaming protocol, HLS has several advantages including multiple bit rate encoding for different devices, HTTP delivery, and segmented stream chunks suitable for delivery of live streams over widely available HTTP CDN infrastructure.

How HLS Works

HLS lets you send streaming video and audio to any supported Apple product, including Macs with a Safari browser. HLS works by segmenting video streams into 10-second chunks; the chunks are stored using a standard MPEG-2 Transport Stream file format. Optionally, chunks are created using several bit rates, allowing a client to dynamically switch between different bit rates depending on network conditions.

How does a stream get into HLS format? There are three main steps:

An input stream is encoded/transcoded. The input can be a satellite feed or any other typical input. The video and audio source is encoded (or transcoded) to an MPEG-2 Transport Stream container, with H.264 video and AAC audio, which are the codecs Apple devices currently support.
Output profiles are created. Typically a single input stream will be transcoded to several output resolutions/bit rates, depending on the types of client devices that the stream is destined for. For example, an input stream of H.264/AAC at 7 Mbps could be transcoded to four different profiles with bit rates of 1.5Mbps, 750K, 500K, and 200K. These would be suitable for devices and network conditions ranging from high-end to low-end, such as an iPad, iPhone 4, iPhone 3, and a low bit rate version for bad network conditions.
The streams are segmented. The streams contained within the profiles all need to be segmented and made available for delivery to an origin web server or directly to a client device over HTTP. The software or hardware device that does the segmenting (the segmenter) also creates an index file which is used to keep track of the individual video/audio segments.

Optionally, the segmenter might also encrypt the stream (each individual chunk) and create a key file.

What Does the Client Do?

The client downloads the index file via a URL that identifies the stream. The index file tells the client where to get the stream chunks (each with its own URL). For a given stream, the client then fetches each stream chunk in order. Once the client has enough of the stream downloaded and buffered, it displays it to the user. If encryption is used, the URLs for the decryption keys are also given in the index file.
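The first part of the client's job, turning an index file into an ordered list of chunk URLs, can be sketched as follows. The function name and the example URLs are hypothetical, and the sketch ignores encryption and all tags; it only collects the URI lines.

```python
from urllib.parse import urljoin


def segment_urls(playlist_text, playlist_url):
    """Return absolute chunk URLs, in playback order, from an HLS index file."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, tags and comments
        # Relative chunk names are resolved against the index file's own URL.
        urls.append(urljoin(playlist_url, line))
    return urls


if __name__ == "__main__":
    index = "#EXTM3U\n#EXT-X-TARGETDURATION:10\n#EXTINF:10,\nseg0.ts\n#EXTINF:10,\nseg1.ts\n"
    for url in segment_urls(index, "https://example.com/live/index.m3u8"):
        print(url)
```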

If multiple profiles (that is, bit rates and resolutions) are available, the index file is different in that it contains a specially tagged list of variant index files for the different stream profiles. In that case, the client downloads the primary index file first, and then downloads the index file for the bit rate it wants to play back. The bit rates and resolutions of the variant streams are specified in the main index file, but precise handling of the variants offered is left up to the client implementation.
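A primary index file of this kind might be generated as in the sketch below, using the four example bit rates given earlier (1.5 Mbps, 750K, 500K and 200K). The function name and the variant file names are assumptions; only the BANDWIDTH attribute is written, since the resolutions of the example profiles are not stated.

```python
def master_playlist(variants):
    """Render the primary index file listing one variant index file per profile.

    `variants` is a list of (bits_per_second, variant_playlist_uri) pairs.
    """
    lines = ["#EXTM3U"]
    for bandwidth, uri in variants:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth}")
        lines.append(uri)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(master_playlist([
        (1_500_000, "1500k/index.m3u8"),  # hypothetical variant file names
        (750_000, "750k/index.m3u8"),
        (500_000, "500k/index.m3u8"),
        (200_000, "200k/index.m3u8"),
    ]))
```

The client reads this file once, picks a variant and from then on works with that variant's own index file until it decides to switch.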

Typical playback latency for HLS is around 30 seconds. This is caused by the size of the chunks (10 seconds) and the need for the client to buffer a number of chunks before it starts displaying the video.
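The arithmetic behind that figure is simple. The sketch below assumes the client buffers three chunks before starting playback, which is an assumption chosen to match the 30-second figure rather than a number stated here, and it ignores encoding and network delays.

```python
def playback_latency(chunk_seconds, chunks_buffered):
    """Delay added by chunking alone: the client waits for whole chunks before playing."""
    return chunk_seconds * chunks_buffered


if __name__ == "__main__":
    print(playback_latency(10, 3))  # 10-second chunks: 30 seconds
    print(playback_latency(2, 3))   # 2-second chunks: 6 seconds
```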

An odd curiosity about HLS is that it doesn’t make use of Apple’s QuickTime MOV file format, which is the basis for the ISO MPEG file format. Apple thought that the TS format was more widely used and better understood by the broadcasters who would ultimately use HLS, and so they focused on the MPEG-2 TS format. Ironically, Microsoft’s Smooth Streaming protocol does make use of ISO MPEG files (MPEG-4 Part 12).

For more information on how HLS works, good resources include Apple’s Live Streaming overview documentation.

Microsoft Smooth Streaming (SS)

Smooth Streaming was announced by Microsoft in October 2008 as part of the Silverlight architecture. That year they demonstrated a prototype version of Smooth Streaming by delivering live and on-demand streaming content such as the Beijing Olympics and Democratic National Convention.

Smooth Streaming has all of the typical characteristics of adaptive streaming. The video content is segmented into small chunks, it is delivered over HTTP, and usually multiple bit rates are encoded so that the client can choose the best video bit rate to deliver an optimal viewing experience based on network conditions.

Adaptive streaming is valuable for many reasons including low web-based infrastructure costs, firewall compatibility and bit rate switching. Microsoft is definitely a believer in those benefits as it is making a strong push to adaptive streaming technology with Microsoft Silverlight, Smooth Streaming and Mediaroom.

Smooth Streaming vs. Apple HLS

Microsoft has chosen to implement adaptive streaming in unique ways, however. There are several key differences between Silverlight Smooth Streaming and HLS:

HLS makes use of a regularly updated “moving window” metadata index file that tells the client which chunks are available for download. Smooth Streaming uses time codes in the chunk requests and thus the client doesn’t have to repeatedly download an index file.
Because HLS requires a download of an index file every time a new chunk is available, it is desirable to run HLS with longer duration chunks, thus minimizing the number of index file downloads. Thus, the recommended chunk duration with HLS is 10 seconds, while with Smooth Streaming it is 2 seconds.
The “wire format” of the chunks is different. Both formats use H.264 video encoding and AAC audio encoding, but HLS makes use of MPEG-2 Transport Stream files, while Smooth Streaming makes use of “fragmented” ISO MPEG-4 files. The “fragmented” MP4 file is a variant in which not all the data in a regular MP4 file is included in the file. Each of these formats has some advantages and disadvantages. MPEG-2 TS files have a large installed analysis toolset and have pre-defined signaling mechanisms for things like data signals (e.g. specification of ad insertion points). Fragmented MP4 files are very flexible and can easily accommodate all kinds of data, for example decryption information, that MPEG-2 TS files don’t have defined slots to carry.
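The first difference, time codes in the chunk requests, can be illustrated with a sketch of how a Smooth Streaming client might form a chunk URL. The URL pattern and the 100-nanosecond time units follow the commonly documented Smooth Streaming conventions, but the function name, the server URL and the assumption of fixed 2-second chunks starting at time zero are illustrative only; a real client takes chunk start times from the manifest.

```python
TICKS_PER_SECOND = 10_000_000  # Smooth Streaming's default timescale (100 ns units)


def fragment_url(base_url, bit_rate_bps, start_seconds, track="video"):
    """Build a chunk request that names the bit rate and the chunk's start time."""
    start_ticks = start_seconds * TICKS_PER_SECOND
    return f"{base_url}/QualityLevels({bit_rate_bps})/Fragments({track}={start_ticks})"


if __name__ == "__main__":
    base = "http://example.com/show.ism"  # hypothetical publishing point
    for chunk_number in range(3):
        print(fragment_url(base, 1_500_000, chunk_number * 2))
```

Since the next chunk's URL can be computed from the current one, the client has no need to poll an index file during playback.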

For a more in-depth overview of how Microsoft Smooth Streaming works, a good resource is Microsoft’s Smooth Streaming Technical overview whitepaper.

Additionally, for a slightly biased but still accurate representation of some of the key differences between Smooth Streaming, HLS, and Adobe Flash Dynamic Streaming, check out this adaptive streaming comparison matrix.

Adobe HTTP Dynamic Streaming (HDS)

HTTP Dynamic Streaming was announced by Adobe in late 2009 as “project Zeri” and was delivered in June 2010. HDS is more similar to Microsoft Smooth Streaming than it is to Apple HLS. Primarily this is because it uses a single aggregate file from which MPEG-file container fragments are extracted and delivered rather than HLS-like individual chunks, and consequently there are certain implications of that design, which will be discussed in detail.

Characteristics of Adobe HDS

HTTP Dynamic Streaming supports both live and on-demand content using a standard MP4 fragment format (F4F). Video/audio codec support includes VP6/MP3 and H.264/AAC; however, as with HLS and SS, the predominant video/audio codecs are H.264/AAC.

Similar to other adaptive streaming protocols, at the start of the stream, the client or CDN/origin server downloads the manifest file (in this case F4M file) which provides all the information needed to play back the content, including fragment format, available bitrates, Flash Access license server location, and metadata information.

Files representing either live or VOD workflows are sent to an HTTP origin server. The origin server is responsible for receiving segment requests from the client over HTTP and returning the appropriate segment from the file. Standard origin servers like Apache can use Adobe’s HTTP Origin Module to serve the content, while players built on Adobe’s Open Source Media Framework (OSMF) play it back on the client side.

Differences Between Adobe HDS and Apple HLS

There are several key differences between Adobe HDS and Apple HLS:

HLS makes use of a regularly updated “moving window” metadata index (manifest) file that tells the client which chunks are available for download. Adobe HDS uses sequence numbers in the chunk requests and thus the client doesn’t have to repeatedly download a manifest file.
In addition to the manifest, there is a bootstrap file, which in the live case gives the updated sequence numbers and is equivalent to the repeatedly downloaded HLS playlist.
Because HLS requires a download of a manifest file as often as every time a new chunk is available, it is desirable to run HLS with longer duration chunks, thus minimizing the number of manifest file downloads. More recent Apple client versions appear to now check how many segments are in the playlist and only re-fetch the manifest when the client runs out of segments. Nevertheless, the recommended chunk duration with HLS is 10 seconds, while with Adobe HDS it is usually 2-5 seconds.
The “wire format” of the chunks is different. Both formats use H.264 video encoding and AAC audio encoding, but HLS makes use of MPEG-2 Transport Stream files, while Adobe HDS (and Microsoft SS) make use of “fragmented” ISO MPEG-4 files.

As with HLS, Adobe Flash clients first request a manifest file. The manifest contains information about what streams are available, bit rates, codecs, etc. and the streams are represented by a URL. Using a contiguous file creates two significant changes in the client/server architecture:

The client reads the manifest and can request chunks by a URL with a sequence number rather than a specific chunk name.
The server must calculate exact byte range offsets within the aggregate file by translating URL requests, and then deliver the appropriate chunk.
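The server-side translation can be sketched as below. The `SegN-FragM` request suffix is assumed here as the form HDS fragment requests take, and the flat list of fragment sizes is a hypothetical stand-in for the index information a real F4F file carries; the function names and example path are illustrative.

```python
import re


def parse_fragment_request(path):
    """Extract (segment, fragment) numbers from a request path ending in SegN-FragM."""
    match = re.search(r"Seg(\d+)-Frag(\d+)$", path)
    if match is None:
        raise ValueError("not a fragment request")
    return int(match.group(1)), int(match.group(2))


def build_offsets(fragment_sizes):
    """Cumulative start offset of each fragment inside the aggregate file."""
    offsets = [0]
    for size in fragment_sizes:
        offsets.append(offsets[-1] + size)
    return offsets


def byte_range(offsets, fragment_number):
    """Inclusive (first_byte, last_byte) of a 1-based fragment number."""
    if not 1 <= fragment_number < len(offsets):
        raise IndexError("no such fragment")
    return offsets[fragment_number - 1], offsets[fragment_number] - 1


if __name__ == "__main__":
    offsets = build_offsets([1000, 1500, 800])  # hypothetical fragment sizes in bytes
    _, fragment = parse_fragment_request("/vod/showSeg1-Frag2")
    print(byte_range(offsets, fragment))
```

The origin would then read that byte range from the aggregate file and return it as the HTTP response body.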

For a more in-depth overview of how Adobe HDS works, a good resource is Adobe’s HTTP Dynamic Streaming technical whitepaper.

By Andy Salo, RGB Networks
