Production & Exchange Formats for 3DTV Programmes

The purpose of this EBU Recommendation is to provide technical guidance to broadcasters who intend to use current (or future) 2D HDTV infrastructures to produce 3DTV programmes.

Multi-Screen IP Video Delivery

Online and mobile viewing of widely-available, high-quality video content including TV programming, movies, sports events, and news is now poised to go mainstream. Driven by the recent availability of low-cost, high-resolution desktop/laptop/tablet PCs, smart phones, set-top boxes and now Ethernet-enabled TV sets, consumers have rapidly moved through the ‘novelty’ phase of acceptance into expectation that any media should be available essentially on any device over any network connection. Whether regarded as a disruption for cable TV, telco or satellite TV providers, or an opportunity for service providers to extend TV services onto the web for on-demand, time-shifted and place-shifted programming environments – often referred to as ‘three screen delivery’ or ‘TV Anywhere’ – this new video delivery model is here to stay.

While tremendous advancements in core and last-mile bandwidth have been achieved around the world in the last decade – primarily driven by web-based data consumption – video traffic represents a quantum leap in bandwidth requirements. Coupled with the fact that the Internet at large is not a managed quality-of-service environment, this means that new methods of video transport must be considered to provide, across any device and network, the quality of video experience we have come to expect from managed TV-delivery networks.

The evolution of video delivery transport has led to a new set of de facto standard adaptive delivery protocols from Apple, Microsoft and Adobe that are now positioned for broad adoption. Consequently, networks must now be equipped with servers that can take high-quality video content from its source – live or file-based – and ‘package’ it for transport to devices ready to accept these new delivery protocols.

Video Delivery Background

The Era of Stateful Protocols

For many years, stateful protocols including the Real Time Streaming Protocol (RTSP), Adobe’s Real Time Messaging Protocol (RTMP), and Real Networks' RTSP over Real Data Transport (RDT) protocol were utilized to stream video content to desktop and mobile clients. Stateful protocols require that, from the time a client connects to a streaming server until the time it disconnects, the server tracks client state. If the client needs to perform any video session control commands such as start, stop, pause or fast-forward, it must do so by communicating state information back to the streaming server.

Once a session between the client and the server has been established, the server sends media as a stream of small packets typically representing a few milliseconds of video. These packets can be transmitted over UDP or TCP. TCP overcomes firewall blocking of UDP packets, but may also incur increased latency as packets are sent, and resent if not acknowledged, until received at the far end.

These protocols served the market well, particularly during the era when desktop and mobile viewing experiences were limited in frequency, quality and duration by small screens and windows, low resolutions, and the constrained processor, memory and storage capabilities of mobile devices.

However, these experience factors have all changed dramatically in the last few years, and that has exposed a number of weaknesses in stateful protocol implementations:

  • Stateful media protocols have difficulty getting through firewalls and routers.

  • Stateful media protocols require special proxies/caches.

  • Stateful media protocols cannot react quickly or gracefully to rapidly fluctuating network conditions.

  • Stateful media client server implementations are vendor-specific, and thus require the purchase of vendor-specific servers and licensing arrangements – which are also more expensive to operate and maintain.

The Era of the Stateless Protocol – HTTP Progressive Download

A newer type of media delivery is HTTP progressive download. Progressive download (as opposed to ‘traditional’ file download) pulls a file from a web server using HTTP and allows the video file to start playing before the entire file has been downloaded. Most media players, including Adobe Flash, Windows Media Player and Apple QuickTime, support progressive download. Further, most video hosting websites use progressive download extensively, if not exclusively.

HTTP progressive download differs from traditional file download in one important respect. Traditional files have audio and video data separated in the file. At the end of the file, a record of the location and structure of the audio and video tracks (track data) is provided. Progressively downloadable files have track data at the beginning of the file and interleave the audio and video data. A player downloading a traditional file must wait until the end of the file is reached in order to understand track data. A player downloading a progressively downloadable file gets track data immediately and can, therefore, play back audio/video as it is received.

Unfortunately, progressive download files cannot be created efficiently from live streams. The audio and video track data must be 1) computed after the entire file has been created and then 2) written to the front of the file. Thus it isn’t possible to deliver a live stream using progressive download, because the track data cannot be available until the entire file has been created.
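For MP4/QuickTime-family files, the ‘track data’ described above is carried in the ‘moov’ box, and a progressively downloadable (‘fast start’) file is simply one in which ‘moov’ precedes the ‘mdat’ media data. A minimal sketch that checks this ordering (assuming 32-bit box sizes and a well-formed file):

```python
import struct

def is_fast_start(path):
    """Return True if the 'moov' box comes before 'mdat' in an MP4/MOV file.

    Simplified sketch: assumes 32-bit box sizes and a well-formed file
    (64-bit 'largesize' boxes and size == 0 are not handled).
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                return False      # reached end of file without finding both boxes
            size, box_type = struct.unpack(">I4s", header)
            if box_type == b"moov":
                return True       # track data first: playback can start while downloading
            if box_type == b"mdat":
                return False      # media data first: track data only arrives at the end
            f.seek(size - 8, 1)   # skip the rest of this box

# Example: is_fast_start("movie.mp4")
```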

Even so, HTTP Progressive Download greatly improves upon its stateful protocol predecessors as a result of the following:
  • No issue getting through firewalls and routers as HTTP traffic is passed through Port 80 unfettered.

  • Utilizes the same web download infrastructure used by CDNs and hosting providers to serve web data content – making it much easier and less expensive to deliver rich media content.

  • Takes advantage of newer desktop and mobile clients’ formidable processing, memory and storage capabilities to start video playback quickly, maintain flow, and preserve a high-quality experience.

The Modern Era – Adaptive HTTP Streaming

Adaptive HTTP Streaming takes HTTP video delivery several steps further. In this case, the source video, whether a file or a live stream, is encoded into segments – sometimes referred to as "chunks" – using a desired delivery format, which includes a container, video codec, audio codec, encryption protocol, etc. Segments typically represent two to ten seconds of video. Each segment is sliced at video Group of Pictures (GOP) boundaries beginning with a key frame, giving the segment complete independence from previous and successive segments. Encoded segments are subsequently hosted on a regular HTTP web server.

Clients request segments from the web server, downloading them via HTTP. As the segments are downloaded to the client, the client plays back the segments in the order received. Since the segments are sliced along GOP boundaries with no gaps between, video playback is seamless – even though it is actually just a file download via a series of HTTP GET requests.

Adaptive delivery enables a client to ‘adapt’ to fluctuating network conditions by selecting video file segments encoded to different bit rates. As an example, suppose a video file had been encoded to 11 different bit rates from 500 Kbps to 1 Mbps in 50 Kbps increments, i.e., 500 Kbps, 550 Kbps, 600 Kbps, etc. The client then observes the effective bandwidth throughout the playback period by evaluating its buffer fill/depletion rate. If a higher quality stream is available, and network bandwidth appears able to support it, the client will switch to the higher-quality bit rate segment. If a lower quality stream is available, and network bandwidth appears too limited to support the currently used bit rate segment flow, the client will switch to the lower quality bit rate segment flow. The client can choose between segments encoded at different bit rates every few seconds.

This delivery model works for both live- and file-based content. In either case, a manifest file is provided to the client, which defines the parameters of each segment. In the case of an on-demand file request, the manifest is sent at the beginning of the session. In the case of a live feed, updated ‘rolling window’ manifest files are sent as new segments are created.

Since the web server can typically send data as fast as its network connection will allow, the client can evaluate its buffer conditions and make forward-looking decisions on whether future segment requests should be at a higher or lower bit rate to avoid buffer overrun or starvation. Each client will make this decision based on trying to select the highest possible bit rate for maximum quality of playback experience, but not so great that it starves its own buffer of the next needed segments.
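A minimal sketch of such a buffer-driven decision, using the 500 Kbps – 1 Mbps ladder from the example above (the buffer thresholds are illustrative assumptions, not values defined by any of the delivery protocols):

```python
# Illustrative buffer-driven bit-rate selection (thresholds are assumptions).
PROFILES_KBPS = [500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000]

def next_bitrate(current_kbps, buffer_seconds, target_buffer=20.0, low_water=8.0):
    """Pick the bit rate for the next segment request.

    buffer_seconds: seconds of video currently buffered on the client.
    If the buffer is draining towards the low-water mark, step down a rung;
    if it is comfortably above the target, try the next rung up.
    """
    idx = PROFILES_KBPS.index(current_kbps)
    if buffer_seconds < low_water and idx > 0:
        return PROFILES_KBPS[idx - 1]               # protect against buffer starvation
    if buffer_seconds > target_buffer and idx < len(PROFILES_KBPS) - 1:
        return PROFILES_KBPS[idx + 1]               # headroom available, increase quality
    return current_kbps                             # hold steady
```

The client would re-run this decision every few seconds, once per segment request.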

A number of advantages accrue with this delivery protocol approach:
  • Lower infrastructure costs for content providers, by eliminating specialty streaming servers in favour of the generic HTTP caches/proxies already in place for serving HTTP data.

  • Content delivery is dynamically adapted to the weakest link in the end-to-end delivery chain, including highly varying last-mile conditions.

  • Subscribers no longer need to statically select a bit rate on their own, as the client can now perform that function dynamically and automatically.

  • Subscribers enjoy fast start-up and seek times as playback control functions can be initiated via the lowest bit rate and subsequently ratcheted up to a higher bit rate.

  • Annoying user experience shortcomings including long initial buffer time, disconnects, and playback start/stop are virtually eliminated.

  • Client can control bit rate switching – with no intelligence in the server – taking into account CPU load, available bandwidth, resolution, codec, and other local conditions.

  • Simplified ad insertion accomplished by file substitution.

Encoding/Transcoding

The transcoder (or encoder, if the input is not already compressed) is responsible for ingesting the content, encoding to all necessary outputs, and preparing each output for advertising readiness and delivery to the packager for segmentation. The transcoder must perform the following functions for multi-screen adaptive delivery – and at high concurrency, in real time and with high video quality output.

Video Transcoding
  • Transcode the output video to a progressive format, which requires the transcoder to support input de-interlacing.

  • Transcode the input to each required output profile – where a given profile will have its own resolution and bit rate parameters – including scaling to resolutions suitable for each client device. Because the client’s quality of experience depends on having a number of different profiles available, a significant number of output profiles must be encoded for each input; deployments may use anywhere from 4 to 16 output profiles per input. A typical profile ladder is sketched after this list.


  • GOP-align each output profile such that client playback (shifting between different bit rate ‘chunks’ created for each profile) is continuous and smooth.
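As a rough illustration of such a ladder (the resolutions and bit rates below are assumptions chosen for this sketch, not values taken from the article or from any specification), a set of output profiles for a single HD input might be described like this:

```python
# Hypothetical output-profile ladder for one input (values are assumptions,
# not recommendations): each profile has its own resolution and bit rate,
# and all profiles must be GOP-aligned so the client can switch seamlessly.
OUTPUT_PROFILES = [
    {"name": "pc_hd",      "resolution": (1280, 720), "video_kbps": 3000},
    {"name": "pc_sd",      "resolution": (848, 480),  "video_kbps": 1500},
    {"name": "tablet",     "resolution": (640, 360),  "video_kbps": 800},
    {"name": "phone_wifi", "resolution": (480, 270),  "video_kbps": 400},
    {"name": "phone_3g",   "resolution": (320, 180),  "video_kbps": 200},
]
```

In practice the ladder is tuned to the target devices and last-mile networks, as discussed under Output Profile Selection below.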

Audio Transcoding

Transcode audio into AAC – the codec used by adaptive delivery protocols from Apple, Microsoft and Adobe.

Ad Insertion

Add IDR frames at ad insertion points, so that the video is ready for SCTE 35 ad insertion. It is also potentially possible to align chunk boundaries with ad insertion points so that ad insertion can be done via chunk-substitution rather than traditional stream splicing.

Ingest Fault Tolerance

The transcoding system needs to allow two different transcoders that ingest the same input to create identically IDR-aligned output – contributing to strong fault tolerance. This can be used to create a redundant backup of encoded content in such a way that any failure of the primary transcoder is seamlessly backed up by the secondary transcoder.

Packaging

To realize the benefits of HTTP adaptive streaming, a ‘packager’ function – sometimes referred to as a ‘segmenter’, ‘fragmenter’ or ‘encapsulator’ – must take each encoded video output from the transcoder and ‘package’ the video for each delivery protocol. To perform this function, the packager must be able to:

Ingest

Ingest live streams or files, depending on whether the work flow is live or on-demand.

Segmentation

Segment chunks according to the proprietary delivery protocols specified by Microsoft Smooth Streaming, Apple HTTP Live Streaming (HLS), and Adobe HTTP Dynamic Streaming.

Encryption
  • Encrypt segments on a per delivery protocol basis (in a format compatible with each delivery protocol) as they are packaged, enabling content rights to be managed on an individual session basis. For HLS, this is file-based AES-128 encryption. For Smooth Streaming, it is also AES-128, but with PlayReady compatible signaling. Adobe HTTP Dynamic Streaming uses Adobe Flash Access for encryption.

  • Integrate with third party key management systems to retrieve necessary encryption information.

Note: Third party key management servers manage and distribute the keys to clients. If the client is authorized, it can retrieve decryption keys from a location designated in the manifest file. Alternatively, depending on the protocol used, key location can be specified within each segment. Either way, the client is responsible for retrieving decryption keys, which are normally served after the client request is authenticated. Once the keys are received, the client is able to decrypt the video and display it.
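As a concrete sketch of the per-segment encryption step for HLS-style file-based protection (using the Python cryptography package; key handling is omitted, and the use of the segment sequence number as the IV is one common convention rather than a requirement):

```python
# Sketch of per-segment AES-128 (CBC) encryption as used for HLS-style
# file-based protection. Key management/distribution is omitted; in a real
# system the client fetches the key from the URI signalled in the playlist.
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.primitives import padding

def encrypt_segment(segment_bytes, key, sequence_number):
    # Common convention: use the segment's media sequence number as the IV
    # (a 128-bit big-endian integer) unless an explicit IV is signalled.
    iv = sequence_number.to_bytes(16, "big")
    padder = padding.PKCS7(128).padder()
    padded = padder.update(segment_bytes) + padder.finalize()
    encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return encryptor.update(padded) + encryptor.finalize()

# Example (example key only; AES-128 requires a 16-byte key):
# key = b"\x00" * 16
# encrypted = encrypt_segment(open("seg_00042.ts", "rb").read(), key, 42)
```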

Delivery

The final step in this process is the actual delivery of segments to end clients – the aforementioned desktop/laptop/tablet PCs, smart phones, IP-based set-top boxes and now Internet-enabled television sets. Optimal delivery network design must take into consideration the content types, device types, delivery protocols and DRM options involved.

Live vs. File Delivery

In the case of live delivery, it is possible to serve segments directly from the packager when the number of clients is relatively small. However, the typical use case involves feeding the segments to a CDN, either via a reverse proxy ‘pull’ or via a ‘push’ mechanism, such as HTTP POST. The CDN is then responsible for delivering the chunks and playlist files to clients.

The same delivery model can also be utilized for video-on-demand (VOD), but VOD also offers the alternative of delivering directly from the packager, even to a large number of users. However, with VOD delivery, it is sometimes desirable to distribute one file (or a small number of files) that contains all the chunks together, referred to as an aggregate format for the content. Distributing one file allows service providers to easily preposition content in the CDN without having to distribute and manage thousands of individual chunks per piece of content. When a client makes a request, the aggregate file is segmented ‘on the fly’ for that client, using the client’s requested format. The tradeoff is that while CDN and file management are simpler, more packagers are required – ‘centralized’ packagers that create and aggregate the chunks, and ‘distributed’, edge-located packagers that segment the aggregate format (on demand) into the actual chunks delivered to clients.

Output Profile Selection

The optimal number of profiles, bit rates and resolutions to use is very service-specific. However, there are a number of generally applicable guidelines. First, what are the end devices and what is the last-mile network? The end devices drive the output resolutions. It is desirable to have one or two profiles dedicated to the highest-quality video experience for each device, encoded at the full resolution of the target device; for PCs, that is typically 720p30.

Looking at the delivery network, for mobile distribution it is typical to use very low bandwidth profiles. Even 3G mobile networks, which have relatively high peak bandwidths of several hundred kbps, may fall back to sustained bandwidths much lower than video streaming requires. WiFi networks have higher capacity, but also suffer potential degradation depending on the distance to the base station or the composition of walls between transmitter and receiver. DSL distribution to PCs also varies widely in bandwidth capacity, and almost all last-mile networks suffer bandwidth reduction caused by aggregation, for example at a cable node or at the DSLAM. The number of output profiles should therefore be chosen to suit the scenarios being served.



Protocol Selection

Which of Apple HLS, Microsoft Silverlight Smooth Streaming or Adobe HTTP Dynamic Streaming is the optimal choice for a service provider? Each protocol has its own appeal, and so service providers must carefully consider the following in making a delivery protocol selection:
  • Adobe has a huge installed client base on PCs. For service operators that want to serve PCs and do not want to distribute a client, this is a big benefit. The availability of Adobe’s server infrastructure, including backwards compatibility with RTMP and Adobe Access, may also be appealing to service operators.

  • Apple HLS uses MPEG-2 transport stream files as chunks. The existing infrastructure for testing and analyzing TS files makes this protocol easy to deploy and debug. It also allows for the type of signaling that TS streams already carry, such as SCTE 35 cues for ad insertion points, multiple audio streams, EBIF data, etc.

  • Microsoft Smooth Streaming has a very convenient aggregate format and provides an excellent user experience that can adapt to changes in bandwidth rapidly, as it makes use of short chunks and doesn’t require repeated downloads of a playlist. Smooth Streaming is also an obvious choice when content owners require the use of PlayReady DRM.

Redundancy and Failover

Transcoder redundancy is typically managed using an N:M redundancy scheme, in which a management system loads one of M standby transcoders with the configuration of a failed transcoder in a pool of N transcoders. The packager component can be managed similarly, but it can also be managed in a 1:1 scheme by having time-outs at the CDN root trigger failover to the secondary packager. Avoiding outages in these scenarios involves making sure that the primary and backup packagers are synchronized, that is, that they create identical chunks.

DRM Integration

DRM integration remains challenging. Broadly, there are two approaches:
  • The first uses unique encryption keys for every client stream. In this case, CDN caching provides no value. Every user view is a unicast connection back to the center of the network; network load is high; but the content is as secure as possible.

  • The second approach uses shared keys for all content, but keys are only distributed to authenticated clients. CDN caching can then lead to significant bandwidth savings, but key management still requires a unique connection to the core for each client. Fortunately, these connections are far lower bandwidth than the video streams.

Different DRM vendors provide different solutions, and interoperability between vendors for client authentication doesn’t exist:
  • Adobe uses Adobe Access to restrict access to streams, giving a unified, single vendor work flow.

  • Apple HLS provides a description of the encryption mechanism, but leaves client authentication as an implementation decision.
  • Microsoft’s PlayReady is a middle ground. Client authentication is well specified, but the interface between the key management server and the packager component is not. This means some integration is typically required to create a fully deployed system.

Apple HTTP Live Streaming (HLS)

HTTP Live Streaming (HLS) allows you to stream live and on-demand video and audio to an iPhone, iPad, or iPod Touch. HLS is similar to Smooth Streaming in Microsoft’s Silverlight platform architecture, and can be thought of as a successor to both RTSP and HTTP Progressive Download (HTTP PD), although both of those video options serve a purpose and likely won’t be going away anytime soon.

HLS was originally unveiled by Apple with the introduction of iPhone OS 3.0 in mid-2009. Prior to iPhone OS 3.0, no streaming protocols were supported natively on the iPhone, leaving developers to wonder what Apple had in mind for native streaming support. Apple proposed HLS as a standard to the IETF, and the draft is now in its sixth iteration.

As an adaptive streaming protocol, HLS has several advantages including multiple bit rate encoding for different devices, HTTP delivery, and segmented stream chunks suitable for delivery of live streams over widely available HTTP CDN infrastructure.

How HLS Works

HLS lets you send streaming video and audio to any supported Apple product, including Macs with a Safari browser. HLS works by segmenting video streams into 10-second chunks; the chunks are stored using a standard MPEG-2 Transport Stream file format. Optionally, chunks are created using several bit rates, allowing a client to dynamically switch between different bit rates depending on network conditions.

How does a stream get into HLS format? There are three main steps:
  • An input stream is encoded/transcoded. The input can be a satellite feed or any other typical input. The video and audio source is encoded (or transcoded) to an MPEG-2 Transport Stream container, with H.264 video and AAC audio, which are the codecs Apple devices currently support.

  • Output profiles are created. Typically a single input stream will be transcoded to several output resolutions/bit rates, depending on the types of client devices that the stream is destined for. For example, an input stream of H.264/AAC at 7 Mbps could be transcoded to four different profiles with bit rates of 1.5 Mbps, 750 kbps, 500 kbps and 200 kbps. These would be suitable for devices and network conditions ranging from high-end to low-end, such as an iPad, iPhone 4, iPhone 3, and a low bit rate version for bad network conditions.

  • The streams are segmented. The streams contained within the profiles all need to be segmented and made available for delivery to an origin web server or directly to a client device over HTTP. The software or hardware device that does the segmenting (the segmenter) also creates an index file which is used to keep track of the individual video/audio segments.

Optionally, the segmenter might also encrypt the stream (each individual chunk) and create a key file.
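To make the relationship between the segmenter’s output and its index file concrete, here is a minimal sketch of how an index (playlist) file for a set of 10-second TS chunks could be generated; the file names and fixed durations are illustrative assumptions:

```python
# Sketch: build a simple HLS-style media playlist (index file) for a set of
# already-segmented 10-second MPEG-2 TS chunks. File names/durations are
# illustrative; a live segmenter would rewrite this as a rolling window.
def write_media_playlist(segment_names, target_duration=10, media_sequence=0):
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target_duration}",
        f"#EXT-X-MEDIA-SEQUENCE:{media_sequence}",
    ]
    for name in segment_names:
        lines.append(f"#EXTINF:{float(target_duration):.3f},")
        lines.append(name)
    # For VOD the playlist is closed with ENDLIST; a live playlist omits it.
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"

# Example:
# print(write_media_playlist(["seg_00000.ts", "seg_00001.ts", "seg_00002.ts"]))
```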



What Does the Client Do?

The client downloads the index file via a URL that identifies the stream. The index file tells the client where to get the stream chunks (each with its own URL). For a given stream, the client then fetches each stream chunk in order. Once the client has enough of the stream downloaded and buffered, it displays it to the user. If encryption is used, the URLs for the decryption keys are also given in the index file.

If multiple profiles (that is, bit rates and resolutions) are available, the index file is different in that it contains a specially tagged list of variant index files for the different stream profiles. In that case, the client downloads the primary index file first, and then downloads the index file for the bit rate it wants to play back. The bit rates and resolutions of the variant streams are specified in the main index file, but precise handling of the variants offered is left up to the client implementation.
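A minimal sketch of the client side of this logic – parsing the main index file’s variant entries and choosing one by declared bandwidth – might look like the following; the parsing is deliberately simplified, and the selection policy shown is just one possible client implementation, not Apple’s:

```python
# Simplified sketch: parse a master playlist's #EXT-X-STREAM-INF entries and
# pick the variant whose declared BANDWIDTH best fits the measured throughput.
import re

def parse_master_playlist(text):
    variants = []
    lines = text.strip().splitlines()
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF:"):
            m = re.search(r"BANDWIDTH=(\d+)", line)
            if m and i + 1 < len(lines):
                variants.append({"bandwidth": int(m.group(1)),
                                 "uri": lines[i + 1].strip()})
    return variants

def choose_variant(variants, measured_bps):
    # One possible policy: highest declared bandwidth below measured throughput,
    # falling back to the lowest-bandwidth variant if none fits.
    fitting = [v for v in variants if v["bandwidth"] <= measured_bps]
    pick_from = fitting or [min(variants, key=lambda v: v["bandwidth"])]
    return max(pick_from, key=lambda v: v["bandwidth"])
```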

Typical playback latency for HLS is around 30 seconds. This is caused by the size of the chunks (10 seconds) and the need for the client to buffer a number of chunks before it starts displaying the video.

An odd curiosity about HLS is that it doesn’t make use of Apple’s QuickTime MOV file format, which is the basis for the ISO MPEG-4 file format. Apple felt that the TS format was more widely used and better understood by the broadcasters who would ultimately use HLS, and so focused on the MPEG-2 TS format. Ironically, Microsoft’s Smooth Streaming protocol does make use of ISO MPEG-4 files (MPEG-4 Part 12).

For more information on how HLS works, good resources include Apple’s Live Streaming overview documentation.

Microsoft Smooth Streaming (SS)

Smooth Streaming was announced by Microsoft in October 2008 as part of the Silverlight architecture. That year, Microsoft demonstrated a prototype version of Smooth Streaming by delivering live and on-demand streaming content such as the Beijing Olympics and the Democratic National Convention.

Smooth Streaming has all of the typical characteristics of adaptive streaming. The video content is segmented into small chunks, it is delivered over HTTP, and usually multiple bit rates are encoded so that the client can choose the best video bit rate to deliver an optimal viewing experience based on network conditions.

Adaptive streaming is valuable for many reasons, including low web-based infrastructure costs, firewall compatibility and bit rate switching. Microsoft is definitely a believer in those benefits, as it is making a strong push into adaptive streaming technology with Microsoft Silverlight, Smooth Streaming and Mediaroom.

Smooth Streaming vs. Apple HLS

Microsoft has chosen to implement adaptive streaming in unique ways, however. There are several key differences between Silverlight Smooth Streaming and HLS:
  • HLS makes use of a regularly updated “moving window” metadata index file that tells the client which chunks are available for download. Smooth Streaming uses time codes in the chunk requests and thus the client doesn’t have to repeatedly download an index file.

  • Because HLS requires a download of an index file every time a new chunk is available, it is desirable to run HLS with longer duration chunks, thus minimizing the number of index file downloads. Thus, the recommended chunk duration with HLS is 10 seconds, while with Smooth Streaming it is 2 seconds.

  • The “wire format” of the chunks is different. Both formats use H.264 video encoding and AAC audio encoding, but HLS makes use of MPEG-2 Transport Stream files, while Smooth Streaming makes use of “fragmented” ISO MPEG-4 files. A “fragmented” MP4 file is a variant in which the media data is split into self-contained fragments, each carrying its own indexing metadata, rather than being stored as one contiguous movie with a single index. Each of these formats has advantages and disadvantages. MPEG-2 TS files have a large installed analysis toolset and have pre-defined signaling mechanisms for data signals (e.g. specification of ad insertion points). Fragmented MP4 files are very flexible and can easily accommodate all kinds of data, for example decryption information, that MPEG-2 TS files don’t have defined slots to carry.

For a more in-depth overview of how Microsoft Smooth Streaming works, a good resource is Microsoft’s Smooth Streaming Technical overview whitepaper.

Additionally, for a slightly biased but still accurate representation of some of the key differences between Smooth Streaming, HLS, and Adobe Flash Dynamic Streaming, check out this adaptive streaming comparison matrix.

Adobe HTTP Dynamic Streaming (HDS)

HTTP Dynamic Streaming was announced by Adobe in late 2009 as “project Zeri” and was delivered in June 2010. HDS is more similar to Microsoft Smooth Streaming than it is to Apple HLS, primarily because it uses a single aggregate file from which fragmented MP4 container segments are extracted and delivered, rather than HLS-like individual chunks; the implications of that design are discussed below.

Characteristics of Adobe HDS

HTTP Dynamic Streaming supports both live and on-demand content using a standard MP4 fragment format (F4F). Video/audio codec support includes VP6/MP3 and H.264/AAC; however, as with HLS and SS, the predominant video/audio codecs are H.264/AAC.

Similar to other adaptive streaming protocols, at the start of the stream the client or CDN/origin server downloads the manifest file (in this case, an F4M file), which provides all the information needed to play back the content, including fragment format, available bit rates, Flash Access license server location, and metadata.

Files representing either live or VOD workflows are sent to an HTTP origin server. The origin server is responsible for receiving segment requests from the client over HTTP and returning the appropriate segment from the file. Standard origin servers like Apache can leverage Adobe’s Open Source Media Framework (OSMF) to serve the content.

Differences Between Adobe HDS and Apple HLS

There are several key differences between Adobe HDS and Apple HLS:
  • HLS makes use of a regularly updated “moving window” metadata index (manifest) file that tells the client which chunks are available for download. Adobe HDS uses sequence numbers in the chunk requests and thus the client doesn’t have to repeatedly download a manifest file.

  • In addition to the manifest, there is a bootstrap file, which in the live case gives the updated sequence numbers and is equivalent to the repeatedly downloaded HLS playlist.

  • Because HLS requires a download of a manifest file as often as every time a new chunk is available, it is desirable to run HLS with longer duration chunks, thus minimizing the number of manifest file downloads. More recent Apple client versions appear to now check how many segments are in the playlist and only re-fetch the manifest when the client runs out of segments. Nevertheless, the recommended chunk duration with HLS is 10 seconds, while with Adobe HDS it is usually 2-5 seconds.

  • The “wire format” of the chunks is different. Both formats use H.264 video encoding and AAC audio encoding, but HLS makes use of MPEG-2 Transport Stream files, while Adobe HDS (and Microsoft SS) make use of “fragmented” ISO MPEG-4 files.

As with HLS, Adobe Flash clients first request a manifest file. The manifest contains information about what streams are available, bit rates, codecs, etc. and the streams are represented by a URL. Using a contiguous file creates two significant changes in the client/server architecture:
  • The client reads the manifest and can request chunks by a URL with a sequence number rather than a specific chunk name.

  • The server must translate each URL request into exact byte-range offsets within the aggregate file and deliver the appropriate chunk, as sketched below.
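A highly simplified sketch of that server-side lookup (the fragment index shown here is a hypothetical stand-in for the real F4F/F4X index data, which is considerably more involved):

```python
# Hypothetical sketch of serving a fragment by byte range from an aggregate
# file. The index structure below is an assumption for illustration only.
FRAGMENT_INDEX = {
    # sequence number: (byte offset, length) within the aggregate file
    1: (0, 1_048_576),
    2: (1_048_576, 1_015_808),
    3: (2_064_384, 1_102_336),
}

def read_fragment(aggregate_path, sequence_number):
    offset, length = FRAGMENT_INDEX[sequence_number]
    with open(aggregate_path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# An HTTP front end (for example an Apache module or a small web app) would
# map a request such as .../Seg1-Frag2 to read_fragment(path, 2) and return
# the resulting bytes as the chunk body.
```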

For a more in-depth overview of how Adobe HDS works, a good resource is Adobe’s HTTP Dynamic Streaming technical whitepaper.

By Andy Salo, RGB Networks

3ality Digital Acquires Element Technica

3ality Digital, considered to be the leading innovator of the most sophisticated S3D production technology in the industry, announced it has acquired Element Technica, a company long-known for its manufacturing expertise, accessories, and mechanical engineering of motorized S3D camera rigs.

3ality Digital is now 3ality Technica, and with its acquisition of Element Technica, 3ality Technica now provides all of the control, accuracy, breadth, automation, modularity, accessories, and design of both existing product lines. Physically, the companies will combine in an expansion to the 3ality Digital headquarters in Burbank, CA.

Source: 3ality Digital

TvTak Debuts App for “TV Show Detection”

TvTak is the fastest way to tell your friends what you’re watching, via Facebook, Twitter or email. All you need is an iPhone and a regular TV; TvTak’s advanced recognition technology does the rest.

When you’re watching TV and see something interesting, just point your iPhone at the screen. TvTak instantly detects what show or commercial you’re watching. It only takes one second. There is no need to type the name of the show, or for cumbersome check-ins among long lists of programs.




TvTak has developed innovative patent-pending algorithms for instant recognition of TV shows and commercials while they are being broadcast. TvTak technology runs on smartphones and tablets, using the camera, the processor, the network, and the localization services.

TvTak recognition algorithms run in real time; there is no need to process any video in advance. TvTak allows for instant video recognition of TV shows as they are being broadcast. It works on any television set, and there is no need to have a special set-top box or to buy a new TV with a WiFi connection.

In addition to the free TvTak app for consumers, TvTak offers a Software Development Kit (SDK) for the iPhone and a white-label service platform for telecom operators, marketers and interactive agencies to include instant video detection features within their own apps.

TvTak enhances the television experience by offering viewers apps that display contextual elements in perfect sync with the television broadcast, opening opportunities for new cross-media branding and revenue.

Walt Disney Exec Predicts End of 3D Rigs

Walt Disney Studios' vice president of production technology believes that 3D rigs have a limited lifespan and that a more post-production-oriented approach to shooting stereoscopic feature films is required.

Disney’s Howard Lukk argues for a hybrid approach to stereoscopic filmmaking which would supplement a 2D camera with smaller ‘witness’ cameras to pick up the 3D volumes, then apply algorithms at a visual effects house or a conversion company to create the 3D and free the filmmaker from cumbersome on-set equipment.

For Lukk, the problem is that 3D rigs are complicated to build and harder still to calibrate for true accuracy.

“There are enough things for the DOP, director and camera operators to try to track on the set as it is, without having to track interaxial and convergence,” he said. “We are making it more complicated on the set, where I think it needs to be less complicated.”

There should not be a battle between natively captured 3D and conversion, he contends. “We should start to ask if there is another method – even combine the two? We really should be looking at all methods to make better quality 3D. In the end, if 3D is not good quality, we are going to kill this stereoscopic industry just as it is re-emerging.”

If 3D camera rigs are not the future of the industry, Lukk suggests that a hybrid approach will develop which will be a combination of capturing volumetric space on set and being able to produce the 3D in a post-production environment at the back end.

“This will give you much more versatility in manipulating the images. This idea feeds on the idea of computational cinematography conceived by Marc Levoy (a computer graphics researcher at Stanford University) a few years ago. Basically this says that if we capture things in a certain way, we can compute things that we really need in the back end.

“You can be less accurate on the front end. Adobe has been doing a lot of work in this area, where you can refocus the image after the event. You can apply this concept to high dynamic range and higher frame rates.”

Disney is currently researching this method at Disney Research in Zurich, Lukk added. In addition Lukk says that research is also being conducted at the Fraunhofer Institute in Germany.

“I think eventually we’ll get back to capturing the volumetric space and allowing cinematographers and directors to do what they do best - that is, capturing the performance,” he said.

Source: TVB Europe

New Approach to 3D Shown at SIGGRAPH

Light field displays can provide glasses-free 3D images, under the right circumstances. One of their big advantages is they are independent of "sweet spots" and the 3D image can be seen from any location in the viewing area. With proper design, they can show both horizontal and vertical parallax and they have "look around" capability, allowing you to see what is behind a foreground object by moving sideways in the viewing region.

Unfortunately, they are not problem free. One solution involves multiple projectors, up to 40 or more, in a rear projection configuration, and special processing software/hardware to drive the projectors. That is not a hand-held system by any means. Another issue has been their modest image quality, good enough for some digital signage applications, perhaps, but not good enough for TV.

In a paper presented at SIGGRAPH 2011, which continues through tomorrow in Vancouver, British Columbia, Gordon Wetzstein of the University of British Columbia and three co-authors from UBC and the MIT Media Lab describe an approach that may revolutionize 3D displays and bring light field technology into hand-held systems or flat-panel TVs. Someday – don’t hold your breath, though.

According to the authors, the approach is a "multi-layer generalization of conventional parallax barriers." Instead of two layers – the LCD and the parallax barrier – their demonstration system has five layers plus a backlight. These layers, instead of being black and clear like a parallax barrier, are "attenuation layers" that selectively reduce the intensity of the light, producing a gray scale from fully black to fully transparent.

Multi-layer 3D displays have in the past normally produced the depth directly, with each layer in the display producing a different depth plane. The depth volume of the system is then limited to the thickness of the display, which would be a severe limitation for a handheld display or a flat-panel 3D TV. Such an approach is more generally considered a volumetric display.

The UBC/MIT approach doesn’t work this way. Instead, the different attenuation layers interact with each other to control the intensity and color of the light in each different direction. This reproduces the light field produced by light reflecting off of the original object, hence the name light field reconstruction. This approach, in theory at least, can produce both out-of-screen and behind-the-screen 3D effects.



What’s the catch? First, SIGGRAPH isn’t a display conference; it is not particularly concerned with the hardware implementation of a display but with the algorithms needed to drive it. This shows up, for example, in the name of the paper, "Layered 3D: Tomographic Image Synthesis for Attenuation-based Light Field and High Dynamic Range Displays." The demonstration shown at SIGGRAPH involves not electronic displays but film transparencies with test images. One of the goals of the paper was to determine how many attenuation layers are needed. The short answer is that 3 aren’t enough and 5 do a pretty good job. If you want a lot of out-of-screen effects, you may want as many as 8 attenuation layers.

Eventually, a practical display would need to replace these transparencies with LCDs or some other type of "attenuation layer" with a high enough aperture ratio so 5 (or more) of them in succession would: a) let enough light through to be useful; and b) not produce severe moiré effects. There is hope, though, since the work was supported by both Dolby and Samsung Electronics.

For more details, see the upcoming edition of Mobile Display Report, or either the UBC or MIT Media Lab project website.

By Matt Brennesholtz, Display Daily

Skype Goes VP8, Embraces Open Video Codec

Skype has adopted Google’s open source video codec VP8 as its default solution for video conferencing, according to a blog post from Google Product Manager John Luther. The new Skype for Windows client 5.5 will automatically use VP8 both for one-on-one and group video calls as long as other participants are using the same version.

Skype has been using VP8 for group video calls since late last year, but the adoption of the codec for one-on-one calls as well is definitely a boost for Google’s open video ambitions. Google open sourced VP8 in May of 2010 as part of its WebM video format, but many end users likely haven’t seen VP8 in action just yet. WebM is supported by Chrome, Opera and Firefox, and YouTube has been converting its entire catalog to the format. However, the site still serves up H.264-encoded videos in Flash by default, and users have to opt in to a special trial to get to see the WebM versions of YouTube videos.

The codec has also been targeted by patent pool entity MPEG LA, which is threatening to form a patent pool for VP8. Google has maintained that companies adopting WebM or VP8 have nothing to fear, and the fact that a company that’s being acquired by Microsoft is willing to put its eggs in the open codec basket should definitely quell some fears and possibly encourage other video sites, as well as video conferencing providers, to embrace the format. One should note, however, that Microsoft has so far shied away from adopting WebM for its Internet Explorer browser.

WebM developers have long been saying that it is well-suited for real-time applications, and Google itself is working on making VP8 the default video codec for both Google Talk and its new group video chat platform Google+ Hangouts.

By Janko Roettgers, GigaOM

How Anthony Rose Plans to Revolutionise TV

I have seen the future of TV and it is called Zeebox. The next project from Anthony Rose, the technologist who built KaZaA and BBC iPlayer into some of the most disruptive digital media plays, is due to go live in October. Headed by Peoplesound founder and ex-EMI SVP Ernesto Schmitt as CEO, the pair’s startup raised $5 million in seed funds from unidentified investors in June and has been operating in stealth as “tBone”. But it has been renamed and has just moved into offices in London’s Covent Garden, where it has a staff of almost 30 (including former iPlayer engineers) and where the pair showed me an exclusive demo…

What is Zeebox?
Attempting to ride both the multi-screen TV engagement trend and the increasing adoption of internet TVs, Zeebox is a real-time system for social TV viewing and for engaging deeply with shows, built on the belief that sofa-based second screens are the place for innovation.

The free Zeebox app for iPad (and, later, iPhone and Android) is a TV guide that displays which shows Facebook and Twitter friends are watching. Owners of compatible connected TVs can flip channel straight from the app, as though it were a remote control. Although the command travels over the internet, the channel change happens as quickly as, or quicker than, with some standard infrared remotes.

Viewing Together
Notifications appear on-screen to indicate friends’ presence in channels. Users can chat in the iPad app and send invites to join one another for simultaneous viewing - accepting an invite results in the channel changing. “Jack, come and watch The Apprentice with me,” Schmitt tells me, by means of example.

As well as these personal connections, Zeebox users’ collective actions can shape the experience. The app displays real-time data for which shows are “trending” up or down. In a scenario Rose presents, a notification appears to say Top Gear is currently “hot” (perhaps Jeremy Clarkson has said something particularly egregious). The opportunity to surface breaking news in this serendipitous way is clear, along with the prospect of improving TV ratings measurement with actual real-time data.

Making TV Hyperlinked
But this “second-by-second” approach is the fabric of more than just Zeebox’s social interaction. Using both commercially licensable broadcast metadata and frame-by-frame analysis of live TV pictures and audio, Zeebox will apparently understand exactly what is on the TV screen at any given moment (“just as Google spiders the web”), in order to serve up all manner of related material on the handheld app.

As an example, Schmitt shows how, whilst Tom Cruise is interviewed on Top Gear, the app will auto-display “infotags” for spoken topics (say, “Ferrari 458”, “Abu Dhabi”, “Sebastian Vettel” and “Tom Cruise” himself) as Cruise is speaking. Each topic becomes an in-app link to a corresponding piece of online content, on Wikipedia, IMDB, the iTunes Store or whatever.

The method involved is Zeebox’s “secret sauce”, the subject of a pending patent application, but it’s called automated content recognition (ACR), a field with several vendors including Civolution.

“As context emerges on TV, these infotags just keep ticking up,” Schmitt says. “I find this so unbelievably exciting. Anything being discussed on TV is right there for you.” Or, as Rose puts it: “It’s like crack - you just keep wanting stuff, and getting it second-by-second. TV just becomes better.”

One of the intended uses of “infotags” is commerce. Schmitt wants viewers to be able to buy things relating to what they see on screen. As I flip channel to QVC, he assures me Zeebox will know what’s on-screen is a cubic zirconia ring - and offer me more information, as well as ways to buy that ring.

Programme Context
Rose wants in-app TV show pages to display live tweet streams as well as broadcaster-owned HTML “widgets” for custom show engagement. “BBC Red Button’s non-interactive, a bugger to author for and a bugger to use,” he says. “Imagine a next-generation Red Button toolset that allows people to author things for an IP age.”

From these show pages, Zeebox will also offer links to available on-demand episodes. They could be played on the tablet or smartphone, but Rose tells me users may eventually be able to use those handhelds to invoke playback on the TV. “I don’t have a full answer to these things yet, but we’ll experiment with the full infrastructure,” he says.

How it Works
At its most basic, an iPad user can “check-in” to shows manually (though Rose and Schmitt hate the GetGlue- and Foursquare-style gamification concept). To automate that process, the app can listen for shows’ audio fingerprints, Shazam-style. Connected TV owners get the full automatic experience because those TVs already know what shows are on.

“The browser in connected TVs lets you create HTML overlays and widgets,” Rose says. “We’ve created a lightweight, Javascript-based plugin that, on many 2011 and 2010 TVs, can be software-updated and user-installed.”

Zeebox is currently in demo on Samsung Smart TV and Rose, the former CTO of the YouView connected TV consortium, says: “YouView’s got a nice underlying architecture that will allow the Zeebox plugin to run on it, so we look forward to those discussions in the fullness of time.”

A Zeebox open API is also proposed to empower developers to build similar functionality in to their own apps. “There’s a shitload of technical work that needs to be done,” Rose says. “Getting there is non-trivial. We want to go to the moon.”

By Robert Andrews, PaidContent

Rising to the OTT Content Protection Challenge

If connected TV services are to carry premium content that people are willing to pay for, security must be highly renewable with the whole ecosystem under constant surveillance for threats. It must be possible to authenticate Consumer Electronic (CE) devices approved for access to premium content and cut them off instantly in the event of a breach, with the ability to swap out to a new security system just as happens with smartcard based CA (Conditional Access). This is the view of traditional Conditional Access (CA) vendors that hope to be custodians of security in the OTT world as well.

In this connected TV world, operators will no longer have direct control over end devices but must still maintain a one-to-one relationship with all devices allowed to access premium content, according to Fred Ellis, Director of Operations and General Manager at SecureMedia, part of the Motorola Mobility Group. Ellis recommends two types of embedded security to achieve this: either digital certificates signed and inserted into the device by the manufacturer, or a secure software client that exploits some unique feature of the box, such as its native HLS (HTTP Live Streaming) player. Both of these allow the operator to authenticate the device each time a session is opened, for example by verifying the digital certificate.

Such client security methods must blend with OTT delivery methods such as Adaptive Bit Rate Streaming (ABRS), but above all they must support continuous monitoring. This is even more essential for OTT than in a traditional walled garden Pay TV broadcast infrastructure, according to Christopher Schouten, Senior Marketing Director for online services at CA vendor Irdeto, whose ActiveCloak platform is used by US OTT provider Netflix. Successful defence against OTT piracy involves a combination of continuous surveillance with dynamic security that can apply changes very quickly, Schouten argues. “The key to proactive hacking prevention is the ability to monitor hacker chatter on the web to analyse patterns and isolate instances of real threats that require expert attention,” he argues.

SecureMedia is equally convinced that sophisticated monitoring will be an essential component of OTT success. “SecureMedia’s iDetect tamper detection technology immediately notifies operators of any attempted breach in the delivery system, and they can promptly terminate content delivery to the device,” says Ellis.

There is also the higher-level issue of rights enforcement, and here too OTT extends the challenge, because the CE devices may not all support the same DRMs. So operators cannot apply a single security system end-to-end as they can within their walled gardens, according to Tore Gimse, CTO at Norwegian CA vendor Conax. “There is no definite reason why there should be the same security end-to-end,” says Gimse. “We already offer bridges between the Conax CA and other DRMs, securely handing the protected content from one system over to another. Such solutions could, in principle, be extended to other forms of security transfer.”

In a sense, OTT creates two conflicting security demands. On the one hand, the infrastructure must be even more closely monitored and managed to combat the new piracy threats that will arise in a wide open infrastructure. But on the other hand, no single CA vendor can rule, requiring interoperability between multiple systems, and this can create points of weakness. One implication is that CE vendors may mount a challenge to Pay TV operators by arguing that the network in the middle cannot be secured anyway so you might as well put all the effort into the device. Then content owners would bypass the operators and go straight to consumers.

By Philip Hunter, Videonet