MXF and AAF

Many broadcasters know about MXF, and they have heard of things such as MXF for Finished Programs (AS-03) and MXF for Commercials (AS-12). But, this month, I want to focus on MXF’s bigger brother, AAF.

In the middle 1990s, a joint task force was created by the SMPTE and EBU. The purpose was to address the impending flood of digital video proposals. There were a number of different, competing proposals on the table regarding compression, handling of metadata and the exchange of program material as files. Many in the industry were concerned that without a concerted effort, the market would fracture, leaving end users to sort it out. Fortunately, the task force produced a number of recommendations that later led to standards that have helped drive industry consensus about what constitutes interoperable digital video.

The remit of the task force went beyond coming up with recommendations for interoperable digital video formats, however. The final report included in its name “Harmonized Standards for the Exchange of Program Material as Bitstreams.” The group spent a significant amount of time working on something called wrappers. After spirited debate, it was decided that two classes of wrappers should be developed — one for broadcast and one for editing. The group felt that a single wrapper could not accommodate the differing needs of these two application areas. Ultimately, the wrapper for broadcast became MXF, and the wrapper for editing became AAF.

Before we delve fully into AAF, let’s talk about wrappers, what they do and why they are important.

Inside the Wrapper
When using professional digital video, specifically SDI video, the relationship between video and audio is set in the standard. SMPTE 259M, and later SMPTE 292M, for HD, not only specified how video and audio should be streamed, but they also were specific about where additional information such as subtitles and timecode should be carried. Other standards specify exactly how this additional data should be formatted. For manufacturers and users, the world was relatively simple; a stream arrived and was comprised of interleaved pieces of video and audio, along with some “essence data” such as subtitles.

But, in a file-based world, there were many possible ways to exchange the same program material contained in the SDI stream. Should you send a video file, followed by a separate audio file, followed by a data file that told how to play back the two files in sync? Should you send a single file with everything in the same file? Should the video be kept separate from the audio, or should it be mixed together, as in an SDI stream? Where do you put the all-important timecode? And, how do you relate the timecode to the video and audio to which it refers?

Those were just some of the questions that surfaced when we looked at transporting programs as bit streams. The wrapper gave us a way to describe how the video, audio, subtitles, timecode and other “essence” should be packaged together in order to be sent from one place to another. A wrapper can contain video, audio and data essence (subtitles). The concept of a wrapper as a way to organize essence is common in both MXF and AAF. This is a simplistic drawing, but it gives you the idea that the wrapper contains video, audio and data, along with identifiers that are used to keep track of each essence component.


A wrapper can contain video, audio and data essence such as subtitles


The reason I say that this drawing is simplistic is that there are further definitions within MXF itself that constrain the possible arrangements of essence within the file. For example, MXF OP-Atom requires that only a single essence component be included in a file. In other words, an MXF OP-Atom file contains only a single video element or a single audio element. MXF OP-1A allows the combining of video and audio in a single wrapper. But, again, how are the essence types laid out in the file?

Some MXF files contain video as a separate entity from audio. Others contain interleaved video and audio, meaning there is a single essence file that contains a bit of video, followed by a bit of audio channel 1, followed by a bit of audio channel 2, and then back to video again. There is also room in the interleaved file for subtitles.

One last important point: MXF and AAF specifically allow someone who receives the file to understand how video, audio and essence data all relate on a timeline. Of course, this is critical to working with professional video.


The Need for AAF
If MXF contains many different possible layouts, you may be wondering why there is a need for AAF at all. The reason is fairly simple. AAF allows you to wrap up many different tracks of video, audio and data essence, and describe how these different tracks relate to each other. Think about layering or compositing in an edit application. For those not familiar with this idea, layering allows an editor to superimpose one layer of video, say an animated graphic or a window of video, on top of another base layer. Compositing is a common application in editing, but it is not common in the on-air environment, where users generally are playing back finished program material.

An AAF file could contain a base layer consisting of a video and two audio tracks, a compositing layer consisting of another video and two audio tracks, plus some editing metadata that instructs how to manipulate these two pieces of video to produce the finished result.



 
An AAF file could contain a base layer, a compositing layer and some editing metadata


Another difference between AAF and MXF is a result of what I stated above — that MXF is primarily intended to be used in an on-air environment, while AAF is generally intended to be used for editing. Therefore, it is a common user requirement that MXF files be complete and ready to play at any time. Therefore, many (but not all) MXF files contain video and audio inside the file. By contrast, it is not uncommon to find that an AAF file that contains only references to external essence. In other words, the AAF file is small and only contains metadata, including pointers to the actual essence files and other metadata with instructions regarding what to do with them.

So, why is there this fundamental difference between AAF files and MXF files? Well, imagine you are in an air playout situation. The last thing you want to find out just as you are going to air (or after you have pressed the “play” button) is that an audio or video track cannot be located on a remote storage device. Since MXF files need to play when the “play” button is pressed, it makes a lot of sense to package video and audio inside a file.

By contrast, think about a typical broadcast promo. If your station produces local promos, you might be surprised to find out how many separate video, audio, graphics, subtitle and descriptive video elements go into a simple 15-second promo. Now, imagine a half-hour, multi-camera, pre-produced news program. This edit project could have more than 1000 (several thousand, actually) individual elements associated with it. It is likely to be more efficient to organize separately the different elements going into this program on disk, perhaps using shared storage. An AAF file could then be a lightweight, metadata-only file, with pointers to the actual content stored on shared storage. This is a common way to build a professional video editor.

So, AAF and MXF may be used differently, depending on the application. For finished programming, use MXF. For an edit environment, or an environment where you need to describe the relationship between a number of separate elements, use AAF.

Lastly, AAF and MXF are interchange formats and meant to be the lowest-common-denominator to get content from one system to another. It is common or likely that inside a system, you will not find AAF or MXF (with some exceptions). Also, since these are baseline formats, it is common to find capability-adding extensions. But, these extensions could hamper interoperability.

By Brad Gilmer, Broadcast Engineering