A nice video introduction to AS10 by Bruce Devlin.
As broadcast facilities and post-production houses implement new workflow and media management systems, they deal with an array of new IT-based technologies and are inundated with acronyms associated with those technologies, as well as the best practices concerning their use. Adoption of IT-based technologies continues apace with continued convergence across the media industry.
Despite this rapid shift of operations into the IT realm, however, the industry as a whole has not been as fast in providing engineering staff with an education on the meaning of terms such as XML, WSDL, SOAP, SOA and REST, the technologies underpinning them, and the role they play in present and future media-focused operations. This article will review the most relevant acronyms, their fundamental principles and their typical applications.
XML and XML Schema
Complex control systems depend on a reliable interchange of information in order to make the myriad business decisions that guide material through workflow. This is made simpler if individual systems can swap information in the form of structured data — that is, documents that contain both content and an indication of the content's role in the document.
The concept of structured data is not new; humans have been dealing with information formatted this way for centuries. Examples include published (human-readable) books and magazines, as well as web pages displayed by a browser. In both cases, these systems indicate to the consumer the relevance and importance of any particular item on the page through the use of typographical hints — such as underlining text that hyperlinks to other documents. HTML achieves this by including “tags” in the document, and the browser uses those tags to determine how to emphasize the relative value of the associated data.
As a markup language, HTML represents a fairly limited set of standardized tags intended for the presentation of web content, so its utility beyond that scope likewise is limited.
XML, however, offers flexibility that makes it more suitable for commercial use. This is true because, unlike HTML, XML allows users to define their own tags. Rather than defining the tags themselves (or the semantics of the tags), XML provides a facility for defining tags and using them within a document. As with HTML, data is then bracketed within opening and closing tags, which the receiving device can read and act on as appropriate.
Here is an example of the use of a tag “author” to identify a book's author in a library document:
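The original listing is not reproduced here. As a stand-in, the following minimal sketch shows a user-defined "author" tag being read by a receiving program. The document content and the other tag names (library, book, title) are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical library document; only the "author" tag is named in the text,
# the remaining tag names and values are invented for this sketch.
LIBRARY_XML = """
<library>
  <book>
    <title>Example Title</title>
    <author>Jane Doe</author>
  </book>
</library>
""".strip()

root = ET.fromstring(LIBRARY_XML)

# The receiver finds data by its tag (its role), not by its position on a page.
authors = [element.text for element in root.iter("author")]
print(authors)
```

The point of the sketch is that the opening and closing tags tell the receiving software what role the bracketed data plays.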
Because XML tags are user-definable, an XML Schema is used to define which tags are valid in a particular document. Such definitions describe, for example, elements that can appear in a document (along with their attributes), whether an element is empty or can include text, and the data types for elements and attributes.
Thus, a document can be compared to an XML Schema — acting much like a blueprint, style guide or template — to ensure the document is valid and contains all relevant information. Again, this idea is not a new concept solely used by XML. Most databases have some sort of schema. Also, textbooks have used schemas for years, generally referring to them as “style guides.”
Here is an example of an XML schema for a memo:
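The original schema listing is not reproduced here. The sketch below uses a hypothetical memo schema (the element names to, from, date and body are assumptions) and reads it with Python's standard library. The standard library has no XML Schema validator, so the check shown only compares child element names and order; a real deployment would use a full validating parser.

```python
import xml.etree.ElementTree as ET

NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# Hypothetical schema for a memo; element names are illustrative only.
MEMO_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="memo">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="date" type="xs:date"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
""".strip()

schema = ET.fromstring(MEMO_XSD)
sequence = schema.find("xs:element/xs:complexType/xs:sequence", NS)
required = [e.get("name") for e in sequence.findall("xs:element", NS)]


def has_required_children(memo_xml: str) -> bool:
    """Simplified check: root is <memo> and its children match the schema order."""
    memo = ET.fromstring(memo_xml)
    return memo.tag == "memo" and [child.tag for child in memo] == required


good = "<memo><to>Ops</to><from>QC</from><date>2012-06-22</date><body>Hi</body></memo>"
bad = "<memo><to>Ops</to></memo>"
print(has_required_children(good), has_required_children(bad))
```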
SOAP and SOA
Simple Object Access Protocol (SOAP) is a lightweight protocol for the exchange of information. It describes three main elements: an envelope, the encoding rules, and a convention for the representation of remote procedure calls and responses. The spelled-out name is no longer used (since version 1.2, SOAP is treated simply as a name rather than an acronym); however, the technology remains very much in use.
Though its expansion is no longer used, the SOAP name is sometimes confused with Service-Oriented Architecture (SOA). Though SOAP standards may be part of an SOA application, the two terms are not related.
SOA is a design philosophy that separates the core functions of a business into independent modules. These modules are referred to as services — essentially software functions — that are called by one or more other services in the overall workflow. (Media transcoding is an example of a service that could be invoked as part of an overall workflow management system.)
Operating on the principle of loosely coupled systems, SOA abstracts the functions of a service from the low-level substructures of that service, leaving the calling service to use much higher-level language to drive the process. In doing so, SOA can make it easier to choreograph multiple business activities and processes among multiple complex software systems, all working under the control of a central process to achieve the required goal.
Consider the operation of a typical media enterprise: Material comes into a facility on tape, via satellite or as a file transfer, and content from each of these sources must be processed before it can be used. Tapes must be ingested, satellite feeds must be fed through some sort of IRD, and files must be received and error-corrected. Some form of transcoding may then be applied in order to provide the material in the house format. A QC stage may then be applied to ensure technical compliance. In the software world, each of these would be software modules in a SOA-based system.
Without SOA, setting up such systems is a time-consuming process that likely will demand customization so that one vendor's automation/asset management system can talk to another vendor's software module or hardware processor. Any operational or technical changes may require replacement of a module or augmentation with another vendor's module.
For example, in a traditional workflow, the automation system must have deep knowledge of the proprietary API calls required to tell a transcoder to transcode a particular piece of material to another format. Every time a new format is added to the transcoder by its manufacturer, that interface must be modified (or in the worst case, completely rewritten). The custom integration software capable of addressing systems from different vendors thus represents a potentially high investment in time and money.
In a more agile SOA-based architecture, the automation system would simply send an XML message that says, “Transcode file X to format Y, and store the result as Z.” This message remains consistent, even if a new transcoder or format is added. Regardless of the vendor, equally equipped systems will perform the same basic transcode when given the same command. Knowing this, engineers can take a more flexible approach to workflow design. This model depends on the fact that each service describes itself and its capabilities to the rest of the system. Web Services Description Language (WSDL) facilitates this exchange of information.
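To make the "Transcode file X to format Y, and store the result as Z" message concrete, here is a minimal sketch of how an automation system might build it. The element names (transcodeRequest, source, targetFormat, destination) and file names are hypothetical; no particular vendor or standard message format is implied.

```python
import xml.etree.ElementTree as ET


def build_transcode_request(source: str, target_format: str, destination: str) -> str:
    """Build a vendor-neutral transcode request as an XML string (illustrative names)."""
    request = ET.Element("transcodeRequest")
    ET.SubElement(request, "source").text = source
    ET.SubElement(request, "targetFormat").text = target_format
    ET.SubElement(request, "destination").text = destination
    return ET.tostring(request, encoding="unicode")


message = build_transcode_request("X.mxf", "Y", "Z.mxf")
print(message)
```

The caller states what it wants done, not how a given transcoder does it, so the same message can be sent unchanged when a different transcoder is swapped in.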
WSDL is an XML-based language used for describing features and capabilities of a particular service. Provided to the central controlling system, this information ensures all other services and applications are aware of a particular service's capabilities.
Four critical pieces are supplied:
- Interface information describing all publicly available functions;
- Data type information for all message requests and message responses;
- Binding information about the transport protocol to be used in calling the service;
- Address information for locating the specified service.
Representational State Transfer (REST), an architectural style for distributed systems such as the World Wide Web, is typically built on HTTP and is not tied to XML. It addresses the scalability of component interaction, generality of interfaces and deployment of components to reduce latency and enforce security of transactions. Unlike SOA, which often is considered a peer-to-peer architecture, REST is oriented toward client/server interaction in which clients initiate requests to servers, and servers process requests and return appropriate responses.
In a RESTful transaction, requests and responses are built around the transfer of representations of resources, which themselves are specific information sources. A representation of a resource is typically a document that captures a resource's current or intended state. Each resource is referenced with a global identifier. To manipulate these resources, components of the network — typically user agents and origin servers — communicate via a standardized interface like HTTP and exchange representations.
In the case of web page information, the raw data on which the page is built is the resource, but the representation of that data could be different for different users. What is a representation of a resource? Consider a human using a browser to view a web page in order to retrieve information, and a computer doing the same as part of some larger activity. Both navigate to the same page, but the computer has no interest in the page's layout and styling; it just wants the data. The human does care about layout and styling (which, after all, exists to make the page easier for humans to digest). The two clients (human and computer) want different representations of the resource, both derived from the same raw data source.
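The human-versus-computer example can be sketched as a single resource served in two representations, chosen from what the client says it accepts. The resource fields and the function name are assumptions for illustration; a real server would parse the HTTP Accept header more carefully.

```python
import json
from html import escape

# Hypothetical resource: the raw data behind a program page.
RESOURCE = {"title": "Evening News", "duration_s": 1800}


def represent(resource: dict, accept: str) -> tuple:
    """Return (content_type, body) for the same resource, depending on the client."""
    if "application/json" in accept:
        # Machine client: just the data.
        return "application/json", json.dumps(resource, sort_keys=True)
    # Human client: the same data wrapped in presentational markup.
    rows = "".join(
        f"<li>{escape(str(key))}: {escape(str(value))}</li>"
        for key, value in sorted(resource.items())
    )
    return "text/html", f"<ul>{rows}</ul>"


json_type, json_body = represent(RESOURCE, "application/json")
html_type, html_body = represent(RESOURCE, "text/html")
print(json_type, html_type)
```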
By Paul Turner, Broadcast Engineering
Today's broadcasters are looking for the highest image quality, flexible delivery formats, interoperability and standardized profiles for interactive video transport and workflows. They also have a vested interest in a common high-end format to archive, preserve and monetize the avalanche of video footage generated globally.
This is the story behind the rapid adoption of JPEG 2000 compression in the contribution chain. Standardized broadcast profiles were adopted in 2010 to match current industry needs (JPEG 2000 Part 1 Amendment 3 — Profiles for Broadcast Application — ISO/IEC 15444-1:2004/Amd3), ensuring this wavelet-based codec's benchmark position in contribution.
In parallel, these broadcast profiles have also filled the industrywide need for compression standards to archive and create mezzanine formats, allowing transcoding to a variety of media distribution channels. The ongoing standardization process of the Interoperable Master Format (IMF) by SMPTE based on JPEG 2000 profiles brings the adoption full-circle.
The U.S. Library of Congress, the French Institut National de l'Audiovisuel (INA) and several Hollywood studios have selected the codec for the long-term preservation of a century of audio-visual content.
JPEG 2000 is different from other video codecs. MPEG and other DCT-based codecs have been designed to optimize compression efficiency in order to deliver video to viewers via a pipe with limited bandwidth. JPEG 2000, with its wavelet transform algorithm, brings features not only for image compression efficiency, but also to give the user better control and flexibility throughout the image processing chain. The codec provides unique features that are not available in any other compression method.
JPEG 2000 under the Spotlight
JPEG 2000 is based on the discrete wavelet transform (DWT) and uses scalar quantization, context modeling, arithmetic coding and post-compression rate allocation. JPEG 2000 provides random access (i.e. involving minimal decoding) to the block level in each sub-band, thus making it possible to decode a region, or a low-resolution or low-quality version of the image, without having to decode the image as a whole.
JPEG 2000 is a true improvement in functionality, providing lossy and lossless compression, progressive and parseable code streams, error resilience, region of interest (ROI), random access and other features in one integrated algorithm.
In video applications, JPEG 2000 is used as an intraframe codec, so it closely matches the production workflow in which each frame of a video is treated as a single unit. In Hollywood, its ability to compress frame by frame has made this technology popular for digital intermediate coding. If the purpose of compression is the distribution of essence and no further editing is expected, long-GOP MPEG is typically preferred.
JPEG 2000 brings valuable features to the broadcast process, including ingest, transcoding, captioning, quality control and audio track management. Its inherent properties fully qualify it as a codec for creating high-quality intermediate masters.
Post-production workflows consist of several encoding/decoding cycles. JPEG 2000 preserves the highest quality throughout this process, and no blocking artifacts are created. Moreover, the technology supports all common bit depths whether 8, 10, 12 bits or higher.
JPEG 2000 enables images to be compressed in lossy and visually or mathematically lossless modes for various applications. Additionally, its scalability allows a “create once, use many times” approach to service a wide range of user platforms.
The technology also enables improved editing: Even at the highest bit rates, its intrinsic flexibility makes it user-friendly on laptop and workstation editing systems, with a limited number of full bit rate real-time video tracks. Improving computing hardware is certain to increase the number of real-time layers.
Because JPEG 2000 is an intraframe codec, errors do not propagate across multiple frames, and the video signal can be cut at any point for editing or other purposes.
Easy transcoding appeals to high-end applications where workflows vastly benefit from transcoding to an intermediate version. JPEG 2000 ensures a clean and quick operation when bit rate is at a premium. Professional viewers have labeled correctly transcoded 1080p JPEG 2000 files compressed at 100Mb/s as “visually identical” to the original 2K footage. Furthermore, the wavelet-based JPEG 2000 compression does not interfere with the final — usually DCT-based — broadcast formats.
Last but not least, several standards specify in detail how the JPEG 2000 video stream should be encapsulated in a number of widely adopted containers such as MXF or MPEG-2 TS.
Professional Wireless Video Transmission
Wireless transmission is often challenged to improve its robustness in broadcast. Uncompressed HD wireless transmission is often seen as complex, for even if a 1080p60 transmission (3Gb/s) were possible wirelessly, it would be quite difficult to add the necessary FEC and encryption to the data stream. Of all the compression algorithms available in the market, JPEG 2000 is seen as one of the top contenders for the following reasons.
JPEG 2000 is inherently more error resilient than MPEG codecs. The codestream can be configured so the most important data (the lowest frequency data contains the most visually significant information) is located in the front, while successively higher frequency, less important data can be placed in the back. Using appropriate FEC techniques, the lower frequency data can be protected while less protection can be applied to the higher frequency data, as errors in the higher frequency bands have much less effect on the displayed image quality.
Also, as in contribution, latency matters: JPEG 2000 offers a low latency that would be practically impossible for wireless systems using an MPEG codec based on long-GOP coding.
Broadcasters and video archivists are looking for long-term digital preservation on disk. In most cases, the source material is not digital, but film that needs to be scanned or high-quality analog videotape. As such, a destination digital format must then be selected.
Key requirements often include reducing the storage costs of uncompressed video while still maintaining indefinite protection from loss or damage. Moreover, the format should preferably enable digitized content to be exploited, which means providing flexibility — workflows again — and security. For these reasons, several studies and user reports claim JPEG 2000 to be the codec for audio-visual archiving.
Several reasons make JPEG 2000 a codec of choice for audio-visual archiving:
- The JPEG 2000 standard can be used with two different wavelet filters: the 9/7 wavelet filter that is irreversible and the 5/3 wavelet filter that is fully reversible. The 5/3 wavelet filter offers a pure mathematically lossless compression that enables a reduction in storage requirement of 50 percent on average while still allowing the exact original image information to be recovered. The 9/7 wavelet filter can encode in lossy or visually lossless modes.
- Scalability, which allows proxy extraction and multiple quality layers, is of huge interest because it eases client browsing and retrieval as well as transcoding and streaming.
- JPEG 2000 is an open standard that supports every resolution, color depth, number of components and frame rate.
- JPEG 2000 is license- and royalty-free.
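The claim that the 5/3 filter is fully reversible can be illustrated with a one-dimensional integer lifting sketch. This is a simplified illustration, not a JPEG 2000 implementation: it assumes an even-length list of integers, handles a single decomposition level, and uses symmetric extension at the edges. The function names are hypothetical.

```python
def forward_53(samples):
    """One level of the reversible 5/3 lifting transform (1-D, even length)."""
    assert len(samples) >= 2 and len(samples) % 2 == 0
    even, odd = samples[0::2], samples[1::2]
    half = len(even)
    # Predict step: high-pass coefficients from odd samples and neighbouring evens.
    high = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]  # symmetric extension
        high.append(odd[i] - (even[i] + right) // 2)
    # Update step: low-pass coefficients from even samples and neighbouring highs.
    low = []
    for i in range(half):
        left = high[i - 1] if i > 0 else high[0]  # symmetric extension
        low.append(even[i] + (left + high[i] + 2) // 4)
    return low, high


def inverse_53(low, high):
    """Undo forward_53 exactly by running the lifting steps in reverse order."""
    half = len(low)
    even = []
    for i in range(half):
        left = high[i - 1] if i > 0 else high[0]
        even.append(low[i] - (left + high[i] + 2) // 4)
    samples = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        samples.append(even[i])
        samples.append(high[i] + (even[i] + right) // 2)
    return samples


pixels = [10, 12, 14, 200, 13, 11, 9, 8]
low_band, high_band = forward_53(pixels)
restored = inverse_53(low_band, high_band)
print(restored == pixels)
```

Because every step uses integer arithmetic and the inverse subtracts exactly what the forward pass added, the original samples come back bit for bit, which is what makes mathematically lossless coding possible. The 9/7 filter uses floating-point coefficients and does not have this property.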
Several initiatives are pushing the industry beyond today's HD: NHK Super Hi-Vision, also known as 8K and UHDTV, the Higher Frame Rates in Cinema initiative by James Cameron and Peter Jackson (up to 120fps), 16-bit color depth, and the numerous manufacturers that are now offering 4K technology.
The need for efficient codecs has gained significant traction in the industry. The future of JPEG 2000 is bright as it is an open standard that requires less power, consumes less space in hardware implementations and generally delivers greater scalability, flexibility and visual quality than other codecs. An increasing number of manufacturers, broadcasters and producers are using JPEG 2000 implementations to adapt today's industry to these new challenges.
By Jean-Baptiste Lorent and François Macé, Broadcast Engineering
Most video professionals are now quite familiar with metadata. But, just to be sure everyone is on the same page, when we use the term metadata in this tutorial, we are referring to information (data) that typically refers to a program, commercial or other video/audio content. Examples of metadata include: the name of a program, the length of a segment or an ISCI code associated with a commercial.
Collectively, video, audio, subtitles, AFD and other things people consume as part of a program are known as essence. Just remember that essence refers to the stuff we are watching when we view a program. Importantly, essence includes data (subtitles, teletext, AFD and so on). If we want to talk about data that is part of the program, we talk about data essence. Data essence is not metadata; data essence is part of the program that is intended for “viewing” by the end consumer.
Technical vs. Descriptive
Metadata tends to be treated as a single topic. But, in our mind, there are several different ways to classify metadata. One way to look at metadata is whether it is technical or descriptive. Technical metadata is metadata that is required to identify and play back essence properly. Examples include: a Unique Media Identifier (UMID), information about compression used and raster size.
Descriptive metadata is typically the sort of information operators in professional media organizations care about: program title, show number, kill date, ISCI code and so on. Descriptive metadata might also include things like shot notes, scripts and music scores.
Some metadata falls into a grey area. How would you classify GPS location, zoom and focus setting, crank rate, or robotic camera position? Is this technical metadata or descriptive metadata? Generally speaking, technical metadata is directly related to the essence or essence coding and does not include these data types.
Another way to classify metadata is to think about how frequently the metadata is expected to change. We would expect some metadata to remain constant for the duration of the content being viewed. For example, we would not expect the title to change halfway through a program. In the United States, we might expect that the technical metadata regarding frame rate would remain at 59.94Hz for the duration of the program. As you can see from these examples, the frequency of metadata change is an orthogonal classification (meaning it is completely unrelated) to the metadata type (technical or descriptive). Not surprisingly, metadata that remains constant is called static metadata.
Other metadata, such as timecode, would be expected to increase monotonically (in practice, by a count of exactly one) on every frame. Timecode and other metadata may, therefore, be placed on a timeline that increases predictably, and should never decrease (run backwards) or unexpectedly jump ahead. This type of metadata might be called timeline-related metadata.
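A system that relies on timeline-related metadata can check the expectation directly. The sketch below assumes timecode has already been converted to an integer frame count per frame; the function name is hypothetical.

```python
def timeline_faults(frame_counts):
    """Return (index, step) for every frame whose count did not advance by exactly one."""
    faults = []
    for index in range(1, len(frame_counts)):
        step = frame_counts[index] - frame_counts[index - 1]
        if step != 1:
            # step > 1 is a jump ahead; step <= 0 is a repeat or a run backwards.
            faults.append((index, step))
    return faults


clean = timeline_faults([100, 101, 102, 103])
broken = timeline_faults([100, 101, 105, 104])
print(clean, broken)
```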
Finally, metadata associated with shot logging, describing actors in scenes or associating a music score to a section of a program may appear fixed for a period of time, may be duplicated in several locations during a program or may even overlap with other similar types of metadata. This metadata can be referred to as event metadata.
There is one other important family of metadata, and that is perpetual metadata. As you can guess, this is metadata that should never change. Systems rely on the fact that these metadata items are immutable. If something is immutable, it means that it cannot be changed after it is created. Examples of this include a unique ID and the location where a program was filmed. Remember, systems will count on the immutability of certain metadata. If you violate the rules, all bets are off.
Why does all this matter? Because, in the end, media professionals build and operate systems, and these systems need to be properly designed to handle all of these different types of metadata.
Metadata and Essence
How do you tie timecode or program title to a specific copy of a movie? In the tape world, the question was irrelevant. The timecode was either stored in the VBI or in the ANC data, and the program title was printed on a label on the box or the tape itself.
But, things have changed in the file-based world. Associating metadata, especially technical metadata, to a particular program can be critical. Professional file formats must contain not only video and audio, but also the technical metadata necessary to play back the content. When it comes to descriptive metadata, the door was left open as to exactly what types of descriptive metadata were to be included in the file. This is both good and bad. One of the most critical linking mechanisms between metadata and essence is the UMID. The UMID is a unique identifier that is used to identify a particular piece of content.
UMIDs and the Contract
It is important to understand that there is an implicit contract between systems that create content labeled with a UMID and systems that later manipulate the content or rely on the UMID to retrieve the content. Here is the contract: A system will never break the connection between a particular piece of content and the UMID that was originally associated with that content.
If you create a new piece of content, give it a new UMID. Simple, right? But, here is a twist: What if you create a bit-for-bit exact copy of the file? The rule is that you duplicate everything, including the UMID. But, now you have two copies of the content with the same UMID. Is this right? The answer is yes. And, if you ask a media asset management system to retrieve a piece of content using a UMID, you cannot say for certain which copy you will get. The particular instance of a copy of content is not uniquely identified by a UMID. For other reasons, a MAM system may uniquely identify each copy of the same content, but the UMID should not be used to do this.
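The contract can be sketched as a small catalog in which one identifier may map to several physical copies. This is an illustration only: the class and method names are hypothetical, and a random hexadecimal string stands in for a real UMID, which has its own defined structure.

```python
import uuid


class Catalog:
    """Toy asset catalog: one content identifier, possibly many stored copies."""

    def __init__(self):
        self._copies = {}  # identifier -> list of storage locations

    def create(self, location: str) -> str:
        # New content always receives a new identifier (stand-in for a UMID).
        identifier = uuid.uuid4().hex
        self._copies[identifier] = [location]
        return identifier

    def copy_exact(self, identifier: str, new_location: str) -> str:
        # A bit-for-bit copy keeps the same identifier; only the location differs.
        self._copies[identifier].append(new_location)
        return identifier

    def locate(self, identifier: str) -> list:
        # The identifier names the content, not one particular copy of it.
        return list(self._copies.get(identifier, []))


catalog = Catalog()
master = catalog.create("/archive/master.mxf")
duplicate = catalog.copy_exact(master, "/playout/master.mxf")
print(duplicate == master, catalog.locate(master))
```

A lookup by identifier returns every copy, which is why a MAM system that needs to distinguish copies must track them by some other key.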
In a file-based world, metadata becomes key. Understanding how unique identifiers work and treating them correctly is vitally important. Fortunately, manufacturers hide most of this from end users, so you may not even be aware that UMIDs exist.
By Brad Gilmer, Broadcast Engineering
A nice video introduction to IMF by Bruce Devlin.
The MSU Graphics & Media Lab has released its eighth H.264 video codecs comparison. The main goal of this report is the presentation of a comparative evaluation of the quality of new H.264 codecs using objective measures of assessment.
The comparison was done using settings provided by the developers of each codec. The main task of the comparison is to analyze different H.264 encoders for the task of transcoding video—e.g., compressing video for personal use. Speed requirements are given for a sufficiently fast PC; fast presets are analogous to real-time encoding for a typical home-use PC.
The overall ranking of the software codecs tested in this comparison is as follows:
- DivX H.264
- Intel Ivy Bridge QuickSync
- MainConcept CUDA
This rank is based only on the encoders’ quality results. Encoding speed is not considered here.
x264 HDTV “High Quality” Settings
--tune ssim --pass 1 --keyint 500 --preset slow
--tune ssim --pass 2 --keyint 500 --preset slow
Source: MSU Graphics & Media Lab
Friday, June 22, 2012
This article connects the dots among Serial Digital Interface (SDI), image compression, the invention of the Advanced Authoring Format / Material eXchange Format (AAF / MXF) data model, the MXF wrapper format, the subsequent development of AS-02, AS-03 and other MXF application specifications, developments in high-speed networking technology and network security, the SMPTE 2022 Standard for Professional Video over IP transmission, the recent activities of the Hollywood-based ETC’s Interoperable Mastering Format (IMF), which has recently moved into SMPTE, and the Advanced Media Workflow Association / European Broadcasting Union (AMWA / EBU) Task Force on the Framework for Interoperable Media Services (FIMS), concentrating on service-oriented media workflows.
An atom is a self-contained data unit that contains information about an MP4 file. The moov atom, also referred to as the movie atom, defines the timescale, duration, display characteristics of the movie, as well as subatoms containing information for each track in the movie. The optimal location of the moov atom depends on the selected delivery method.
MPEG-4 Stream Packaging
For Flash Player to be able to play back an MPEG-4 (MP4) file, the file must be packaged in a specific type of container—one that follows the MPEG-4 Part 12 (ISO/IEC 14496-12) specification. Stream packaging is the process of making a multiplexed media file. Also known as muxing, this procedure combines multiple elements that enable control of the distribution delivery process into a single file. Some of these elements are represented in self-contained atoms.
As mentioned at the outset, an atom is a basic data unit that contains a header and a data field. The header contains referencing metadata that describes how to find, process, and access the contents of the data field, which may include (but is not limited to) the following components:
- Video frames
- Audio samples
- Interleaving AV data
- Captioning data
- Chapter index
- User data
- Various technical metadata: codec, timescale, version, preferred playback rate, preferred playback volume, movie duration, etc.
In an MPEG-4–compliant container, every movie contains a moov atom. Normally, a movie atom contains a movie header atom (an mvhd atom) that defines the timescale and duration information for the entire movie, as well as its display characteristics. The movie atom also contains one track atom (a trak atom) for each track in the movie. Each track atom contains one or more media atoms (an mdia atom) along with other atoms that define other track and movie characteristics.
In this tree-like hierarchy, the moov atom acts as an index of the video data. It is here that the MPEG-4 muxer stores information about the file to enable the viewer to play and scrub the file. The file will not start to play until the player can access this index.
Unless specified otherwise, the moov atom is normally stored at the end of the file in on-demand content, after all of the information describing the file has been generated. Depending on the type of on demand delivery method selected—progressive download, streaming, or local playback—the location will need to move either to the end or to the beginning of the file.
If the planned delivery method is progressive download or streaming (RTMP or HTTP), the moov atom will have to be moved to the beginning of the file. This ensures that the required movie information is downloaded first, enabling playback to start right away.
If the moov atom is located at the end of the file, the entire file must be downloaded before playback can start. If the file is intended for local playback, then the location of the moov atom will not impact the start time, since the entire file is available for playback right away.
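Whether a file is laid out for fast start can be checked by walking its top-level atoms. The sketch below works on bytes held in memory and follows the basic atom layout (a 4-byte big-endian size followed by a 4-byte type, with the usual 64-bit and to-end-of-file size conventions). The helper names and the synthetic test data are assumptions; it does not descend into child atoms.

```python
import struct


def top_level_atoms(data: bytes):
    """Return (type, offset, size) for each top-level atom in the byte string."""
    atoms = []
    offset = 0
    while offset + 8 <= len(data):
        size, kind = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if size == 1:  # 64-bit size stored right after the type field
            if offset + 16 > len(data):
                break
            size = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
            header = 16
        elif size == 0:  # atom extends to the end of the file
            size = len(data) - offset
        if size < header:  # malformed atom; stop rather than loop forever
            break
        atoms.append((kind.decode("latin-1"), offset, size))
        offset += size
    return atoms


def moov_before_mdat(data: bytes) -> bool:
    """True when the index (moov) precedes the media data (mdat)."""
    names = [name for name, _, _ in top_level_atoms(data)]
    return "moov" in names and "mdat" in names and names.index("moov") < names.index("mdat")


def make_atom(kind: bytes, payload: bytes) -> bytes:
    """Build a synthetic atom for demonstration purposes only."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload


fast_start = make_atom(b"ftyp", b"isom") + make_atom(b"moov", b"\x00" * 4) + make_atom(b"mdat", b"\x00" * 16)
slow_start = make_atom(b"ftyp", b"isom") + make_atom(b"mdat", b"\x00" * 16) + make_atom(b"moov", b"\x00" * 4)
print(moov_before_mdat(fast_start), moov_before_mdat(slow_start))
```

For a real file, the same functions could be applied to the bytes read from disk, although reading only the atom headers would be more economical for large files.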
The placement of the moov atom is specified in various software packages through settings such as "progressive download," "fast start," "use streaming mode," or similar options. Software packages such as MP4Creator or AtomicParsley enable you to analyze the location of the moov atom in your encoded files (see Figures 1 and 2).
Some tools enable relocation of the moov atom to the beginning of the file's structure through post processing of the compressed MPEG-4 (MP4) file. One such tool is MP4Creator, mentioned earlier, and another is MP4 FastStart. The best way to handle the moov atom location, however, is to set it during the compression and muxing portion of the encoding process. This minimizes the probability of the moov atom inadvertently being placed at the end.
Issues with edts Atom Handling
An edts atom contained in the trak atom of a moov atom located within an MP4 container hierarchy is responsible for tracking times and durations of the media. Flash Player architecture is designed to ignore the existence of an edts atom; however, an edts atom containing invalid or broken data may interfere with smooth and stable switching of HTTP packaged streams. Therefore, it is important to repair or remove an invalid edts atom prior to packaging the file for HTTP Dynamic Streaming.
The broken edts atom can be eliminated from a file using tools such as FLVCheck for file conformance, MP4Creator for structure analysis, and AtomicParsley for removal of metadata (see Figures 3 and 4).
The AtomicParsley command responsible for removing atoms is represented in the following string:
AtomicParsley.exe filename.mp4 --manualAtomRemove "moov.trak[1].edts" --manualAtomRemove "moov.trak[2].edts" --overWrite
Here, filename.mp4 is the name of the file being processed and --manualAtomRemove is the option that initiates the removal of the specific atom, edts, which is hierarchically located within the trak atom, which is within the moov atom. If the file contains more than one trak atom, such as audio and video media elements, then the track number is added to the path, as in "moov.trak[1].edts" above. By default, AtomicParsley removes the atom from the first moov atom track. Adding the next track number in sequence, or a track number of your choice, forces AtomicParsley to proceed to that track next (for example, moov.trak[2].edts). To edit all tracks, repeat the option for each track. Adding the option --overWrite overwrites your original file.
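Repeating the removal option for every track is easy to get wrong by hand, so a script can assemble the argument list. The sketch below only builds the arguments and does not run anything. It assumes the bracketed track index form moov.trak[N].edts and an executable named AtomicParsley on the path; the function name is hypothetical.

```python
def edts_removal_args(filename: str, track_count: int, overwrite: bool = True) -> list:
    """Build an AtomicParsley argument list that removes edts from each track."""
    args = ["AtomicParsley", filename]
    for track in range(1, track_count + 1):
        # One --manualAtomRemove per track; the [N] index form is an assumption.
        args += ["--manualAtomRemove", f"moov.trak[{track}].edts"]
    if overwrite:
        args.append("--overWrite")  # replaces the original file in place
    return args


command = edts_removal_args("filename.mp4", 2)
print(" ".join(command))
```

The resulting list could be passed to subprocess.run; working on a copy of the file first is prudent, since --overWrite modifies the original.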
- AtomicParsley (command-line program for reading and parsing MPEG-4 files)
- FLVCheck (MPEG-4 checker)
- MP4Box (multimedia packager)
- MP4Creator (MPEG-4 file creator and modifier)
- MP4 FastStart (starts MPEG-4 progressive downloads immediately)
- QTIndexSwapper (starts H.264 progressive downloads immediately)
By Maxim Levkov, Adobe
This document has been prepared by the EBU ‘BeyondHD’ Project that is part of the strategic programme on Future Television Systems. It is intended for EBU Members’ senior management and provides a general technical overview about future TV formats.
Friday, June 08, 2012
To get it out of the doldrums it remains in, 3D needs as much help as possible, and the latest leg-up has been given by the International Telecommunication Union (ITU), the United Nations agency for information and communication technology.
The ITU has drafted a series of recommendations, submitted to its Administrations for accelerated approval, on 3DTV that are intended to promote the further use of this format worldwide and which the ITU hopes will provide much needed tools to evaluate, make, and exchange 3DTV programmes. The ITU’s Radiocommunication Sector (ITU-R) has developed the standards in collaboration with experts from the television industry, broadcasting organisations and regulatory institutions in its Study Group 6.
In detail, the new ITU-R Recommendations focus on 720p and 1080i/p HDTV 3DTV programme production and broadcasting with recommendations also agreed on the digital interfaces used in studios for 3DTV programme production, and on the general requirements for 3DTV. The ITU-R Study Group 6 also agreed a Recommendation for the methods to evaluate the quality of 3DTV images, which relates to three aspects, or quality factors: picture quality, depth, and comfort levels.
Source: RapidTV News
Friday, June 01, 2012