Metadata

By now, most engineers have heard that metadata refers to data about data. As facilities move from tape to files, and from tight integration to service-oriented media workflows, it becomes critical for engineers to have a solid understanding of this important topic.

The definition of metadata as “data about data” may be accurate, but it is perhaps not entirely useful. In our industry, a more practical definition is that metadata is data about the video and audio, which together are the essence of the program. System designers have found it useful to distinguish between essence and metadata. But just as things are starting to become clear, someone may ask, “What about closed captioning? Surely that is data, right? How about time code?”

When talking about data in relation to video and audio, it is important to ask yourself whether the data you are talking about is part of the program (essence), or whether it is descriptive information about the program (metadata). Using this as a guide, we can classify different elements of a program as either essence or metadata. (See Table 1 below.)


Essential Metadata vs. Descriptive Metadata
The types of metadata are practically limitless. Metadata can include identifiers, time code, geospatial data and even free-form user metadata such as field notes entered by a camera person. That said, there are two main classifications of metadata that can be useful. The first is Essential Metadata. Essential Metadata is data that is critical in order to play the content. Examples include unique identifier, frame rate, compression coding parameters and time code. The second is Descriptive Metadata. Descriptive Metadata describes the essence. Examples include house number, title, program length, sponsor or advertiser, and ISCI code.

It is useful to divide metadata into these two classifications because, in cases where data storage space is severely limited, it may be that only Essential Metadata is stored with the content. As we move further away from legacy videotape-oriented systems, storage space for metadata becomes less of a problem.
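
To make the two classifications concrete, here is a minimal Python sketch. The class names, field names and sample values are hypothetical illustrations, not taken from any standard or product; the fields simply mirror the examples listed above. The `for_storage` method shows the storage-limited case, where only Essential Metadata travels with the content.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class EssentialMetadata:
    """Data needed in order to play the content (hypothetical field names)."""
    unique_id: str
    frame_rate: str          # e.g. "30000/1001"
    coding_parameters: str   # e.g. codec and bit rate
    start_time_code: str


@dataclass
class DescriptiveMetadata:
    """Data that describes the essence (hypothetical field names)."""
    house_number: Optional[str] = None
    title: Optional[str] = None
    program_length: Optional[str] = None
    sponsor: Optional[str] = None
    isci_code: Optional[str] = None


@dataclass
class ClipMetadata:
    essential: EssentialMetadata
    descriptive: DescriptiveMetadata

    def for_storage(self, space_limited: bool) -> dict:
        """Return only Essential Metadata when storage is severely limited."""
        record = {"essential": asdict(self.essential)}
        if not space_limited:
            record["descriptive"] = asdict(self.descriptive)
        return record
```

Where a given field belongs (a house number, for instance) is a design decision for your own systems, so treat the split shown here as one possible arrangement rather than a rule.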

In Table 1, some of these classifications may seem arbitrary. To some extent, this is true. Is AFD essence data or metadata? Is a house number Essential Metadata or Descriptive Metadata? It may depend upon the systems involved. While there is no definitive answer to these questions, the concepts of essence, metadata, Essential Metadata and Descriptive Metadata can be helpful as you learn more about this topic.

Media Identification
One can argue that the most critical metadata component is the identifier assigned to the essence. After all, once the tape is converted to a file, there are only two ways to identify the content — either by the file name or by an identifier that is closely associated with the content. Gone is the trusty label stuck to the cassette.

In many facilities, file names are used to identify the content of a file, and this can work very well. But there are some problems with this approach. First, in many cases, nothing enforces file naming conventions beyond a policy that has been established by the company. The upside is that changing your naming convention is as simple as sending a memo or an e-mail. The downside is that the convention is only as reliable as the people following it, and you are still tied to the file name limitations of the operating system.

Another issue is that the file name can be easily changed. Again, this can be a good thing because some workflows may rely on the name of the file being changed once a QC check or some other step has been performed on the file. But this could be a problem because the workflow process relies on humans to correctly type the file name and to correctly change the file name as the work progresses through your system.

Finally, the strongest argument for not using file names for identification is that there is no guarantee that the file name is unique. This can cause some major headaches, which I will get to later.
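
The uniqueness problem is easy to demonstrate. The following Python sketch groups a list of file paths by file name and reports any name that appears more than once. The function name and the sample paths are hypothetical; a real facility would feed it a listing from its own storage.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_name_collisions(paths):
    """Group paths by file name and return the names that occur more than once."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[PurePosixPath(path).name].append(path)
    return {name: found for name, found in by_name.items() if len(found) > 1}


# Hypothetical listing: two different folders hold a file with the same name.
listing = [
    "/ingest/promo.mxf",
    "/archive/promo.mxf",
    "/ingest/news.mxf",
]
collisions = find_name_collisions(listing)
```

Nothing in the file system prevents the two `promo.mxf` files from holding different content, which is exactly why a file name alone is a weak identifier.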

In many master control operations, content is identified by a house number. House numbers are typically assigned by the traffic or programming departments, they are internally generated, and they are used by various computer systems to identify content in the facility. House numbers may be incorporated into file names, and they may also appear within the metadata of the file itself.

Unfortunately, most house number systems suffer from the same limitations as the file name systems previously discussed. In fact, a long time ago, a station where I worked used the same house numbers over and over. The promo for the Friday night movie, for example, was always 50555. This system worked well until we accidentally promoted last week's Friday night movie because no one had replaced the old promo with the new one. Even though we knew there was a problem with our numbering system, we continued to use it until we accidentally played a national automobile commercial during the wrong week. I do not remember the particulars, but I do remember some very uncomfortable meetings over the issue. Finally, we stopped reusing house numbers and went to a different system.

The MXF standard specifies that Unique Material Identifiers (UMIDs) be used as labels within the MXF file to uniquely identify the content. UMIDs are computer-generated values (32 bytes in the basic form, of which 16 bytes are the material number) that can be locally generated, meaning that you do not need any outside references to create the UMID. Statistically, UMIDs are almost guaranteed to be unique. This means that it is possible to uniquely identify a piece of media no matter where that content came from. One rub is that UMIDs are not meant to convey any information at all in and of themselves. For many media companies, it can be challenging to stop relying on the media ID as a way to convey information about the content. But file-based workflows rely on unique identifiers. In fact, it is a key assumption that the identifiers are unique.
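
The principle of a locally generated, information-free identifier can be sketched in a few lines of Python. To be clear, this is not a UMID generator: it uses the standard library's random UUID purely as a stand-in, and the `catalog` dictionary and `register` function are hypothetical. The point is that the meaning lives in the metadata attached to the identifier, not in the identifier itself.

```python
import uuid


def new_material_id() -> str:
    """Generate an opaque identifier locally, with no outside registry.

    uuid4() returns a random 128-bit (16-byte) value. This is NOT a SMPTE
    UMID; it only illustrates the idea of a statistically unique label.
    """
    return uuid.uuid4().hex


# Hypothetical catalog: descriptive information is looked up by identifier.
catalog = {}


def register(title: str, house_number: str) -> str:
    """Assign a fresh identifier to a piece of content and record its metadata."""
    material_id = new_material_id()
    catalog[material_id] = {"title": title, "house_number": house_number}
    return material_id


# Two versions of a promo that share a house number still get distinct IDs.
last_week = register("Friday Night Movie promo", "50555")
this_week = register("Friday Night Movie promo", "50555")
```

Because each registration gets its own identifier, a reused house number no longer makes two pieces of content indistinguishable to the systems that handle them.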

The topic of media identification in metadata is an important one, and I expect you will see more on this topic as more companies move to file-based workflows.

Metadata Synchronization
Recently, the industry has spent a lot of time focusing on metadata contained in file wrappers. This is great, because without some consistency in how we treat metadata, interoperability at the file level is impossible. But one question facing system architects is how we can ensure metadata synchronization. Remember that metadata is not only contained in file wrappers, but it is also contained in databases that are used in many places in our facility. What should we do when metadata is modified? Should we always strive to ensure that the metadata in the file header matches what is contained in the database? Since it takes time to modify metadata in file headers, should we only modify the file header metadata to match the database when we export the file?
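
One way to reason about these questions is to treat the file header and the database record as two dictionaries and look at where they disagree. The Python sketch below does that, and then shows one of the policies raised above: leave the header alone during normal operation and bring it into line with the database only at export. The function names are hypothetical, and treating the database as authoritative is an assumption made for the example, not a recommendation.

```python
def find_drift(header: dict, db_record: dict) -> dict:
    """Return {field: (header_value, db_value)} for every field that differs."""
    fields = set(header) | set(db_record)
    return {
        field: (header.get(field), db_record.get(field))
        for field in sorted(fields)
        if header.get(field) != db_record.get(field)
    }


def header_for_export(header: dict, db_record: dict) -> dict:
    """Build the header to write at export time.

    Assumption: the database is authoritative, so its values overwrite
    whatever the header currently holds.
    """
    updated = dict(header)
    updated.update(db_record)
    return updated


# Hypothetical values: the title was edited in the database after ingest.
header = {"title": "Promo", "house_number": "50555"}
db_record = {"title": "Promo v2", "house_number": "50555", "sponsor": "Acme"}
drift = find_drift(header, db_record)
```

A facility that prefers the opposite policy would run the same comparison but write every change to the header immediately, accepting the extra time that takes.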

Perhaps the best approach would be to write the minimum metadata to the file header and keep all the rest of the nonessential metadata in the database. This is fine, but some media facilities want to be able to rebuild metadata contained in databases from the metadata contained in the files in the case of a database failure.
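
The trade-off can be seen in a small Python sketch. It splits a full metadata record into a minimal header and a database-only remainder, then rebuilds a database from headers alone. The choice of header fields, the function names and the sample record are all hypothetical.

```python
# Hypothetical choice of the minimum metadata written to the file header.
MINIMAL_HEADER_FIELDS = ("unique_id", "frame_rate", "start_time_code")


def split_metadata(full_record: dict):
    """Split a record into (header fields, database-only fields)."""
    header = {k: v for k, v in full_record.items() if k in MINIMAL_HEADER_FIELDS}
    db_only = {k: v for k, v in full_record.items() if k not in MINIMAL_HEADER_FIELDS}
    return header, db_only


def rebuild_database(headers):
    """Rebuild a database keyed by unique_id using file headers alone."""
    return {h["unique_id"]: dict(h) for h in headers}


full_record = {
    "unique_id": "abc",
    "frame_rate": "25",
    "start_time_code": "10:00:00:00",
    "title": "News",
    "sponsor": "Acme",
}
header, db_only = split_metadata(full_record)
rebuilt = rebuild_database([header])
```

After a rebuild, the title and sponsor are gone because they were never written to the header. A facility that wants full recovery from files has to accept larger headers and the cost of keeping them current.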

There are no easy answers. Metadata is a complex topic, and thinking on this topic is evolving. As you transition to file-based workflows, it is a good idea to spend time with your vendors, understanding how metadata is treated in a wide variety of scenarios.

By Brad Gilmer, Broadcast Engineering