Understanding AVCHD

When DVCAM, DVCPRO and DVCPRO50 were introduced, manufacturers positioned these proprietary formats as “professional” compared with the “consumer” DV format. After working with all four formats, however, I found that the differences were confined to their tape recording systems. DV, DVCAM, DVCPRO and DVCPRO50 all use the same video codec. (DVCPRO50 employs dual 25Mb/s DV codecs.)

AVCHD, developed jointly by Panasonic and Sony, is a proprietary implementation of H.264/AVC. Specifically, AVCHD employs both the H.264 Main Profile (MP) and the High Profile (HP), and the HP codec provides important image quality advantages over the MP codec. Thus, although AVCHD is marketed as a single codec, it uses a pair of codec profiles. (The HP codec is backward compatible with the MP codec: an HP decoder can also decode MP recordings.) Moreover, although AVCCAM and NXCAM are marketed as professional formats, both use the same AVCHD HP codec. As a result, understanding AVCHD, AVCCAM and NXCAM is more complex than understanding DVCAM, DVCPRO and DVCPRO50.

Figure 1 - HD H.264/AVC profiles and levels


Baseline Profile
The lowest profile used by an HD camera is the Baseline Profile (BP). BP supports only the less efficient context-adaptive variable-length coding (CAVLC). Level 3.1 supports 720p30 at up to 14Mb/s, while Levels 3.2 and 4.0 support 720p60 at up to 20Mb/s, although at such a low data rate only 720p30 would be visually acceptable. Level 4.1 supports 720p60 at up to 50Mb/s.
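Because each level is, in part, a ceiling on data rate, picking the lowest level that can carry a given recording rate amounts to a table lookup. The minimal Python sketch below assumes only the maximum rates quoted in this article (Level 4.2 shares Level 4.1's 50Mb/s ceiling); the name `lowest_level_for` is hypothetical, and the sketch deliberately ignores the picture-size and frame-rate limits that a real level check also applies.

```python
from typing import Optional

# Maximum video data rates (Mb/s) per level, as quoted in this article.
# Only the data-rate ceiling is modeled; resolution and frame-rate limits are ignored.
MAX_RATE_MBPS = {
    "3.1": 14,
    "3.2": 20,
    "4.0": 20,
    "4.1": 50,
    "4.2": 50,
}


def lowest_level_for(rate_mbps: float) -> Optional[str]:
    """Return the lowest listed level whose data-rate ceiling covers rate_mbps."""
    for level, ceiling in MAX_RATE_MBPS.items():  # dicts keep insertion order
        if rate_mbps <= ceiling:
            return level
    return None  # faster than every listed level


if __name__ == "__main__":
    for rate in (12, 17, 24, 60):
        print(f"{rate} Mb/s -> Level {lowest_level_for(rate)}")
```

At 17Mb/s the lookup returns Level 3.2 because only data rate is checked; a real encoder must also pick a level that permits the chosen picture size, which is why 17Mb/s 1080i recording is paired with Level 4.0.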

Main Profile
MP offers the next performance level. MP supports both CAVLC and the more efficient context-adaptive binary arithmetic coding (CABAC). MP also supports B-slices in addition to I- and P-slices. Because B-slices give H.264 its greatest encoding efficiency, MP reduces the likelihood of compression artifacts during rapid motion. AVCHD uses MP and higher profiles.

A B-reference is generated when two motion vectors are defined by the displacement between the Current Block and two Reference Blocks. With H.264, “bi” means two vectors, not two directions as it does with MPEG-2: both Reference Blocks may come from earlier pictures, from later pictures, or one from each.
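The averaging step behind a B-reference can be sketched in a few lines. This minimal Python sketch assumes the two Reference Blocks have already been fetched using their motion vectors (possibly both from earlier pictures) and applies H.264's default, unweighted bi-prediction, in which each predicted sample is the rounded mean of the two reference samples. Explicit weighted prediction and sub-pixel interpolation are omitted, and the function name `bipredict` is hypothetical.

```python
def bipredict(ref0, ref1):
    """Combine two motion-compensated Reference Blocks into one Predicted Block.

    Each block is a list of rows of 8-bit samples. Each output sample is
    (a + b + 1) >> 1, i.e. the average of the two inputs, rounded up on ties.
    """
    return [
        [(a + b + 1) >> 1 for a, b in zip(row0, row1)]
        for row0, row1 in zip(ref0, ref1)
    ]


# Two hypothetical 2 x 2 Reference Blocks located by two motion vectors.
ref_a = [[100, 102], [104, 106]]
ref_b = [[110, 101], [104, 107]]
print(bipredict(ref_a, ref_b))  # [[105, 102], [104, 107]]
```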

Several levels may be used with MP. Level 4.0 supports 720p59.94 and 1080i59.94 up to 20Mb/s (17Mb/s), while Level 4.1 supports data rates up to 50Mb/s (22Mb/s to 24Mb/s). The ability of Levels 4.0 and 4.1 to support 1080i59.94 means that 23.976fps can be recorded after applying 2:3 pulldown. This capability also means that 1080p29.97 can be recorded as 1080i59.94/29.97PsF because its frame rate is equal to the 29.97fps used by 1080i59.94.
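Both recording techniques in the paragraph above come down to field bookkeeping. The Python sketch below, with hypothetical function names, shows 2:3 pulldown spreading four 23.976fps frames (A, B, C, D) across ten 59.94Hz fields, and PsF splitting one progressive frame into two fields captured at the same instant. Field parity is assumed to start with the top field.

```python
def pulldown_23(frames):
    """Apply 2:3 pulldown: alternate frames contribute 2 and 3 fields.

    Returns a list of (frame, parity) tuples in display order, with parity
    alternating top/bottom so the result can be woven into interlaced frames.
    """
    fields = []
    for i, frame in enumerate(frames):
        repeats = 2 if i % 2 == 0 else 3  # A=2, B=3, C=2, D=3, ...
        for _ in range(repeats):
            parity = "top" if len(fields) % 2 == 0 else "bottom"
            fields.append((frame, parity))
    return fields


def weave(fields):
    """Pair consecutive fields into interlaced frames."""
    return [(fields[i][0], fields[i + 1][0]) for i in range(0, len(fields) - 1, 2)]


def psf_split(frame_lines):
    """Segment one progressive frame into two fields (PsF): even lines, then odd lines."""
    return frame_lines[0::2], frame_lines[1::2]


fields = pulldown_23("ABCD")
print("".join(f for f, _ in fields))  # AABBBCCDDD
print(weave(fields))  # [('A', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'D')]
print(psf_split(["line0", "line1", "line2", "line3"]))
```

Four film frames become five interlaced frames, the 24-to-30 ratio behind 23.976fps to 29.97fps, and two of the five woven frames mix fields from different film frames. With PsF, by contrast, both fields of every woven frame come from the same progressive frame.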

High Profile
HP offers all the capabilities of MP (CABAC coding and B-slices) plus an optional capability that greatly improves codec efficiency: the ability to switch dynamically between 8 × 8 and 4 × 4 submacroblocks during compression. Image areas with high detail are compressed using 4 × 4 pixel blocks, while areas with low detail are compressed using 8 × 8 pixel blocks. Because the larger blocks generate less data, more bandwidth is available for data from areas with fine detail.

During encoding, each 16 × 16 pixel macroblock can be partitioned into four 8 × 8 submacroblocks or sixteen 4 × 4 submacroblocks. The encoder can switch among working with 16 × 16, 8 × 8 and 4 × 4 blocks.
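The partitioning and the detail-driven choice between block sizes can be illustrated with a toy model. In this Python sketch, `split` divides a 16 × 16 macroblock into four 8 × 8 or sixteen 4 × 4 blocks, and `choose_block_size` picks 4 × 4 for a detailed 8 × 8 region and 8 × 8 for a flat one. The variance test and its threshold of 100 are hypothetical stand-ins; a real HP encoder typically bases this decision on an estimate of the bits and distortion each choice would produce.

```python
from statistics import pvariance


def split(block, size):
    """Split a square block (list of rows) into size x size sub-blocks, raster order."""
    n = len(block)
    return [
        [row[c:c + size] for row in block[r:r + size]]
        for r in range(0, n, size)
        for c in range(0, n, size)
    ]


def choose_block_size(sub8, threshold=100.0):
    """Hypothetical rule: detailed (high-variance) areas get 4x4, flat areas get 8x8."""
    samples = [p for row in sub8 for p in row]
    return 4 if pvariance(samples) > threshold else 8


# Hypothetical macroblock: flat gray on the left half, fine checkerboard on the right.
macroblock = [
    [128 if c < 8 else (255 if (r + c) % 2 else 0) for c in range(16)]
    for r in range(16)
]

print(len(split(macroblock, 8)), len(split(macroblock, 4)))  # 4 16
print([choose_block_size(b) for b in split(macroblock, 8)])  # [8, 4, 8, 4]
```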

Figure 2


When predictions are made for 16 × 16 macroblocks, four modes are used:

Figure 3


When predictions are made for 8 × 8 submacroblocks, nine modes are used:

Figure 4


Canon AVCHD camcorders were the first to use HP H.264. Shooters quickly found that software decoders supporting only MP were unable to decode Canon recordings.

An HP encoder supports 720p59.94 and 1080i59.94 using multiple levels. Level 4.0 supports data rates up to 20Mb/s (17Mb/s). Level 4.1, used by AVCHD, AVCCAM and NXCAM, supports data rates up to 50Mb/s (22Mb/s to 24Mb/s). Blu-ray employs Level 4.1 using a video data rate up to 40Mb/s.

Level 4.2, available in camcorders using AVCHD 2.0, supports a data rate up to 50Mb/s (28Mb/s) for 1080p59.94. When AVCHD is recorded on a DVD, the disc's maximum spin speed limits the data rate to 17Mb/s. Therefore, when you shoot either MP or HP Level 4.1, or HP Level 4.2, you will not be able to archive to a DVD, because those recordings exceed 17Mb/s.
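The DVD limitation is simple arithmetic: compare the recording rate with the 17Mb/s ceiling. This Python sketch (hypothetical names) does that and also estimates how long a single-layer DVD (4.7 × 10^9 bytes) would last at a given rate. It counts video data only, ignoring audio and file-system overhead, so real running times would be somewhat shorter.

```python
DVD_LIMIT_MBPS = 17             # AVCHD-on-DVD ceiling cited in the text
DVD_SINGLE_LAYER_BYTES = 4.7e9  # nominal single-layer DVD capacity


def fits_on_dvd(rate_mbps: float) -> bool:
    """True if rate_mbps is within the DVD data-rate ceiling."""
    return rate_mbps <= DVD_LIMIT_MBPS


def dvd_minutes(rate_mbps: float) -> float:
    """Approximate minutes of video (only) that fit on a single-layer DVD."""
    seconds = DVD_SINGLE_LAYER_BYTES * 8 / (rate_mbps * 1e6)
    return seconds / 60


for rate in (17, 24, 28):  # Level 4.0, Level 4.1 and Level 4.2 rates from the text
    print(rate, "Mb/s:", "fits" if fits_on_dvd(rate) else "too fast")
print(f"{dvd_minutes(17):.1f} minutes at 17 Mb/s")  # 36.9 minutes at 17 Mb/s
```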

GOP Structure
Each frame is encoded as one or more I-, P- and B-slices. Typically, an H.264 encoder outputs an I-frame (a picture made up entirely of intra-encoded slices) every half-second.
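With an I-frame roughly every half-second, the GOP length in frames follows from the frame rate. This Python sketch computes that interval and the resulting I-frame positions; the half-second figure comes from the text, while the round-half-up rule and the function names are assumptions for illustration, since encoders choose their own GOP lengths.

```python
import math


def iframe_interval(fps: float, seconds: float = 0.5) -> int:
    """Frames between I-frames for a given frame rate, rounded half up."""
    return max(1, math.floor(fps * seconds + 0.5))


def iframe_positions(n_frames: int, fps: float, seconds: float = 0.5) -> list:
    """Indices of frames that would be encoded as I-frames."""
    return list(range(0, n_frames, iframe_interval(fps, seconds)))


for fps in (23.976, 29.97, 59.94):
    print(fps, "fps ->", iframe_interval(fps), "frames per GOP")
print(iframe_positions(45, 29.97))  # [0, 15, 30]
```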

Audio Encoding
H.264/AVC is a video codec; the stereo audio that accompanies it is typically encoded as AAC or LPCM. AVCHD audio is normally AC-3 Dolby Digital 2.0 stereo or 5.1 surround. (NXCAM camcorders can also record uncompressed audio using PCM sampled at 48kHz.)
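Uncompressed PCM audio has a fixed data rate that is easy to compute: sample rate × bits per sample × channels. The 48kHz sample rate is from the text; the 16-bit depth used as the default below is an assumption, since the article does not state the bit depth, and `pcm_rate_mbps` is a hypothetical name.

```python
def pcm_rate_mbps(sample_rate_hz: int = 48_000, bits_per_sample: int = 16,
                  channels: int = 2) -> float:
    """Data rate of uncompressed PCM audio in Mb/s (10**6 bits per second)."""
    return sample_rate_hz * bits_per_sample * channels / 1e6


print(pcm_rate_mbps())            # 48kHz, 16-bit (assumed), stereo
print(pcm_rate_mbps(channels=6))  # 48kHz, 16-bit (assumed), 5.1 channels
```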

H.264/AVC I- and P-slice Encoding
One of the many characteristics of H.264/AVC that make it difficult to understand is its use of terms similar to those used with MPEG-2, such as “I,” “P” and “B.” An H.264 I-slice is a portion of a picture composed of macroblocks, all of which are predicted only from macroblocks within the same picture.

These terms hide a new concept that H.264 introduces: slices, which are segments of a picture made up of one or more macroblocks, up to an entire frame. Just as there are I-slices, there are P- and B-slices. P- and B-slices are portions of a picture whose macroblocks may be predicted from macroblocks in other pictures, rather than only from macroblocks in the same picture.
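Because a slice is a run of consecutive macroblocks, dividing a picture into slices can be sketched as grouping macroblock indices in raster order. In this Python sketch the one-row-per-slice choice is a hypothetical encoder setting, not something AVCHD mandates, and the 1920 × 1080 picture size is only an example.

```python
def macroblock_grid(width: int, height: int) -> tuple:
    """Macroblocks across and down; partial macroblocks at the edges are padded."""
    return -(-width // 16), -(-height // 16)  # ceiling division


def split_into_slices(total_mbs: int, mbs_per_slice: int) -> list:
    """Group macroblock indices (raster order) into slices of consecutive macroblocks."""
    return [range(start, min(start + mbs_per_slice, total_mbs))
            for start in range(0, total_mbs, mbs_per_slice)]


cols, rows = macroblock_grid(1920, 1080)
print(cols, rows, cols * rows)                 # 120 68 8160
slices = split_into_slices(cols * rows, cols)  # hypothetical: one macroblock row per slice
print(len(slices), slices[0])                  # 68 range(0, 120)
```

A 1080-line picture is therefore coded as 1088 lines (68 macroblock rows), with the extra eight lines cropped on display.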

H.264 encoding begins with chroma subsampling to 4:2:0. Next, each incoming picture is divided into macroblocks. (When interlaced video is encoded, both fields are compressed together.) Many of the techniques used to compress an MPEG-2 I-frame are also used to compress the macroblocks that make up an I-slice. Each 16 × 16 pixel macroblock can be further partitioned into four 8 × 8 submacroblocks. (See Figure 2.) The encoder can switch between working with 16 × 16 blocks and 8 × 8 blocks.
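Chroma subsampling to 4:2:0 keeps every luma sample but only one color-difference sample per 2 × 2 group of pixels. The Python sketch below uses a simple rounded 2 × 2 average as the downsampling filter; that filter and the name `subsample_420` are assumptions for illustration, since cameras may use other filters and chroma siting.

```python
def subsample_420(chroma):
    """Reduce a full-resolution chroma plane (list of rows) to 4:2:0.

    Each output sample is the rounded mean of a 2x2 neighborhood.
    Assumes even width and height.
    """
    h, w = len(chroma), len(chroma[0])
    return [
        [(chroma[r][c] + chroma[r][c + 1] + chroma[r + 1][c] + chroma[r + 1][c + 1] + 2) // 4
         for c in range(0, w, 2)]
        for r in range(0, h, 2)
    ]


cb = [[10, 20, 30, 40],
      [30, 40, 50, 60]]
print(subsample_420(cb))  # [[25, 45]]

# Sample counts per 1920 x 1080 frame: Y, Cb, Cr.
print(1920 * 1080, (1920 // 2) * (1080 // 2), (1920 // 2) * (1080 // 2))
```

Each chroma plane ends up with a quarter of the luma sample count, so a 4:2:0 frame carries 1.5 samples per pixel instead of 3.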

Blocks, of course, are located next to other blocks. For example, the Current Block (yellow) in the Figure 1 frame to be encoded has a block to the left (green) and a block above (blue). These two blocks are Previous Blocks. Reference Pixels lie along the boundaries between the Previous Blocks and the Current Block: the rightmost column of the block to the left (dark green) and the bottom row of the block above (dark blue). Four different prediction methods (modes) are used with 16 × 16 macroblocks. (See Figure 3.)

When predictions are made for 8 × 8 submacroblocks, nine modes are used. (See Figure 4.)

In all cases, the mode that best predicts the content of the Current Block is selected as the Current Prediction Mode, which is linked to the Current Block. The Predicted Block (built from the column and row of Reference Pixels) is “subtracted” from the Current Block, generating a Residual (difference) Block. Each Residual Block is compressed, linked to the Current Block, and used during decoding as a picture “correction” block.
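The mode-selection and residual steps described above can be sketched for a small block. This Python sketch implements three of the prediction modes (vertical, horizontal and DC); the 16 × 16 plane mode and the diagonal 4 × 4/8 × 8 modes are omitted. It assumes the encoder picks the mode with the lowest sum of absolute differences (SAD), a common but simplified criterion, and all names and pixel values are hypothetical.

```python
def predict(mode, top, left):
    """Build an n x n Predicted Block from the Reference Pixels above and to the left."""
    n = len(top)
    if mode == "vertical":    # copy the row of Reference Pixels downward
        return [list(top) for _ in range(n)]
    if mode == "horizontal":  # copy the column of Reference Pixels across
        return [[left[r]] * n for r in range(n)]
    if mode == "dc":          # fill with the rounded mean of all 2n Reference Pixels
        dc = (sum(top) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError(f"unsupported mode: {mode}")


def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def encode_intra(current, top, left, modes=("vertical", "horizontal", "dc")):
    """Pick the best-predicting mode and return it with the Residual Block."""
    mode = min(modes, key=lambda m: sad(current, predict(m, top, left)))
    pred = predict(mode, top, left)
    residual = [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(current, pred)]
    return mode, residual


def decode_intra(mode, residual, top, left):
    """Decoder side: Predicted Block plus the 'correction' (Residual) Block."""
    pred = predict(mode, top, left)
    return [[p + d for p, d in zip(rp, rd)] for rp, rd in zip(pred, residual)]


top = [10, 50, 90, 130]  # Reference Pixels: bottom row of the block above
left = [12, 14, 11, 13]  # Reference Pixels: rightmost column of the block at left
current = [[11, 50, 91, 129],
           [10, 51, 90, 130],
           [12, 49, 89, 131],
           [10, 50, 90, 130]]

mode, residual = encode_intra(current, top, left)
print(mode)      # vertical
print(residual)  # [[1, 0, 1, -1], [0, 1, 0, 0], [2, -1, -1, 1], [0, 0, 0, 0]]
assert decode_intra(mode, residual, top, left) == current
```

Here the residual is kept exactly, so reconstruction is lossless; a real encoder transforms and quantizes the residual, so the decoder recovers only an approximation of the Current Block.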

Once an I-slice has been encoded, P-slices are encoded. Motion estimation is performed methodically: macroblocks in other frames are searched for the contents of the Current Block. H.264 allows up to 16 reference pictures, located before or after the current picture. (AVCHD supports searching within four pictures.) The more reference pictures used, the more memory an encoder must have. For this reason, AVCHD cameras typically support only one or two reference frames.
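The memory argument can be made concrete. H.264's level definitions cap the decoded picture buffer in macroblocks (MaxDpbMbs), so the number of full frames that can be held for reference falls as picture size rises. The Python sketch below uses the MaxDpbMbs values from the H.264 level table for Levels 4.0 to 4.2 and assumes 8-bit 4:2:0 video (384 bytes per macroblock). It is offered as one plausible explanation of the four-picture figure quoted for AVCHD, not as the AVCHD specification's stated rationale; the function names are hypothetical.

```python
# MaxDpbMbs (decoded picture buffer size in macroblocks) from the H.264 level table.
MAX_DPB_MBS = {"4.0": 32768, "4.1": 32768, "4.2": 34816}
BYTES_PER_MB_420_8BIT = 16 * 16 + 2 * (8 * 8)  # 256 luma + 128 chroma samples = 384


def mbs_per_frame(width: int, height: int) -> int:
    return (-(-width // 16)) * (-(-height // 16))  # ceiling division in each direction


def max_reference_frames(level: str, width: int, height: int) -> int:
    """Frames that fit in the level's DPB; H.264 never allows more than 16."""
    return min(MAX_DPB_MBS[level] // mbs_per_frame(width, height), 16)


def reference_memory_bytes(n_frames: int, width: int, height: int) -> int:
    """Memory needed to hold n reference frames of 8-bit 4:2:0 video."""
    return n_frames * mbs_per_frame(width, height) * BYTES_PER_MB_420_8BIT


print(max_reference_frames("4.1", 1920, 1080))  # 4
print(max_reference_frames("4.1", 1280, 720))   # 9
print(reference_memory_bytes(2, 1920, 1080))    # 6266880
```

Holding two 1080-line reference frames needs about 6MB, versus about 12.5MB for four, which illustrates why camera designers economize on reference frames.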

The block with the best content match becomes the Reference Block. A P-reference is generated when a single motion vector is defined by the displacement between the Current and Reference Blocks. Each motion vector and each compressed Residual Block is linked to its P-slice.
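The search for the best-matching Reference Block can be sketched as exhaustive block matching. This Python sketch tries every integer displacement within a small search window of a single reference picture, scores each candidate by SAD, and returns the winning motion vector and the Residual Block. The search radius, the SAD criterion and all names are assumptions; real encoders use faster search patterns and refine vectors to quarter-pixel precision.

```python
def sad_at(cur, ref, y, x):
    """SAD between the current block and the same-sized block of ref at (y, x)."""
    n = len(cur)
    return sum(abs(cur[r][c] - ref[y + r][x + c]) for r in range(n) for c in range(n))


def full_search(cur, ref, y0, x0, radius):
    """Return the motion vector (dy, dx) with the lowest SAD and the Residual Block.

    (y0, x0) is the Current Block's position; only displacements that keep the
    candidate block inside the reference picture are tested.
    """
    n = len(cur)
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = y0 + dy, x0 + dx
            if 0 <= y <= len(ref) - n and 0 <= x <= len(ref[0]) - n:
                cost = sad_at(cur, ref, y, x)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    _, dy, dx = best
    residual = [[cur[r][c] - ref[y0 + dy + r][x0 + dx + c] for c in range(n)]
                for r in range(n)]
    return (dy, dx), residual


# Hypothetical 8 x 8 reference picture with distinct sample values.
ref = [[10 * r + c for c in range(8)] for r in range(8)]
# Current 4 x 4 block at (2, 2) whose content actually sits at (3, 4) in ref.
cur = [[ref[3 + r][4 + c] for c in range(4)] for r in range(4)]

mv, residual = full_search(cur, ref, 2, 2, radius=2)
print(mv)        # (1, 2)
print(residual)  # all zeros: a perfect match
```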

By Steve Mullen, Broadcast Engineering