Friday, November 14, 2014

BSID 9 and 10 not in Dolby AC3 specification


The AC3 specification is defined and owned by Dolby, which provides AC3 SDKs for commercial use. However, the open source community has open source AC3 codecs that extend the format by adding BSID 9 and 10, which leads to bitstreams that are non-compliant with the Dolby AC3 specification. Commercial Dolby AC3 decoders will not support those non-compliant streams from the open source community.

Below are some details about BSID 9 and 10 in AC3 from the open source community:

“AC3/BSID9 and AC3/BSID10 (DolbyNet) :
The ac3 frame header has, similar to the mpeg-audio header a version field. Normal ac3 is defined as bitstream id 8 (5 Bits, numbers are 0-15). Everything below 8 is still compatible with all decoders that handle 8 correctly. Everything higher are additions that break decoder compatibility.
For the samplerates 24kHz (00); 22,05kHz (01) and 16kHz (10) the BSID is 9
For the samplerates 12kHz (00); 11,025kHz (01) and 8kHz (10) the BSID is 10”
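To illustrate, here is a minimal sketch (not a Dolby implementation) of how a parser might read BSID from an AC-3 sync frame and apply the open source half/quarter sample rate convention. The byte layout follows the AC-3 syncinfo/bsi fields (16-bit syncword, 16-bit CRC, 2-bit fscod, 6-bit frmsizecod, 5-bit bsid); the function name and return shape are illustrative.

```python
def parse_ac3_header(frame: bytes):
    """Parse the first bytes of an AC-3 sync frame and report BSID and the
    effective sample rate, including the non-standard BSID 9/10 (DolbyNet)
    extension used by some open source encoders."""
    if frame[0] != 0x0B or frame[1] != 0x77:
        raise ValueError("missing AC-3 syncword 0x0B77")
    fscod = frame[4] >> 6          # 2-bit sample rate code
    bsid = frame[5] >> 3           # 5-bit bitstream id
    base_rates = {0: 48000, 1: 44100, 2: 32000}
    if fscod not in base_rates:
        raise ValueError("reserved fscod")
    rate = base_rates[fscod]
    if bsid == 9:
        rate //= 2                 # half-rate extension (24/22.05/16 kHz)
    elif bsid == 10:
        rate //= 4                 # quarter-rate extension (12/11.025/8 kHz)
    elif bsid > 8:
        raise ValueError(f"unsupported BSID {bsid}")
    compliant = bsid <= 8          # Dolby-compliant streams have BSID <= 8
    return bsid, rate, compliant
```

A stream reporting BSID 9 or 10 is exactly the case a commercial Dolby decoder would reject.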

Wednesday, November 12, 2014

Simple, Main and Advanced Profiles in SMPTE VC-1 specification


The three profiles in the SMPTE VC-1 specification, simple, main, and advanced, actually act as two coding formats. One coding format covers the simple and main profiles, and the other covers the advanced profile.

There are many differences between the simple and main profiles on one side and the advanced profile on the other.

1. The picture layer bitstream syntaxes are different between the simple and main profiles and the advanced profile.

2. In the advanced profile, the sequence-related metadata is part of the video data bitstream. In the simple and main profiles, by contrast, the sequence-related metadata shall be communicated to the decoder by the transport layer or other out-of-band means.

3. Interlace coding is only supported in the advanced profile, not in the simple and main profiles.

4. Slices are only supported in the advanced profile, not in the simple and main profiles.

5. In the advanced profile, pictures and slices shall be byte-aligned and carried in a BDU. Each new picture or slice is detected via start codes as defined in Annex E. In the simple and main profiles, for each coded picture, the pointer to the coded bitstream and its size shall be communicated to the decoder by the transport layer.

6. The simple and main profiles make certain assumptions about the display environment (e.g. square pixel aspect ratio). The advanced profile adds extensive in-band metadata support and allows for optimized experiences on a wide range of display devices. That is, Annex I on display metadata applies only to the advanced profile.

In fact, when a VC-1 bitstream is stored in the ASF file format, the fourcc is ‘WMV3’ for the simple and main profiles but ‘WVC1’ for the advanced profile, which again indicates that SMPTE VC-1 actually defines two coding formats.
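As a hedged illustration of point 5, the presence of Annex E start codes offers a quick heuristic for telling the two coding formats apart in a raw elementary stream; the check below assumes the stream begins with the sequence header BDU (start code suffix 0x0F), which is typical but not guaranteed.

```python
def looks_like_vc1_advanced(es: bytes) -> bool:
    """Heuristic: an advanced-profile VC-1 elementary stream begins with a
    byte-aligned BDU start code (SMPTE 421M Annex E), typically the sequence
    header 0x00 0x00 0x01 0x0F. Simple/main profile streams carry no start
    codes, so this prefix should not appear at the start of the stream."""
    return es[:4] == b"\x00\x00\x01\x0F"
```

Inside ASF files, of course, the fourcc (‘WMV3’ vs ‘WVC1’) already makes the distinction explicit.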

Sunday, August 24, 2014

Motion-JPEG in AVI and Motion-JPEG in MOV

No single document defines a standard format universally recognized as a complete specification of “Motion-JPEG” for use in all contexts. This raises compatibility concerns for Motion-JPEG video streams from different manufacturers.

Each container format usually specifies how Motion-JPEG shall be encoded. Two popular Motion-JPEG container formats are AVI and MOV.

Microsoft documents its standard format for storing Motion-JPEG in AVI files at http://www.fileformat.info/format/bmp/spec/b7c72ebab8064da48ae5ed0c053c67a4/view.htm. Motion-JPEG in AVI doesn’t use custom Huffman tables; the default Huffman tables are always used. That is, Motion-JPEG frames won’t carry a DHT marker.
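A simple way to verify this property is to scan a frame’s JPEG marker segments for DHT (0xFFC4) before the start-of-scan marker. A minimal sketch, assuming a well-formed baseline JPEG frame:

```python
def has_dht_marker(jpeg: bytes) -> bool:
    """Scan a baseline JPEG frame's marker segments and report whether a
    DHT (Define Huffman Table, 0xFFC4) marker is present. Motion-JPEG in
    AVI omits it and relies on the default tables."""
    i = 2  # skip SOI (0xFFD8)
    while i + 4 <= len(jpeg):
        if jpeg[i] != 0xFF:
            return False            # not at a marker; malformed stream
        marker = jpeg[i + 1]
        if marker == 0xC4:
            return True             # custom Huffman table present
        if marker == 0xDA:
            return False            # SOS: entropy-coded data follows
        length = (jpeg[i + 2] << 8) | jpeg[i + 3]  # segment length incl. itself
        i += 2 + length
    return False
```

Running this over frames from different sources is one way to spot the AVI-style default-table convention in practice.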

Apple documents how Motion-JPEG is stored in QuickTime MOV files, with two codings, Motion-JPEG format A and Motion-JPEG format B, at https://developer.apple.com/standards/qtff-2001.pdf. Each field of Motion-JPEG format A fully complies with the ISO JPEG specification. Motion-JPEG format B does not support markers; in place of the markers, QuickTime inserts a header at the beginning of the bitstream.

Saturday, August 23, 2014

H.264 in CENC version 1 and CENC version 2


CENC stands for “The Common Encryption (‘cenc’) protection scheme”.

For H.264, CENC version 1 states:

it is required that at least the NAL length field and the nal_unit_type field (the first byte after the length) of each NAL unit is left unencrypted. 

And for H.264, CENC version 2 states:

For AVC video using ‘avc1’ sample description stream format, the NAL length field and the nal_unit_type field (the first byte after the length) of each NAL unit SHALL be unencrypted, and only video data in slice NALs SHOULD be encrypted.  Note that the length field is a variable length field. It can be 1, 2, or 4 bytes long and is specified in the Sample Entry for the track as the lengthSizeMinusOne field in the AVCDecoderConfigurationRecord 

That is, H.264 in CENC version 1 only requires the NAL length field and the nal_unit_type field to be clear.

H.264 in CENC version 2 requires the NAL length field and the nal_unit_type field to be clear, and recommends that the slice headers in slice NALs also be clear.

There isn’t an easy way to tell whether some H.264 content is encrypted with CENC version 1 (only the NAL length field and the nal_unit_type field are clear) or with CENC version 2 (the NAL length field, the nal_unit_type field, and the full slice header are clear).
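Whichever version is in use, a decryptor has to walk the length-prefixed NAL units of each ‘avc1’ sample to locate the fields that must stay clear. A minimal sketch (the generator name and return shape are illustrative, not from the CENC spec):

```python
def iter_length_prefixed_nals(sample: bytes, length_size: int):
    """Walk an 'avc1' sample made of length-prefixed NAL units.
    length_size is lengthSizeMinusOne + 1 (1, 2, or 4 bytes, taken from the
    AVCDecoderConfigurationRecord). Yields (nal_unit_type, nal_bytes);
    under CENC the length prefix and the NAL type byte must stay clear."""
    i = 0
    while i < len(sample):
        nal_len = int.from_bytes(sample[i:i + length_size], "big")
        i += length_size
        nal = sample[i:i + nal_len]
        i += nal_len
        nal_unit_type = nal[0] & 0x1F  # low 5 bits of the first NAL byte
        yield nal_unit_type, nal
```

Slice NALs (types 1 and 5, among others) are the ones whose payloads are candidates for encryption.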

Friday, August 22, 2014

Fast, High Quality Frame-Accurate Video Trimming in MP4


A popular video editing operation is to trim a video and extract the interesting portion. There are many ways to accomplish video trimming.

Looking at the video metadata in MP4, we can see that each GOP starting from a sync sample is self-contained and can be decoded independently. One straightforward way is to seek to the nearest sync sample at or before the starting point of the interesting portion, then decode and re-encode until the ending point. That is, decode the set of GOPs covering the interesting portion and re-encode the interesting portion only.

We can go beyond this straightforward method to further optimize video trimming. Since each GOP can be decoded independently, the GOPs fully covered by the interesting portion don’t need to be decoded and re-encoded. Only the leading and trailing GOPs partially covered by the interesting portion need to be decoded and re-encoded.

For example, suppose the whole video is 30 seconds long, the GOP duration is 1 second, and we want to trim the video from 15.5 seconds to 25.5 seconds. The GOPs from the 16th second to the 25th second do not need decoding and re-encoding; they pass through directly into the final trimmed video. Only the GOP from the 15th to the 16th second and the GOP from the 25th to the 26th second need to be decoded and re-encoded.
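This arithmetic can be sketched as follows, under the simplifying assumptions that every GOP is exactly gop_duration long and starts with a sync sample (the function name and return shape are illustrative):

```python
import math

def plan_trim(trim_start: float, trim_end: float, gop_duration: float):
    """GOP-level trim plan: returns ([ranges to decode and re-encode],
    (pass-through range)) in seconds. Partial GOPs at each end are
    re-encoded; GOPs fully inside the trim are copied untouched."""
    first_full = math.ceil(trim_start / gop_duration) * gop_duration
    last_full = math.floor(trim_end / gop_duration) * gop_duration
    reencode = []
    if trim_start < first_full:
        reencode.append((trim_start, first_full))   # leading partial GOP
    if last_full < trim_end:
        reencode.append((last_full, trim_end))      # trailing partial GOP
    return reencode, (first_full, last_full)
```

For the 15.5–25.5 second example with 1-second GOPs, this yields re-encode ranges (15.5, 16.0) and (25.0, 25.5) with pass-through from 16.0 to 25.0, matching the description above.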

To minimize the number of pictures decoded in those two GOPs: if a picture’s time stamp is outside the interesting portion of 15.5–25.5 seconds but inside the 15th–16th or 25th–26th second, and the picture is not used for reference, its decoding can be skipped, because it has no impact on the pictures in the interesting portion.

As for the sequence parameter set (SPS) and picture parameter set (PPS), if the re-encoded portions can use different IDs than the pass-through portion, the SPS and PPS for the re-encoded portions can be saved together with those for the pass-through portion in the ‘avc1’ box. If the IDs have to be the same between the two sets of SPSs and PPSs, the ‘avc3’ box can be used instead of ‘avc1’; ‘avc3’ allows SPS and PPS to be stored together with the video sample data. If ‘avc1’ has to be used and the IDs have to be the same, then SPS and PPS can still be stored with the video sample data, with a minor non-conformance: use ‘avc1’ but store SPS and PPS together with the sample data. In practice, MP4 parsers handle this minor non-conformance.

This is the fastest frame-accurate video trimming at the highest quality for videos in MP4, since the minimum set of pictures is decoded and re-encoded and most pictures are untouched, preserving the original quality without re-encoding degradation.

Thursday, August 21, 2014

PLUSPTYPE in H.263

PLUSPTYPE is an extended PTYPE, indicated by the source_format syntax element equal to ‘111’.

PLUSPTYPE is absent in the short header mode of an MPEG-4 Part 2 decoder, the H.263-compatible mode.

Though PLUSPTYPE allows custom resolutions beyond the five resolutions available in short header mode, it is often not supported by commercial H.263 or MPEG-4 Part 2 decoders.

 
That is, when video content is encoded in H.263 format, it’s better not to use PLUSPTYPE, for compatibility and playability on most devices.
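For completeness, a hedged sketch of how one might inspect the first picture header of an H.263 elementary stream for PLUSPTYPE. The bit positions follow the H.263 picture layer: a 22-bit picture start code, an 8-bit TR, then PTYPE whose bits 6–8 carry the source format; the function name is illustrative.

```python
def h263_uses_plusptype(stream: bytes) -> bool:
    """Return True if the first H.263 picture header signals PLUSPTYPE,
    i.e. PTYPE's 3-bit source format field equals '111'."""
    def bit(n):  # n-th bit of the stream, MSB first
        return (stream[n // 8] >> (7 - n % 8)) & 1
    # PSC: 16 zero bits followed by '100000' (22 bits total)
    psc = sum(bit(i) << (21 - i) for i in range(22))
    if psc != 0b0000000000000000100000:
        raise ValueError("no H.263 picture start code at offset 0")
    # Skip TR (bits 22-29); PTYPE bits 6-8 are stream bits 35-37
    source_format = (bit(35) << 2) | (bit(36) << 1) | bit(37)
    return source_format == 0b111
```

A transcoding pipeline could use such a check to flag content that may not play on decoders limited to baseline H.263.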

Gradual Frame Rate Reduction in Video Playback in Response to Resource Reduction


During video playback, available resources such as CPU and memory might shrink for various reasons. The video player might have to use fewer resources while still keeping audio and video in synchronization.

One way is to reduce the frame rate of video playback. However, not all video content allows gradual frame rate reduction during decoding. For example,

1. If the GOP structure is IPBBPBB…IPBBPBB… with 30 frames per GOP, the original frame rate is 30 fps, and B frames are not used for reference but I and P frames are, then the frame rates available for artifact-free playback are 30 fps, ~20 fps, ~10 fps, and 1 fps. With those intermediate artifact-free frame rates, enough resources can generally be freed.

2. However, if the GOP structure is IPPPP…IPPP… with 30 frames per GOP, the original frame rate is 30 fps, and every frame is used for reference, then the only frame rates available for artifact-free playback are 30 fps and 1 fps. The frame rate reduction is too drastic.

For video content in case 1, the video decoder can drop non-reference frames and reduce the frame rate gradually to free enough resources, for decoding of compressed pictures as well as processing and rendering of uncompressed pictures. However, for video content in case 2, the video decoder itself can’t achieve gradual frame rate reduction. Instead, it might have to reduce the frame rate by dropping some decoded frames, which won’t reduce the resources needed for decoding compressed pictures, only those for processing and rendering uncompressed pictures.

To achieve optimized frame rate reduction, the decoder needs to detect whether the content is in case 1 or case 2. That is, the decoder needs to accumulate statistics about the percentage of non-reference pictures in the video sequence, so the application or system pipeline can make an optimized decision based on the video content properties.
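A sketch of such a statistic (the reference flags would come from the bitstream, e.g. nal_ref_idc in H.264; the function name and thresholds are illustrative):

```python
def non_reference_ratio(is_reference_flags) -> float:
    """Fraction of pictures not used for reference in a sampled window.
    A ratio near 2/3 (e.g. IPBBPBB... GOPs, case 1) means the decoder can
    shed load gradually by dropping B frames; a ratio near 0 (IPPP...
    GOPs, case 2) means only the drastic drop to I-frame-only remains."""
    flags = list(is_reference_flags)
    if not flags:
        return 0.0
    return sum(1 for ref in flags if not ref) / len(flags)
```

The pipeline can sample this ratio over a sliding window and choose between decoder-side frame dropping and post-decode frame dropping accordingly.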

MPEG-4 Part 2 vs H.263

MPEG-4 Part 2 has two modes, short header mode and non short header mode.


Short header mode is (baseline) H.263 compatible, with a minimum set of resolutions defined by the source_format syntax element, not including resolutions such as 320x240 and 720x480.


Non short header mode is not H.263 compatible and can code resolutions beyond the minimum set for baseline H.263.
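The two modes can usually be told apart from the first bytes of an elementary stream. A hedged sketch: non short header streams begin with 32-bit MPEG-4 start codes (0x000001xx), while short header streams begin with the 22-bit short_video_start_marker, which is the H.263 picture start code (the function name and labels are illustrative).

```python
def mpeg4_part2_mode(es: bytes) -> str:
    """Rough classification of an MPEG-4 Part 2 elementary stream by its
    leading bytes."""
    if len(es) < 3:
        return "unknown"
    if es[:3] == b"\x00\x00\x01":
        return "non short header"          # 32-bit MPEG-4 start code
    # 22-bit short_video_start_marker: 16 zero bits, then '100000'
    if es[0] == 0 and es[1] == 0 and (es[2] & 0xFC) == 0x80:
        return "short header"
    return "unknown"
```

Note the check order matters: both prefixes start with two zero bytes, but the third byte disambiguates them.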


Custom resolution support in H.263 is a feature of H.263+ (H.263 v2).
 

The Microsoft MPEG-4 Part 2 Video Decoder supports both short header mode (baseline H.263) and non short header mode, but not H.263+ or H.263++, up to Windows Blue.

Welcome!

We are a group of researchers and software engineers passionate about multimedia computing, with decades of combined experience in this area and Ph.D.s from top US universities.

We'll also post on other technical topics.

Hope you enjoy the content here.