Friday, August 22, 2014

Fast, High Quality Frame-Accurate Video Trimming in MP4


Trimming a video to extract the interesting portion is one of the most common editing operations, and there are many ways to do it.

Looking at the video metadata in an MP4 file, we can see that each GOP (group of pictures) starting from a sync sample is self-contained and can be decoded perfectly on its own. The straightforward way to trim is therefore to seek to the nearest sync sample at or before the start of the interesting portion, decode from there up to the end of the interesting portion, and re-encode. That is, decode the set of GOPs covering the interesting portion and re-encode only the interesting portion.
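The seek step can be sketched as a binary search over sync-sample times. This is a minimal sketch, assuming the presentation times of the sync samples have already been extracted into a sorted list (in MP4 the sync samples are listed in the ‘stss’ box); `sync_times` and `nearest_sync_at_or_before` are hypothetical names used only for illustration.

```python
from bisect import bisect_right


def nearest_sync_at_or_before(sync_times, t):
    """Return the time of the last sync sample at or before time t.

    sync_times: sorted list of sync-sample times in seconds (assumed input).
    """
    i = bisect_right(sync_times, t) - 1
    if i < 0:
        raise ValueError("no sync sample at or before the requested time")
    return sync_times[i]


# Hypothetical stream: one sync sample per second for 30 seconds.
sync_times = [float(s) for s in range(30)]
decode_from = nearest_sync_at_or_before(sync_times, 15.5)
print(decode_from)
```

Decoding starts at `decode_from`, while re-encoding only begins once the decoder reaches the requested start time.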

We can go beyond this straightforward method and optimize trimming further. Since each GOP can be decoded perfectly on its own, the GOPs fully covered by the interesting portion don’t need to be decoded and re-encoded at all. Only the leading GOP and the trailing GOP, which are partially covered by the interesting portion, need to be decoded and re-encoded.

For example, suppose the whole video is 30 seconds long and each GOP lasts 1 second. We want to trim the video from 15.5 s to 25.5 s. The GOPs from 16 s to 25 s need no decoding or re-encoding; they pass straight through into the final trimmed video. Only the GOP from 15 s to 16 s and the GOP from 25 s to 26 s need to be decoded and re-encoded.
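The 30-second example can be worked through in a few lines. This is only a sketch under the assumptions of the example (GOP start times known, every GOP self-contained); the function name `plan_trim` and the action labels are made up for illustration.

```python
def plan_trim(gop_starts, total_duration, start, end):
    """Label each GOP as 'skip', 'passthrough', or 'reencode'.

    gop_starts: sorted GOP start times in seconds (assumed input).
    A GOP runs from its start time to the next GOP's start time.
    """
    plan = []
    for i, g0 in enumerate(gop_starts):
        g1 = gop_starts[i + 1] if i + 1 < len(gop_starts) else total_duration
        if g1 <= start or g0 >= end:
            action = "skip"          # entirely outside the trimmed range
        elif g0 >= start and g1 <= end:
            action = "passthrough"   # fully covered: copy samples untouched
        else:
            action = "reencode"      # partially covered: decode and re-encode
        plan.append((g0, g1, action))
    return plan


# 30 one-second GOPs, trimming from 15.5 s to 25.5 s.
for g0, g1, action in plan_trim(list(range(30)), 30, 15.5, 25.5):
    if action != "skip":
        print(g0, g1, action)
```

Only the two GOPs at the edges of the range come out as `reencode`; the nine GOPs between 16 s and 25 s are copied as they are.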

The number of pictures decoded in the two partial GOPs (15 s to 16 s and 25 s to 26 s) can be minimized as well. If a picture’s time stamp falls inside one of those GOPs but outside the interesting portion of 15.5 s to 25.5 s, and the picture is not used for reference, its decoding can be skipped, because it has no impact on the pictures in the interesting portion.
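The skip rule above amounts to a simple per-picture predicate. A minimal sketch, assuming each picture in a partial GOP carries its time stamp and a flag saying whether it is used for reference (in H.264, a picture whose `nal_ref_idc` is 0 is not used for reference); the `Picture` type and `must_decode` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Picture:
    pts: float          # presentation time stamp in seconds
    is_reference: bool  # True if other pictures may predict from this one


def must_decode(pic, start, end):
    """Decide whether a picture in a partial GOP has to be decoded.

    Pictures inside [start, end) are always decoded because they are
    re-encoded. Outside that range, only reference pictures are decoded.
    """
    in_range = start <= pic.pts < end
    return in_range or pic.is_reference


# Leading GOP of the example: a non-reference picture at 15.2 s is skipped.
print(must_decode(Picture(pts=15.2, is_reference=False), 15.5, 25.5))
```

The rule is conservative: a reference picture outside the range is decoded even if nothing inside the range ends up depending on it.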

As for the sequence parameter sets (SPS) and picture parameter sets (PPS): if the re-encoded portions can use IDs different from those of the pass-through portion, the SPS and PPS for the re-encoded portions can be stored together with those for the pass-through portion in the ‘avc1’ box. If the two sets of SPS and PPS have to share the same IDs, the ‘avc3’ box can be used instead of ‘avc1’, since ‘avc3’ allows SPS and PPS to be stored together with the video sample data. If ‘avc1’ has to be used and the IDs have to be the same, SPS and PPS can still be stored together with the sample data under ‘avc1’, at the cost of a minor non-conformance. In practice, MP4 parsers generally handle this non-conformance.
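The three cases can be summarized as a small decision function. This is just a restatement of the paragraph above in code, not a muxer API; the function name, its two boolean inputs, and the returned labels are all invented for illustration.

```python
def choose_parameter_set_storage(distinct_ids_possible, avc3_allowed):
    """Pick a sample entry type and a place for the SPS/PPS.

    distinct_ids_possible: the re-encoder can use SPS/PPS IDs that differ
        from those of the pass-through portion (assumed input).
    avc3_allowed: the target players or pipeline accept 'avc3' (assumed input).
    Returns (sample_entry, where_parameter_sets_go).
    """
    if distinct_ids_possible:
        # Both sets coexist in the sample entry without ID clashes.
        return ("avc1", "sample entry")
    if avc3_allowed:
        # 'avc3' permits parameter sets in the sample data.
        return ("avc3", "sample data")
    # Fallback: 'avc1' with in-band parameter sets, a minor non-conformance.
    return ("avc1", "sample data (non-conformant)")


print(choose_parameter_set_storage(False, True))
```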

This approach gives the fastest frame-accurate trimming at the highest quality for MP4 videos: only the minimum set of pictures is decoded and re-encoded, and most pictures are left untouched, so they keep their original quality with no re-encoding degradation.
