One popular video editing operation is trimming: cutting a video down to
extract the interesting portion. There are many different ways to accomplish
video trimming.
If we look at the video metadata in MP4, we find that each group of pictures
(GOP) starting from a sync sample is self-contained, so each GOP can be decoded
on its own. One straightforward way to trim is to seek to the nearest sync
sample at or before the starting point of the interesting portion, then decode
and re-encode from there until the ending point of the interesting portion.
That is, decode the set of GOPs covering the interesting portion and re-encode
only the interesting portion.
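The seek step of the straightforward method can be sketched as a binary search over the sync-sample times. This is a minimal sketch, not a definitive implementation: the function name `nearest_sync_before` and the list `sync_times` are hypothetical, and the sketch assumes the sync-sample times have already been extracted from the MP4 sample tables and sorted.

```python
from bisect import bisect_right


def nearest_sync_before(sync_times, start):
    """Return the time of the last sync sample at or before `start`.

    sync_times: sorted sync-sample times in seconds (assumed to be
    already read from the file's metadata).
    """
    i = bisect_right(sync_times, start)
    if i == 0:
        raise ValueError("no sync sample at or before the start point")
    return sync_times[i - 1]


# Hypothetical clip: 30 seconds long, one sync sample per second.
sync_times = [float(t) for t in range(30)]
decode_from = nearest_sync_before(sync_times, 15.5)
print(decode_from)  # 15.0
```

Decoding then starts at `decode_from`, while re-encoding starts only at the requested starting point.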
We can go beyond this straightforward method to further optimize video
trimming. Since each GOP can be decoded on its own, the GOPs fully covered by
the interesting portion don’t need to be decoded and re-encoded. Only the
leading GOP and the trailing GOP, which are partially covered by the
interesting portion, need to be decoded and re-encoded.
For example, suppose the whole video is 30 seconds long and the GOP duration is
1 second, and we want to trim the video from 15.5 seconds to 25.5 seconds. The
GOPs from the 16-second mark to the 25-second mark do not need decoding and
re-encoding; they pass through directly into the final trimmed video. Only the
GOP from 15 to 16 seconds and the GOP from 25 to 26 seconds need to be decoded
and re-encoded.
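The split between pass-through GOPs and re-encoded GOPs in this example can be sketched as follows. The function `plan_trim` and its parameters are hypothetical names introduced for illustration, and the sketch assumes GOP boundaries are known as a sorted list of start times.

```python
def plan_trim(gop_starts, video_end, trim_start, trim_end):
    """Split GOPs into pass-through and re-encode sets.

    gop_starts: sorted start time of each GOP (its sync sample), in seconds.
    Returns (passthrough, reencode), each a list of (gop_start, gop_end).
    """
    passthrough, reencode = [], []
    gop_ends = gop_starts[1:] + [video_end]
    for g_start, g_end in zip(gop_starts, gop_ends):
        if g_end <= trim_start or g_start >= trim_end:
            continue  # GOP lies entirely outside the interesting portion
        if trim_start <= g_start and g_end <= trim_end:
            passthrough.append((g_start, g_end))  # fully covered: copy as-is
        else:
            reencode.append((g_start, g_end))  # partially covered
    return passthrough, reencode


# The example from the text: 30-second video, 1-second GOPs, trim 15.5-25.5.
gop_starts = list(range(30))
passthrough, reencode = plan_trim(gop_starts, 30, 15.5, 25.5)
print(passthrough[0], passthrough[-1])  # (16, 17) (24, 25)
print(reencode)  # [(15, 16), (25, 26)]
```

Nine GOPs are copied untouched and only two are decoded and re-encoded, which matches the example above.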
To minimize the number of pictures decoded in the 15-16 second and 25-26 second
GOPs, the decoder can skip any picture whose time stamp falls inside one of
those GOPs but outside the interesting portion of 15.5-25.5 seconds, provided
the picture is not used for reference. Such a picture has no impact on the
pictures in the interesting portion.
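The skip rule can be sketched as a per-picture check. The `Picture` type and `needs_decoding` function are hypothetical, and the picture list is made-up illustration data. The sketch assumes each picture's time stamp and reference status are known before decoding (in H.264, for instance, a `nal_ref_idc` of 0 marks a picture that is not used for reference).

```python
from dataclasses import dataclass


@dataclass
class Picture:
    pts: float          # presentation time stamp, in seconds
    is_reference: bool  # True if other pictures may predict from this one


def needs_decoding(pic, trim_start, trim_end):
    """Decode a picture if it is shown inside the interesting portion,
    or if it is used for reference by other pictures."""
    inside = trim_start <= pic.pts < trim_end
    return inside or pic.is_reference


# Hypothetical leading GOP covering 15-16 seconds, trimmed from 15.5.
leading_gop = [
    Picture(pts=15.00, is_reference=True),   # sync sample, outside but referenced
    Picture(pts=15.25, is_reference=False),  # outside and unreferenced: skip
    Picture(pts=15.50, is_reference=True),   # inside
    Picture(pts=15.75, is_reference=False),  # inside
]
decisions = [needs_decoding(p, 15.5, 25.5) for p in leading_gop]
print(decisions)  # [True, False, True, True]
```

The check is deliberately conservative: it keeps every reference picture rather than tracing which ones the interesting portion actually depends on.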
As for the sequence parameter set (SPS) and picture parameter set (PPS): if the
re-encoded portions can use different IDs from those of the pass-through
portion, the SPS and PPS for the re-encoded portions can be saved together with
those for the pass-through portion in the ‘avc1’ box. If the IDs have to be the
same between the two sets of SPSs and PPSs, the ‘avc3’ box can be used instead
of ‘avc1’, because ‘avc3’ allows SPS and PPS to be stored together with the
video sample data. If ‘avc1’ has to be used and the IDs have to be the same,
SPS and PPS can still be stored together with the video sample data under
‘avc1’, at the cost of a minor non-conformance that MP4 parsers generally
tolerate.
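The three cases above amount to a small decision procedure, sketched here. The function name, its two boolean parameters, and the returned description strings are all hypothetical; they only restate the choices described in the text and do not write any actual MP4 boxes.

```python
def choose_parameter_set_storage(distinct_ids_possible, avc1_required):
    """Pick a sample entry type and an SPS/PPS location.

    distinct_ids_possible: the re-encoded portions can use SPS/PPS IDs
        different from those of the pass-through portion.
    avc1_required: the output must use the 'avc1' sample entry.
    Returns (sample_entry, where_parameter_sets_go).
    """
    if distinct_ids_possible:
        # Both sets can coexist in the sample entry without ID clashes.
        return ("avc1", "both SPS/PPS sets stored in the sample entry")
    if not avc1_required:
        # 'avc3' permits parameter sets alongside the sample data.
        return ("avc3", "SPS/PPS stored with the video sample data")
    # Last resort: in-band parameter sets under 'avc1'.
    return ("avc1", "SPS/PPS stored with the video sample data "
                    "(minor non-conformance)")


print(choose_parameter_set_storage(True, True)[0])    # avc1
print(choose_parameter_set_storage(False, False)[0])  # avc3
```

Checking the distinct-ID case first keeps the output fully conformant whenever the encoder allows it.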
This is the fastest way to do frame-accurate video trimming at the highest
quality for videos in MP4, since only the minimum set of pictures is decoded
and re-encoded and most pictures are left untouched, which keeps their original
quality without the degradation of re-encoding.