Video compression: H.264 and improved implementations

Source: Submitted by Geutebruck
Date: 2010/12/10

For the security industry, the major attraction of the H.264 standard is the prospect of high compression and low storage costs. However, as a result of its multimedia heritage, the vast majority of H.264 implementations come with annoying drawbacks: inability to crawl backwards frame by frame, jerky images in fast forward and fast rewind, latencies and unnecessary costs. Although it is quite possible to produce an H.264 product without these side effects, very few manufacturers have done so. One reason is that developing a surveillance-friendly implementation involves a basic design rethink and some in-depth consideration of where, when and what kind of data compression is necessary or desirable.

Like MPEG-2 and MPEG-4 before it, H.264 uses differential compression. Whereas the earlier M-JPEG standard compresses each image in a video sequence independently of all the others, differential processes encode only the changes between one image and the previous and/or following images. (See Figures 1 and 2.) This approach drastically reduces the amount of data which has to be stored, but it means that all the frames used for compression are also needed for decompression, i.e. the whole P-chain or group of pictures (GoP) beginning with the independent I-frame. If the GoP is not available in its entirety, decoding errors or artifacts are produced, and if the chains are long, gaps of several seconds can result. (See Figure 3.)
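The dependency described above can be illustrated with a small toy model. It is only a sketch under simplifying assumptions: the GoP consists of one I-frame followed by P-frames, each P-frame references only the frame immediately before it, and frames that reference following images are ignored. The function name and the frame numbering are hypothetical and do not come from any real codec API.

```python
def decodable_in_chain(gop_length, missing):
    """Return the frame indices that can still be decoded in a P-chain GoP.

    Assumed model: frame 0 is the I-frame, and frame k (k >= 1) is a
    P-frame that references frame k - 1 only.
    """
    decodable = []
    for index in range(gop_length):
        if index in missing:
            # Every later frame depends, directly or indirectly, on this one.
            break
        decodable.append(index)
    return decodable


# Losing frame 3 of an 8-frame chain makes frames 3 to 7 undecodable.
survivors = decodable_in_chain(8, {3})
```

In this model a single missing frame costs the rest of the chain, which is why long chains can produce gaps of several seconds.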

The Drawbacks of P-chains
In video surveillance, it is useful to be able to discard individual pictures from a sequence. Time lapse recording, for instance, uses selective discarding to save much more storage space than any compression process can. Yet with most H.264 implementations time lapse recording is simply not possible, because a frame that later frames depend on cannot be dropped. In addition, there are many monitoring situations where smooth live video is required for display, but one frame per second may be adequate for documentation purposes. With M-JPEG you can control live video and recording frame rates separately, but not with most H.264 products. Typical compromises get round this by either recording at a higher picture rate than necessary, or accepting a jerky live display at the same reduced picture rate as the recording. The paradoxical result of the first compromise is that despite using H.264, storage costs can be even greater than with M-JPEG.
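A rough calculation shows how this paradox can arise. The frame sizes below are invented purely for illustration and are not measurements of any product. The sketch also assumes that an M-JPEG frame is about the same size as an H.264 I-frame and that each GoP holds one I-frame followed by P-frames.

```python
def bytes_per_second_mjpeg(fps, frame_bytes):
    """Every M-JPEG frame is independent, so data rate scales with frame rate."""
    return fps * frame_bytes


def bytes_per_second_p_chain(fps, gop_length, i_frame_bytes, p_frame_bytes):
    """Average data rate of a stream with one I-frame and
    (gop_length - 1) P-frames in every GoP."""
    gop_bytes = i_frame_bytes + (gop_length - 1) * p_frame_bytes
    return fps * gop_bytes / gop_length


# Hypothetical sizes, chosen only to make the arithmetic easy to follow.
I_FRAME_BYTES = 40_000
P_FRAME_BYTES = 4_000

# Documentation needs 1 picture per second; M-JPEG can record exactly that.
mjpeg_rate = bytes_per_second_mjpeg(1, I_FRAME_BYTES)

# A P-chain recorder that cannot drop frames records the full 25 per second.
h264_rate = bytes_per_second_p_chain(25, 25, I_FRAME_BYTES, P_FRAME_BYTES)

ratio = h264_rate / mjpeg_rate
```

With these assumed figures the chained stream stores 136,000 bytes per second against 40,000 for M-JPEG at one picture per second, even though each P-frame is a tenth the size of an independent frame. Different frame sizes would shift the ratio, but the mechanism is the same.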

Without the ability to discard frames, video analysis may also cost more. System load is normally minimized by matching the number of frames analyzed per second to the speed of the observed event. Hence for a wide angle camera where only relatively slow movement occurs in the scene, a handful of pictures per second, say five, may suffice to capture all relevant information. But with P-chain restrictions in force, the analysis channel still has to process 25 pictures per second, and processing five times the data inevitably means higher costs.

Ease of Use
In surveillance applications, ease of use is a major issue because it influences the effectiveness of the whole operation. Operators want to crawl forwards and backwards frame by frame, to run video fast forwards and fast backwards without losing track of the action, and to view synchronized recordings from several channels at once in order to analyse an event from different angles. Here P-chains are an irritant at best and a security risk at worst, causing jumps during picture navigation and making video replay uncomfortable for users.

Ironically, the overall effect of P-chains and the limitations they impose is that systems are made bigger than necessary in order to ensure that appropriately high picture rates, qualities and resolutions are available if there is an alarm. This is surely a wasteful approach.

Other H.264 Structures
Yet within the H.264 framework there are other ways of structuring the compression process which do not involve chains. For example, each P-frame may be generated by referring only to the I-frame. (See Figure 4.) This structure allows individual P-frames to be discarded without affecting the decompression of other images in the GoP, but it is seldom used because it reduces compression efficiency. Closer examination shows, though, that any disadvantage is more than offset by gains in flexibility and by the ability to employ other surveillance cost-reduction processes such as time lapse recording and “fading long term memory”, as well as independent control of display and recording rates. And, free from the constriction of P-chains, this kind of encoder can generate new I-frames at will, thus enabling video characteristics to be changed instantly and surveillance process latencies to be eliminated. Although still a tiny minority, products using this kind of structure do now exist. They include Geutebruck's MPEG4CCTV and its new H264CCTV, an H.264 implementation that Basler (www.basler-ip.com) is offering in its latest generation of IP cameras, and one from Stretch Inc which fully supports such structures.
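The effect of referencing only the I-frame can be sketched with a toy model. This is an illustration under stated assumptions, not any vendor's implementation: frame 0 of each GoP is the I-frame, every other frame is a P-frame that depends on frame 0 alone, and both function names are hypothetical.

```python
def decodable_i_referenced(gop_length, missing):
    """Return decodable frame indices when every P-frame references
    only the I-frame (frame 0) of its GoP."""
    if 0 in missing:
        return []  # without the I-frame nothing in the GoP can be decoded
    return [i for i in range(gop_length) if i not in missing]


def thin_for_recording(gop_length, keep_every):
    """Return the frame indices kept when only every keep_every-th
    frame is recorded, starting with the I-frame."""
    return [i for i in range(gop_length) if i % keep_every == 0]


# Display all 25 frames live, but record only every fifth one.
kept = thin_for_recording(25, 5)
discarded = set(range(25)) - set(kept)

# Every frame that was kept is still decodable from the recording.
playable = decodable_i_referenced(25, discarded)
```

Because no P-frame depends on another P-frame in this model, any subset that includes the I-frame remains fully decodable, which is what makes time lapse recording and separate display and recording rates possible.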
