Video systems generate a vast volume of picture data, and it is not economically viable for humans to review and evaluate all of it. On most sites where large numbers of cameras are installed, they have to be monitored by just a few operators. Watching all cameras continuously is not an effective use of manpower: no human being can give full attention to many images at once, and operators inevitably tire quickly. Soon, lapses in concentration mean important details are overlooked and the whole security concept is significantly undermined. There may be a semblance of security, but the reality is different.
To rectify this, relevant information needs to be filtered out of the mass. And thankfully, analysis algorithms are perfectly able to continuously examine as much picture content as you like and to automatically draw the operator's attention to critical situations. Algorithms and processors don't get tired and they work automatically in the background 24 hours a day, 7 days a week – just like a perfect assistant.
Clearly, any system must not generate frequent alarms without good reason, or operators will cease to treat them as critical situations and will simply cancel them without checking properly. The result is then the same as when operators lose concentration without video analysis: real alarm situations are easily overlooked, and the system provides only a superficial impression of security.
Defining the Task
So if video analysis is to be used efficiently and reliably, it is vital to have a clear definition of a so-called 'critical situation' and a perfectly matched algorithm. As with managing a human assistant in any industry, clear instructions and the right capabilities are needed to ensure that a good job is done and the boss's expectations are fulfilled. If tasks are not defined precisely enough, or require skills which the assistant does not possess, then that's where problems start.
In the video system market in recent years, many marketing campaigns have featured very sweeping statements about the capabilities of so-called 'intelligent' video analysis. As a result, many users expect analysis algorithms to have human intelligence. They reason that if they themselves can see that it's a person crawling there and not a dog on all fours, then surely so can an algorithm which is supposed to be able to differentiate between humans and animals? If only it were that easy!
In fact, most algorithms use relatively basic criteria to differentiate, for example, vehicles and people from other moving objects and from each other. They first look for movement in the picture; if there is a group of pixels which are close to each other and consistently (e.g. over several consecutive frames) moving from one area of the picture to another, the algorithm assumes that these pixels belong to a coherent object. It then checks criteria such as the relative dimensions of the object. If the object is taller than it is wide and approximately the size of a person, it decides it must be a person and responds accordingly: if it is supposed to generate an alarm, then that is what it does.
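The bounding-box check described above can be sketched as follows. This is a minimal illustration of the principle, not any vendor's actual algorithm; the function name, thresholds, and tolerance are assumptions.

```python
def classify_object(width: int, height: int,
                    expected_person_height: float,
                    tolerance: float = 0.3) -> str:
    """Classify a coherent moving pixel group by its bounding-box proportions.

    expected_person_height is the on-screen size a person would have at the
    object's position in the scene (see the perspective setup described in
    the article). tolerance allows +/-30% deviation from that size.
    """
    # A person is assumed to be taller than wide and roughly person-sized.
    if (height > width and
            abs(height - expected_person_height) <= tolerance * expected_person_height):
        return "person"
    # Vehicles are typically wider than they are tall.
    if width > height:
        return "vehicle"
    return "unknown"
```

A real implementation would apply this only after the object has been tracked consistently over several consecutive frames, as the article notes.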
But in order to make this logical deduction the algorithm also has to have been told how big a person would appear in the image in the foreground and how big in the background. This is done during setup, when a service engineer 'measures' the scene to determine how wide the foreground and background are. A human operator observing the scene does not need details like this to be defined explicitly; he knows simply from the context.
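The 'measuring' step can be thought of as recording how tall a person appears, in pixels, near the bottom (foreground) and near the top (background) of the image, and interpolating in between. The linear interpolation below is a simplifying assumption for illustration; real calibration models are usually more elaborate.

```python
def expected_person_height(y: float, image_height: float,
                           fg_height_px: float, bg_height_px: float) -> float:
    """Interpolate the expected on-screen height of a person at image row y.

    y = 0 is the top of the image (background), y = image_height the bottom
    (foreground). fg_height_px / bg_height_px are the person heights the
    engineer measured at those two positions during setup.
    """
    t = y / image_height  # 0.0 at the background row, 1.0 at the foreground row
    return bg_height_px + t * (fg_height_px - bg_height_px)
```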
Very simple classification algorithms like this are perfectly adequate for some applications. However, in situations where people are likely to approach the site in anything other than an upright position (precisely because they don't want to be discovered), this type of algorithm is completely unsuitable and a security risk.
Nowadays there is a large choice of algorithms which can undertake a wide variety of tasks. To provide really reliable assistance, the chosen algorithm needs to be suited to its particular task and must be set up correctly. The more complex its computing process and the faster its required response time, the greater the computing capacity it will need.
Centralized or Decentralized
Many camera manufacturers already offer algorithms built into their cameras. These support decentralized video analysis, conducted out at the edge of the video system rather than in the main server. This arrangement has the great advantage that it is uncompressed image data which is analyzed – the perfect raw material for reliable analysis. This so-called 'analysis at the edge' also saves bandwidth and reduces demands on computing capacity in the central computer.
When network cameras are used and their images analyzed centrally, all the data is compressed before being fed into the network and sent to the server, which decompresses and analyzes it. Unfortunately, analysis performance can be impaired by compression artifacts. It also inevitably means that the network is loaded with picture data which doesn't contain any important information, and whose decompression makes additional demands on central server computing power.
But centralized analysis also has its advantages. It offers more flexibility in relation to the algorithms employed, often essential for specialist tasks; and it demands less computing power in the cameras, which may mean cheaper cameras can be used.
In designing a centralized system the scalability of the computing power is important. Ideally, you should be able to run the analysis algorithm on the central video management server as well as on the dedicated analysis computers which communicate with the central management server.
The best solution is often a combination of different architecture types. Users might want to use pre-analysis in the camera first to look for movement (nowadays this function is offered in almost every network camera in almost every price category), then send selected footage to the central evaluation computer which then examines these pictures using a specialized or more complex algorithm. This second process filters out irrelevant motion and reports relevant situations reliably. Assuming that there isn't constant movement in all of the camera scenes, then several cameras can share the network bandwidth and the computing capacity of the central server.
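The two-stage approach above can be sketched as a simple pipeline: a cheap in-camera motion gate decides which frames are worth sending over the network, and only those frames reach the more expensive central algorithm. Frames are modelled here as flat lists of grayscale values, and all function names and thresholds are illustrative assumptions.

```python
def camera_motion_gate(prev_frame, frame, threshold=25, min_changed=500):
    """Edge pre-analysis: count pixels that changed noticeably since the last frame."""
    changed = sum(1 for a, b in zip(prev_frame, frame) if abs(a - b) > threshold)
    return changed >= min_changed

def central_analysis(frame):
    """Placeholder for the specialized or more complex central algorithm."""
    ...

def process(stream):
    """Only frames that pass the in-camera gate consume bandwidth and server CPU."""
    prev = next(stream)
    for frame in stream:
        if camera_motion_gate(prev, frame):
            central_analysis(frame)
        prev = frame
```

Note how the gate also embodies the article's bandwidth argument: as long as there isn't constant movement in every scene, most frames never leave the camera.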
For many applications it may be beneficial or even necessary to combine different algorithms on the same camera channel. This is the approach used by so-called 'dual sensors', which run two different detection processes in parallel and only generate an alarm if both detect a critical situation at the same time. Here the detection processes are based on different analysis principles and react to different disturbance factors. By combining two algorithms into a single alarm system, more disturbance factors are excluded than when a single algorithm is used on its own.
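The dual-sensor principle reduces to a logical AND over the two detectors' per-frame verdicts: a disturbance which fools only one detector never raises an alarm. The sketch below assumes the two detectors are represented simply as sequences of booleans, one per frame.

```python
def dual_sensor(events_a, events_b):
    """Combine two detectors' per-frame verdicts with a logical AND.

    events_a / events_b: boolean sequences, one entry per frame, from two
    detectors based on different analysis principles (e.g. a visual motion
    channel and a thermal channel - both hypothetical here). An alarm is
    raised only for frames where both fire simultaneously.
    """
    return [a and b for a, b in zip(events_a, events_b)]
```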
On the other hand it may be necessary to use different specialist algorithms in the same scene, for example to reliably detect critical patterns of movement, and to accurately recognize abandoned objects. Since the two applications make different demands of the algorithm, it makes sense to use two different specialist algorithms to get the best results.
Video analysis is a sensible addition to modern video security systems particularly where it relieves the operator by providing efficient assistance. However, as so often in the security market, questions like "Which algorithm is the best?", or "In the camera or in the central server?" have to be answered with: "It just depends!"
So make sure you get plenty of expert advice. Insist on tests using video footage from your own real application, and select a system that is flexible in supporting different structures and algorithms, both centrally and in the camera, as well as different algorithms in parallel on the same channel. Your best choice is a system which is scalable and able to accommodate changing requirements and future developments. After all, you never know: one day video analysis algorithms might really be 'intelligent'!