The desire for increased security has catapulted the popularity of video surveillance systems. The many problems associated with traditional, analog systems are influencing the push for digital systems. Furthermore, with the growing use of computer networks, semiconductors and video compression technologies, next-generation video surveillance systems will undoubtedly be digital and based on standard technologies and IP networking.
Video surveillance systems are fundamental to ensuring security in today's security-conscious environment. As security risks increase, the need to visually monitor and record events has brought about diverse use models. Today, the video surveillance industry predominantly uses analog CCTV cameras and interfaces as the basis of surveillance systems. These system components are not easily expandable and have low video resolution with little or no signal processing.
New architectures must provide scalability for cost-effective solutions across an increasingly diverse set of video surveillance system requirements. "They will no longer be simply surveillance camera systems," said Paul Evans, Marketing Manager of Xilinx, "but also video communication systems." The next generation of video surveillance systems will replace analog CCTVs with newer digital IP/LAN cameras, complex image processing and video-over-IP routing. "Time-to-market pressures, new codec standards and a widening range of requirements (including motion detection, advanced object detection and object tracking) are just a few of the challenges for new video surveillance architectures," Evans pointed out. With these challenges comes a need for implementations that can be scaled to a varying range of performance.
The IP-based structure of new surveillance systems allows for scalability, flexibility and cyber security. "Various encoding and decoding standards transport video streams, and MPEG-4 is the most widely adopted standard today, with a market share of over 40 percent," said Evans. Aside from the codec function, image pre- and post-processing enhances picture quality in real time with low latency. Programmable logic with embedded DSP blocks, memories, interfaces and off-the-shelf IP solutions allows a designer to meet new system requirements.
In a video surveillance system over IP (VSIP), hardware handling network traffic is an integral part of the camera. This is because video signals are digitized by the camera and compressed before being transmitted to the video server to overcome the bandwidth limitation of the network. "A heterogeneous processor architecture, such as DSP/GPP, is desirable to achieve maximum system performance," said Zhengting He, Software Application Engineer of Texas Instruments. Interrupt-intensive tasks such as video capturing, storing and streaming can be partitioned to the GPP while the MIPS-intensive video compression is implemented on the DSP. After the data is transferred to the video server, the server stores the compressed video streams as files on a hard disk, overcoming the quality degradation traditionally associated with analog storage devices. According to He, various standards have been developed for compression of digital video signals, which can be classified into two categories:
Motion estimation (ME)-based approaches: Every N frames are defined as a group of pictures (GOP) in which the first frame is encoded independently. For the other (N-1) frames, only the difference between the previously encoded frame(s) (reference frame(s)) and itself is encoded. Typical standards are MPEG-2, MPEG-4, H.263 and H.264.
Still image compression: Each video frame is encoded independently as a still image. The best-known standard is JPEG. The MJPEG standard encodes each frame using the JPEG algorithm.
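The difference between the two categories can be sketched in a few lines of Python. This is a minimal illustration, not any standard's actual syntax: the function names are hypothetical, "I" marks a frame encoded independently, and "P" marks a frame encoded as a difference from earlier frames. Real ME-based standards define further frame types that this sketch leaves out.

```python
def gop_frame_types(num_frames, gop_size):
    """Label frames for an ME-based encoder.

    The first frame of every group of gop_size frames is encoded
    independently ("I"); the other gop_size - 1 frames are encoded as
    differences from previously encoded frames ("P").
    """
    if gop_size < 1:
        raise ValueError("gop_size must be at least 1")
    return ["I" if index % gop_size == 0 else "P" for index in range(num_frames)]


def mjpeg_frame_types(num_frames):
    """Label frames for still image compression: every frame stands alone."""
    return ["I"] * num_frames


if __name__ == "__main__":
    print("ME-based:", "".join(gop_frame_types(12, 4)))
    print("MJPEG:   ", "".join(mjpeg_frame_types(12)))
```

With a GOP of four frames, only every fourth frame can be decoded without reference to another frame, whereas every MJPEG frame can.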
ME versus Still Image Compression
Figure 1 shows the block diagram of an H.264 encoder. Similar to other ME-based video coding standards, it processes each frame macroblock (MB) by macroblock, which is 16x16 pixels. It has a forward path and reconstruction path. "The forward path encodes a frame into bits; the reconstruction path generates a reference frame from the encoded bits," explained He. Here (I)DCT stands for (inverse) discrete cosine transform and (I)Q stands for (inverse) quantization. ME and MC stand for motion estimation and motion compensation, respectively.
In the forward path (DCT to Q), each MB can either be encoded in intra mode or inter mode, said He. In inter mode, the reference MB is found in previously encoded frame(s) by the ME module. In intra mode, the reference MB is formed from samples in the current frame. The purpose of the reconstruction path (IQ to IDCT), added He, is to ensure that the encoder and decoder will use the identical reference frame to create the image. Otherwise, the error between the encoder and decoder will accumulate.
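Why the error accumulates without a reconstruction path can be seen in a toy one-dimensional model. The sketch below is an assumption-laden simplification: it replaces frames with single numbers, uses a plain rounding quantizer, and has no DCT or motion search. All names are hypothetical. In the closed-loop case the encoder predicts from its own reconstructed output, as the decoder does; in the open-loop case it predicts from the original input, which the decoder never sees.

```python
def quantize(value, step):
    """Round value to the nearest multiple of step (the lossy stage)."""
    return step * round(value / step)


def encode_decode(samples, step, closed_loop):
    """Return what the decoder reconstructs for each input sample.

    closed_loop=True: the encoder's reference is the reconstructed value,
    so encoder and decoder always share the same reference.
    closed_loop=False: the encoder's reference is the original sample,
    so the two references drift apart.
    """
    encoder_ref = 0.0
    decoder_ref = 0.0
    decoded = []
    for sample in samples:
        residual = quantize(sample - encoder_ref, step)  # forward path
        decoder_ref = decoder_ref + residual             # decoder output
        decoded.append(decoder_ref)
        encoder_ref = decoder_ref if closed_loop else sample
    return decoded


def max_error(samples, decoded):
    return max(abs(s - d) for s, d in zip(samples, decoded))


if __name__ == "__main__":
    ramp = [0.4 * k for k in range(1, 21)]  # slowly rising signal
    print("closed loop:", max_error(ramp, encode_decode(ramp, 1.0, True)))
    print("open loop:  ", max_error(ramp, encode_decode(ramp, 1.0, False)))
```

On the slowly rising test signal, the closed-loop error never exceeds half a quantization step, while the open-loop error grows with every sample because each small residual is rounded away and never corrected.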
Figure 2 is a JPEG encoder block diagram. It divides the input image into multiple 8x8 pixel blocks and processes them one by one. "Each block passes through the DCT module first; then the quantizer rounds off the DCT coefficients according to the quantization matrix," He explained. The encoding quality and compression ratio are adjustable depending on the quantization step. The output from the quantizer is encoded by the entropy encoder to generate a JPEG image.
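The DCT and quantizer stages can be sketched for a single 8x8 block as follows. This is an unoptimized illustration in plain Python, not a JPEG implementation: the uniform quantization matrix is a made-up placeholder rather than a table from the standard, the function names are hypothetical, and the entropy encoder is omitted. The level shift by 128 follows common JPEG practice for 8-bit samples.

```python
import math

N = 8  # JPEG processes 8x8 pixel blocks


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (list of N rows)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def quantize(coeffs, qmatrix):
    """Divide each coefficient by its quantization step and round."""
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, qmatrix)]


def encode_block(pixels, qmatrix):
    """Level-shift 8-bit pixels, transform, then quantize."""
    shifted = [[p - 128 for p in row] for row in pixels]
    return quantize(dct_2d(shifted), qmatrix)


if __name__ == "__main__":
    flat_block = [[200] * N for _ in range(N)]       # uniform gray patch
    uniform_q = [[16] * N for _ in range(N)]         # placeholder matrix
    for row in encode_block(flat_block, uniform_q):
        print(row)
```

A flat block leaves only the top-left (DC) coefficient nonzero, which is what makes the entropy encoder's job easy. Larger quantization steps zero out more coefficients, raising the compression ratio at the expense of quality.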
Since sequential video frames often contain a lot of correlated information, ME-based approaches can achieve a higher compression ratio. For example, for NTSC standard resolution at 30 frames per second, the H.264 encoder can encode video at 2 megabits per second to achieve average image quality with a compression ratio of 60:1. To achieve similar quality, MJPEG's compression ratio is about 10:1 to 15:1. MJPEG still has some advantages, according to He, over the ME-based approach, and that is why this compression technique is being used by about 30 percent of the market. Foremost, JPEG requires significantly less computation and power consumption. Also, most PCs have the software to decode and display JPEG images. MJPEG is also more effective when a single image or a few images record a specific event, such as a person walking across a door entrance.
If the network bandwidth cannot be guaranteed, MJPEG is preferred since the loss or delay of one frame will not affect other frames. "With the ME-based method," compared He, "the delay/loss of one frame will cause the delay/loss of the entire GOP since the next frame will not be decoded until the previous reference frame is available."
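Both points above, the compression ratio and the effect of a lost frame, lend themselves to a quick back-of-the-envelope check. The sketch below rests on assumptions not stated in the article: NTSC standard resolution is taken as 720x480 pixels with 12 bits per pixel (4:2:0 sampling), and each non-initial frame in a GOP is assumed to depend only on the frame before it. Function names are hypothetical.

```python
def raw_bitrate(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


def compression_ratio(raw_bps, encoded_bps):
    return raw_bps / encoded_bps


def frames_affected_me(lost_index, gop_size):
    """Frames that cannot be decoded when one frame of an ME-based
    stream is lost: the lost frame and every later frame in its GOP."""
    gop_end = (lost_index // gop_size + 1) * gop_size
    return list(range(lost_index, gop_end))


def frames_affected_mjpeg(lost_index):
    """With MJPEG, only the lost frame itself is affected."""
    return [lost_index]


if __name__ == "__main__":
    raw = raw_bitrate(720, 480, 12, 30)           # assumed NTSC parameters
    print("raw bit rate (bps):", raw)
    print("H.264 ratio at 2 Mbps:", compression_ratio(raw, 2_000_000))
    print("MJPEG bit rate at 10:1 (bps):", raw / 10)
    print("MJPEG bit rate at 15:1 (bps):", raw / 15)
    print("ME-based, frame 5 lost, GOP of 4:", frames_affected_me(5, 4))
    print("MJPEG, frame 5 lost:", frames_affected_mjpeg(5))
```

Under these assumptions the raw stream is roughly 124 megabits per second, so a 2 megabit-per-second H.264 stream works out to about 62:1, in line with the 60:1 figure quoted. Losing the first frame of a GOP is the worst case, since every frame in that GOP then becomes undecodable.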
Since many VSIP cameras have multiple video encoders, users can select to run the most appropriate one based on the specific application requirement. "Some cameras even have the ability to execute multiple codecs simultaneously with various combinations," said He. "MJPEG is typically considered to be the minimal requirement, and almost all VSIP cameras have a JPEG encoder installed."
In a typical digital surveillance system, video is captured from the sensor, compressed and then streamed to the back-end recording device or video server. "It is undesirable," cautioned He, "to interrupt the video encoder task implemented on modern DSP architecture since each context switch may involve large numbers of register saving and cache throwing." Thus, the heterogeneous architecture is ideal where video capturing and streaming tasks can be offloaded from the DSP. The block diagram below illustrates an example of DSP/GPP processor architecture used in video surveillance applications.
"When implementing MJPEG on a DSP/GPP system-on-a-chip (SoC)-based system," said He, "developers should first partition the function modules appropriately to achieve better system performance."
The EMAC driver, TCP/IP network stack and HTTP server (which work together to stream compressed images to the outside), along with the video capture driver and ATA driver, should all be implemented on the ARM to help offload DSP processing. "The JPEG encoder should be implemented on the DSP core," explained He, "since its VLIW architecture is particularly good at processing this type of computation-intensive task." Once the video frames are captured from the camera via the video input port on the processor, the raw image is compressed by exercising the JPEG encoder, and then the compressed JPEG image files are saved to the hard disk on the board.
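The partitioning described above amounts to a three-stage pipeline in which the encoder is never interrupted by capture or streaming work. The sketch below is only an analogy on a general-purpose machine: Python threads stand in for the ARM and DSP cores, bounded queues stand in for shared buffers, and `zlib.compress` is a placeholder for the JPEG encoder. None of the names correspond to a real TI driver or API.

```python
import queue
import threading
import zlib

STOP = object()  # sentinel marking the end of the frame sequence


def capture(frames, raw_queue):
    """Stands in for the video capture driver (GPP side)."""
    for frame in frames:
        raw_queue.put(frame)
    raw_queue.put(STOP)


def encode(raw_queue, encoded_queue):
    """Stands in for the encoder (DSP side); zlib is a placeholder codec."""
    while True:
        frame = raw_queue.get()
        if frame is STOP:
            encoded_queue.put(STOP)
            return
        encoded_queue.put(zlib.compress(frame))


def stream(encoded_queue, sink):
    """Stands in for storage and network streaming (GPP side)."""
    while True:
        item = encoded_queue.get()
        if item is STOP:
            return
        sink.append(item)


def run_pipeline(frames):
    raw_queue = queue.Queue(maxsize=4)
    encoded_queue = queue.Queue(maxsize=4)
    sink = []
    workers = [
        threading.Thread(target=capture, args=(frames, raw_queue)),
        threading.Thread(target=encode, args=(raw_queue, encoded_queue)),
        threading.Thread(target=stream, args=(encoded_queue, sink)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return sink


if __name__ == "__main__":
    fake_frames = [bytes([i]) * 1000 for i in range(5)]
    compressed = run_pipeline(fake_frames)
    print([len(item) for item in compressed])
```

The encode stage only ever blocks on its queues, so it runs without being preempted by capture or network activity. That is the property the DSP/GPP split is meant to provide.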
Typically, PCs are used to monitor a video scene in real time by retrieving the streams in the recording device or video server and decoding and displaying them on the monitor. Encoded JPEG image files can be retrieved on the board via the Internet. Multiple streams can be monitored on a single PC; the streams can be also watched simultaneously from multiple points in the network. "As a huge benefit over traditional analog systems," said He, "the VSIP central office can contact the video server through the TCP/IP network and can be physically located anywhere in the network." The single point of failure becomes the digital camera, not the central office. "The quality of the JPEG images can also be dynamically configured to meet varying video quality specifications," He clarified.
"DSPs are programmable and offer maximum flexibility support, product differentiation, standard and custom product requirements, future upgradeability, multi-codec support, multi-stream support, and single- and multi-channel support," said Yvonne Lee, Video Surveillance Marketing Manager of Texas Instruments. "Our DSPs offer software-compatible portfolio from less than US$10 to more than 1-GHz performance." DSPs also enable important, non-standard algorithms such as video analytics and other new, innovative software functions. "TI has a vast partner network of more than 600 companies, supporting our DSP software, hardware and system solutions," said Lee, who also mentioned that there are complete solutions supporting video security applications such as intelligent IP cameras, DVRs, digital video servers, video routers, etc.
Lee went on to say that application-specific integrated circuits (ASICs) and many application-specific standard products (ASSPs) are, by definition, fixed-function devices and therefore not programmable, which severely limits their flexibility. "They typically are not future-proof or cannot be updated by software as upgrade typically requires new ASIC/ASSP development," Lee explained. "They do not typically have a family of current devices in various price/performance points that are compatible; they have, for example, fixed channel density."
Some think, however, that flexibility is not the No. 1 concern in video surveillance, and that ASICs offer the more stable operation and better thermal performance required by this industry, where smooth 24x7 operation is the norm.
Advances in media encoding schemes are enabling a broad array of applications, including DVRs, network surveillance cameras, medical imaging, digital broadcasting and streaming set-top boxes. The promise of streaming media presents a series of implementation challenges, especially when processing complex compression algorithms such as MPEG-4 and MPEG-compressed video transcoding.
In a DVR system, described Evans of Xilinx, multiple analog CCTV cameras route to a central video switching hub for storage, scaling, image processing and display. Video resolution and quality are typically low to reduce complex compression and cost. Special processing, such as motion detection, reduces the amount of storage space in the central hub. "This architecture is therefore not flexible or readily expandable," Evans reasoned, "and video monitoring is limited in terms of quality and quantity." Advanced digital video compression is being rapidly adopted in the DVRs found in video surveillance systems. There are many different standards for video data compression, with the most popular including JPEG, MJPEG, MPEG, Wavelet, H.263 and H.264.
A typical DVR system combines either an internal or external video matrix switcher to route the video from cameras to monitors. "This type of system requires multiple inputs and output multiplexing, making it very suitable for using programmable logic for system flexibility and expandability," said Evans. With most DVR manufacturers moving from MPEG-4 to H.264 high-definition (HD) codec, the need for better resolutions and compression rates increases.
The type of compression used has an impact on hardware system requirements, including memory, data rate and storage space. The H.264 standard offers substantial advantages in compression efficiency and is a key factor in the transmission of high-quality video over a bandwidth-limited network. For example, a color transmission at 30 frames per second at 640x480 pixels requires a data rate of 26 megabytes per second. This data rate must be reduced or compressed to a more manageable data rate that can be routed over a twisted pair of copper wires.
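The 26-megabyte figure can be reproduced with simple arithmetic. The sketch below assumes 24-bit color (3 bytes per pixel) and reads "megabyte" as 2^20 bytes; neither assumption is stated in the article, and the function name is hypothetical.

```python
def raw_bytes_per_second(width, height, bytes_per_pixel, frames_per_second):
    """Uncompressed video data rate in bytes per second."""
    return width * height * bytes_per_pixel * frames_per_second


if __name__ == "__main__":
    rate = raw_bytes_per_second(640, 480, 3, 30)   # assumed 24-bit color
    print("bytes per second:", rate)
    print("binary megabytes per second:", rate / 2**20)
    print("megabits per second:", rate * 8 / 1_000_000)
```

Under these assumptions the uncompressed stream is 27,648,000 bytes per second, or about 26.4 binary megabytes per second, which is more than 220 megabits per second before compression.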
Current Solutions Offer Limited Scalability
With expanding resolutions and evolving compression standards, there is a need for high performance while keeping architectures flexible to allow for quick upgradeability. As technology matures and volumes increase, the focus is moving to cost reduction. System architecture choices include standard-cell ASICs and ASSPs, and programmable solutions such as DSPs or media processors and field-programmable gate arrays (FPGAs).
As feature sizes have shrunk and design tools improved over the years, the maximum complexity (and hence functionality) possible in an ASIC has grown from 5,000 gates to over 100 million, said Evans. Modern ASICs often include entire 32-bit processors, memory blocks including ROM, RAM, EEPROM, flash and other large building blocks. Such an ASIC is often termed an SoC. "Designers of digital ASICs use a hardware description language (HDL), such as Verilog or VHDL, to describe the functionality of ASICs," explained Evans.
FPGAs are the modern-day equivalent of 7400 series logic and a breadboard, according to Evans, containing programmable logic blocks and programmable interconnects that allow the same FPGA to be used in many different applications. For smaller designs and/or lower production volumes, FPGAs may be more cost-effective than an ASIC design. The non-recurring engineering cost (the cost to set up a factory to produce a particular ASIC) can run into hundreds of thousands of dollars. The general term ASIC includes FPGA, but most designers use ASIC only for non-field-programmable devices like standard cell or sea of gates.
However, as standards stabilize and volumes increase, it is important to have a solution with a low-cost migration path. "Often, this means either market-focused ASSPs or standard-cell custom ASICs," said Evans. However, the rising cost of custom silicon makes those solutions economically feasible in only the highest volume consumer applications. When adding up costs for masks and wafer, software, design verification and layout, the development of a typical 90-nanometer standard-cell ASIC can cost in excess of $30 million. Current vendors of DVR-bound ASICs include PentaMicro (MPEG-4), Vineyard, Techwell (MJPEG), A-Logics, Nextchip, Analog Devices, etc.
Therefore, when designing a lower-volume type of application, it is best to consider an FPGA, as it is unlikely an ASSP with the exact feature set required exists and even the best off-the-shelf solution is a high-risk choice due to the potential for obsolescence. The ideal surveillance architecture would need high performance, flexibility, easy upgradability, low development cost and a migration path to lower cost as the application matures and volume ramps. When technology rapidly evolves, architectures must be flexible and easy to upgrade. "This rules out standard-cell ASICs and ASSPs for those applications," said Evans.
Typically designed for very high-volume consumer markets, ASSPs are often quickly obsolete, making them an extremely risky choice for most applications. ASSPs are useful for high-volume applications, but lack flexibility, are costly to develop and have a longer development time. In addition, most advanced digital media ASSPs or processors can barely perform H.264 HD decoding (and H.264 HD encoding is much more complex than decoding).
Performance not only applies to compression, but also to pre- and post-processing functions. In fact, in many cases, these functions consume more performance than the compression algorithm itself. Examples of these functions include scaling, de-interlacing, filtering, and color-space conversion, said Evans. For video surveillance, the need for high performance rules out processor-only architectures. They simply cannot meet the performance requirements with a single device.
A state-of-the-art DSP running at 1 GHz cannot perform H.264 HD decoding or H.264 HD encoding, which is about ten times more complex than decoding. FPGAs are the only programmable solutions able to tackle this problem. In some cases, the best solution is a combination of an FPGA plus an external DSP processor. This FPGA co-processing approach can deliver significantly higher performance, since the designer can partition the system to take advantage of the benefits of each device.
As an example, a DVR design using the Texas Instruments DaVinci processor has only a single ITU-R BT656 video input port. "In typical video surveillance systems however," explained Evans, "multiple cameras generate video signals and a more efficient implementation is to time-multiplex two or more ITU-R BT656 data streams into a single VLYNQ data stream before sending it to the DaVinci processor."
This implementation allows you to use far fewer I/O pins for transporting the video stream and lowers system costs, as you can use a smaller package device. The FPGA receives the digital video in ITU-R BT656 format from the video decoders and outputs the processed video to a monitor for display, as well as to a digital media processor or DSP for compression and storing to the hard disk.
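The idea of time-multiplexing several camera streams onto one link can be sketched abstractly. The example below is not a model of VLYNQ or of ITU-R BT656 framing: it simply interleaves chunks from each camera in round-robin order and tags each chunk with a channel index so the receiver can separate them again. The chunk format and function names are hypothetical.

```python
from itertools import zip_longest


def multiplex(streams):
    """Interleave chunks from several streams in round-robin order.

    Each output item is (channel_index, chunk) so the receiving side
    can tell the streams apart on the shared link.
    """
    muxed = []
    for group in zip_longest(*streams):
        for channel, chunk in enumerate(group):
            if chunk is not None:  # this stream has already ended
                muxed.append((channel, chunk))
    return muxed


def demultiplex(muxed, num_channels):
    """Rebuild the per-camera streams from the tagged, interleaved data."""
    streams = [[] for _ in range(num_channels)]
    for channel, chunk in muxed:
        streams[channel].append(chunk)
    return streams


if __name__ == "__main__":
    camera_a = ["A-line0", "A-line1", "A-line2"]
    camera_b = ["B-line0", "B-line1"]
    link = multiplex([camera_a, camera_b])
    print(link)
    print(demultiplex(link, 2))
```

Because the shared link carries both streams in sequence, it must run at least as fast as the sum of the individual stream rates. In return, the processor needs only one input port.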
FPGAs are also finding their way into high-volume products such as DVRs and network surveillance cameras because of their flexibility in handling a broad range of media formats such as MPEG-2, MPEG-4, H.264 and Windows Media.
FPGAs' extremely high-performance DSP horsepower makes them suitable for other challenging video and audio tasks. By using FPGAs, you can differentiate your standard-compliant systems from other products and achieve the optimal balance for your application. For example, with the MPEG-4 compression scheme, it is possible to offload the IDCT (inverse discrete cosine transform) portion of the algorithm from an MPEG processor to an FPGA to increase the processing bandwidth. "IDCT (and DCT at the encoder) can be implemented extremely efficiently using FPGAs," said Evans, "and optimized intellectual property (IP) cores are readily available to include in MPEG-based designs." By integrating various IP cores together with the IDCT core, you can develop a low-cost, single-chip solution that increases processing bandwidth and gives higher quality images than an ASSP-based solution.
Typically, a network surveillance camera product comprises three parts: a camera to convert the real-world image into a video stream, a video decoder for streams compressed into H.264, MPEG-2 or another format, and a video/image processor for de-interlacing, scaling and noise reduction before packeting digitized video for transmission over the Internet.
"FPGAs can have many areas of responsibility within surveillance cameras, bridging between standard chipsets as 'glue logic' has always been a strong application of FPGAs," continued Evans, "but many more image-processing tasks (such as color-space conversion), IDE (integrated drive electronics) interface and support for network interfaces (such as IEEE 1394) are now also commonly implemented in low-cost programmable devices." With high-performance DSP capability inside a network surveillance camera, you can digitize and encode the video stream to be sent over any computer network. You can use a standard Web browser to view live, full-motion video from anywhere on a computer network, including over the Internet. Installation is simplified by using existing LAN wiring or wireless LAN. "Features such as intelligent video, e-mail notification, FTP uploads and local hard-disk storage provide enhanced differentiation and superior capability over analog systems," said Evans.
FPGAs allow for rapid design, simulation, implementation and verification of video and image processing algorithms in video surveillance systems, including basic primitives and advanced algorithms for designing DVRs. In addition, FPGA solutions make possible a range of compression encoding, decoding and codec solutions, from off-the-shelf cores for those who need fast implementations to building-block reference designs and hardware platforms for those who want to differentiate their product through higher quality at lower bit rates. Using FPGAs for the extremely intensive processing in certain codec blocks means that you can support multi-channel HD encoding, save valuable system processor cycles, make substantial cost savings by reducing or eliminating DSP processor arrays, and easily integrate more features and capabilities into the system, from interfaces to further video processing.
Moreover, FPGAs offer a scalable solution, enabling support for different profiles, extra channels or new codec schemes in the same system. FPGAs can further reduce DVR system costs by consolidating system logic and implementing new peripherals. System interfaces can also be provided for the rapid development of video surveillance systems: advanced memory interfaces, PCI Express, Texas Instruments VLYNQ and EMIF interfaces, hard-disk interfaces and ITU-R BT656 interfaces.
With FPGAs, you can easily build a highly flexible and scalable DVR system to address both the low- and high-end markets. You can easily connect numerous video streams from multiple cameras, through FPGAs, to the DaVinci processor. FPGAs are finding their way into surveillance applications because of their flexibility in handling a broad range of media formats such as MPEG-2, MPEG-4, H.264 and Windows Media. Their high performance makes them suitable for other challenging video and audio tasks.