How to Achieve Video and Audio in Synchronization
Tevin Wang
- In order to achieve perfect synchronization, the audio and video packets must be time-stamped.
- One key in audio and video synchronization is the use of the Real-time Transport Protocol (RTP), which ensures that audio and video signals are received at the same time.
Many network cameras offer audio support, whether through built-in microphones or mic-in/line-in jacks. Audio signals are transmitted across the network the same way video data is. In digital systems, audio and video data are sent in separate packets, so in order to achieve perfect synchronization, the audio and video packets must be time-stamped. Video/audio synchronization happens at the bit level. “We closely control video and audio compression through the use of in-house developed hardware encoding. Controlling the encoding and decoding assists with the synchronization, as does adhering to recognized standards like RTP for delivery, rather than ‘reinventing the wheel,'” said Karen McCarrison, Product Marketing Manager, IndigoVision.
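The time-stamping idea McCarrison describes can be sketched in code. RTP (RFC 3550) carries a 32-bit timestamp in each packet header, and audio and video streams run on different clock rates, so a receiver synchronizes them by mapping both timestamps back to a common capture instant. This is a minimal illustrative sketch, not any vendor's implementation; the function names and the clock-rate constants are assumptions (8 kHz is typical for G.711 audio, 90 kHz is the standard RTP clock for video).

```python
import struct

AUDIO_CLOCK_HZ = 8000    # typical RTP clock for G.711 audio
VIDEO_CLOCK_HZ = 90000   # standard RTP clock for video payloads

def rtp_header(seq, timestamp, ssrc, payload_type, marker=False):
    """Pack a minimal 12-byte RTP header (version 2, no CSRC, no extension)."""
    byte0 = 2 << 6                                  # version = 2
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)

def capture_time_to_rtp_ts(capture_seconds, clock_hz, ts_offset=0):
    """Convert a capture instant (seconds since stream start) to RTP units."""
    return (ts_offset + int(capture_seconds * clock_hz)) & 0xFFFFFFFF

# The same capture instant yields different numeric timestamps per stream,
# but both identify the same moment once divided by their clock rates.
t = 1.25  # seconds since stream start
audio_ts = capture_time_to_rtp_ts(t, AUDIO_CLOCK_HZ)   # -> 10000
video_ts = capture_time_to_rtp_ts(t, VIDEO_CLOCK_HZ)   # -> 112500
```

In a real deployment the mapping between RTP timestamps and wall-clock time is distributed via RTCP sender reports; the sketch above only shows why time-stamping at the source is what makes lip-sync possible at the receiver.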
Elena Ravello, Brand Manager for Fermax, agrees on the importance of well-recognized standards. “One key in audio and video synchronization is the use of the Real-time Transport Protocol (RTP), which ensures that audio and video signals are received at the same time. The UDP protocol also helps reduce the overhead of the synchronization data inserted in the transmission.”
“In order to ensure that the video encoding is not affected by any other processing power requirements, each of our transmitters/cameras includes two processors: an FPGA for video encoding and another processor for networking requirements and onboard analytics,” McCarrison said.
Since chips are key components of surveillance cameras, compression efficiency is a major evaluation criterion, said William Ku, Brand Business Director for Vivotek. “Our chip verification process is quite long as these SoCs are crucial for compression. We have to test and verify their efficiency, signal streaming sizes and delays. Technical support from chip companies is also crucial as we have to cross-check logical correctness and bugs. These tests take months.”
Greater than the Sum of All Parts
Sound quality depends a great deal on components. The microphone does not determine everything, but it affects durability and resistance to interference. Compared with electret microphones, “MEMS microphones have higher tolerance and stability toward shifts in temperature, vibrations and electromagnetic compatibility (EMC),” said Thomas Hagh, VP of Products, Zenitel. “Today, many intercom manufacturers use electret microphones. These microphones provide an analog audio signal to a hardware codec that performs the analog-to-digital conversion. A MEMS microphone, on the other hand, gives users digital audio directly, sampled at 96 kHz, which is double CD quality.”
Another advantage of MEMS microphones is their tiny size. “The MEMS microphone membrane is only 0.1 mm in diameter. This reduced size causes less vibration across the microphone's membrane, effectively reducing audio distortion, thus providing significantly better audio quality,” Hagh added.
All electrical systems create some noise. This noise is not a problem until it interferes with system performance. Engineers who design layouts with little regard for electromagnetic interference are finding that their designs do not perform to specification or do not work at all. Compared with the selection of chipsets, intelligent design of the camera PCBs is even more vital in eliminating interference, said Bjorn Weber, PM at Basler. “Intelligent design means a good arrangement of the components and a good selection of the right components. Placed in unfortunate positions, these components can cause EMC problems and thus crackling noise at the other end.”
Audio codecs, like video ones, play an important role as they determine the level of compression and the quality of sound. Bandwidth optimization is obtained with the use of adequate audio and video codecs along with dynamic compression that can be adjusted based on the available bandwidth, Ravello said.
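Ravello's point about dynamic compression can be illustrated with a small sketch: given an estimate of available bandwidth, pick the highest bitrate each stream can afford. The bitrate tables, the 5% audio share, and the function name are illustrative assumptions, not any vendor's actual values; they simply reflect the article's observation that audio is cheap relative to video.

```python
# Illustrative bitrate ladders (bits per second); not vendor data.
AUDIO_BITRATES = [16_000, 32_000, 64_000]
VIDEO_BITRATES = [256_000, 512_000, 1_000_000, 2_000_000]

def pick_bitrates(available_bps, audio_share=0.05):
    """Choose the highest audio and video bitrates that fit the budget.

    Audio costs little relative to video, so it gets a small fixed share
    of the link and video takes the remainder.
    """
    audio_budget = available_bps * audio_share
    audio = max((b for b in AUDIO_BITRATES if b <= audio_budget),
                default=AUDIO_BITRATES[0])
    video_budget = available_bps - audio
    video = max((b for b in VIDEO_BITRATES if b <= video_budget),
                default=VIDEO_BITRATES[0])
    return audio, video

# On a 1 Mbps link this picks 32 kbps audio and 512 kbps video; on a
# 3 Mbps link, 64 kbps audio and 2 Mbps video.
```

A real encoder would adjust continuously (e.g. per GOP) as bandwidth estimates change, rather than picking once; the sketch only shows the allocation logic.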
However, compared with video bandwidth, audio bandwidth is very low, so audio is rarely a bandwidth problem, Weber added.
In terms of integrated video/audio solutions, “the most important aspect for communications is to use open protocols to ensure interoperability with other systems. That is why we are committed to supporting SIP as well as other open IP protocols,” Hagh said.
It is important to use VoIP codecs that many systems support in order to reduce the need for transcoding, as transcoding introduces delay and distortion. “If you send a voice packet between two systems, be sure the same voice codec is used in both systems. If you need to transcode between G.722 and G.711, this will usually cause a delay of about 20-40 ms,” Hagh added. “All VoIP systems support G.711. However, G.711 is a narrowband codec (3.4 kHz) providing significantly lower quality than G.722. G.722 is the codec that all HD voice systems support today. By utilizing this open HD codec, the system will provide the full, end-to-end voice spectrum without any need for recoding and the distortion it brings. Another codec is AAC, and we see a trend that AAC may become popular.”
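Hagh's advice amounts to codec negotiation: pick a codec both endpoints already support so no transcoder (and its roughly 20-40 ms of added delay) sits in the path. A minimal sketch, with a preference order assumed from the article (G.722 for HD voice ahead of narrowband G.711; AAC placed between them purely for illustration):

```python
# Illustrative preference order; a real system takes this from SDP/SIP
# negotiation, not a hard-coded list.
PREFERENCE = ["G.722", "AAC", "G.711"]

def negotiate_codec(offered, answered):
    """Return the most preferred codec common to both endpoints.

    Returns None when there is no overlap, meaning a transcoder would be
    required in the media path (with its associated delay and distortion).
    """
    common = set(offered) & set(answered)
    for codec in PREFERENCE:
        if codec in common:
            return codec
    return None

# Both sides support G.722, so HD voice wins over G.711:
best = negotiate_codec(["G.711", "G.722"], ["G.722", "G.711"])  # -> "G.722"
```

Since all VoIP systems support G.711, overlap is nearly always non-empty in practice; the point of the preference order is to land on G.722 whenever both ends allow it.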
It is important to note that not all video codecs support time-stamped video/audio synchronization, said Philip Siow, Senior Consultant, Axis Communications. “There are many situations where synchronized audio is less important or even undesirable; for example, if audio is to be monitored but not recorded. The time stamping of video packets using Motion JPEG compression may not always be supported in a network camera. If this is the case and if it is important to have synchronized video and audio, the video format to choose is MPEG-4 or H.264.”
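On the receiving side, the time stamps Siow mentions are what let a player pair each audio chunk with the video frame captured at the same moment. The following is a hypothetical sketch of that pairing step; the 40 ms tolerance is an illustrative value for acceptable lip-sync error, and the function name is an assumption, not an API from any real player.

```python
# Illustrative lip-sync tolerance in seconds; real players tune this.
LIP_SYNC_TOLERANCE = 0.040

def pair_by_timestamp(video_frames, audio_chunks,
                      tolerance=LIP_SYNC_TOLERANCE):
    """Match audio chunks to the nearest video frame by presentation time.

    Both inputs are lists of (timestamp_seconds, payload). Returns a list
    of (video_ts, audio_ts) pairs that fall within the tolerance; audio
    without a close-enough frame is left unmatched, as when synchronized
    playback is not required (e.g. monitored but not recorded audio).
    """
    pairs = []
    for a_ts, _payload in audio_chunks:
        nearest = min(video_frames, key=lambda v: abs(v[0] - a_ts))
        if abs(nearest[0] - a_ts) <= tolerance:
            pairs.append((nearest[0], a_ts))
    return pairs

video = [(0.000, b""), (0.033, b""), (0.066, b"")]   # ~30 fps frames
audio = [(0.010, b""), (0.200, b"")]                  # second chunk is late
```

Without per-packet time stamps (as with some Motion JPEG pipelines), there is nothing for this comparison to operate on, which is why the article recommends MPEG-4 or H.264 when synchronization matters.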
Product Adopted: Network Cameras