Dahua unlocks the future of video security with Xinghan Large-Scale AI Models
Date: 2025/09/19
Source: William Pao
A leader in the field of intelligent IoT centered around video, Dahua has always been at the forefront of AI research and development. The company has recently launched an upgraded version of its Xinghan Large-Scale AI Models, which combine multimodal capabilities and industry knowledge to make video surveillance more intelligent than ever. This article takes a closer look at Xinghan and how it helps users achieve smarter security.
 

Addressing conventional AI challenges

Xinghan aims to address certain challenges of conventional CNN-based AI: difficulty detecting small targets at long distances, false alarms caused by interference such as birds and leaves, and long customization cycles for creating new algorithms.
 
“During the industry’s digital and intelligent transformation, AI technology still faces challenges. While algorithm accuracy has reached high levels in some areas, demands for adaptive intelligence across complex, dynamic scenarios and higher accuracy continue to rise. Simultaneously, business needs are evolving from perception and simple cognition to complex cognition. Additionally, complex rule configuration and cumbersome interactions in practical applications hinder usability. With advancements in large model technology, Dahua launched the Xinghan Large-Scale AI Models to address these issues,” said Frank Fang, Overseas Product Director at Dahua, adding that Xinghan aims to solve real user pain points with the following five key differentiators:
 
From accuracy to precision: Enhancing detection in extreme conditions (for example tiny targets, blurry images and strong backlighting), ensuring stable and reliable recognition;
 
From customization to generalization: Greatly shortening the development cycle for custom algorithms and reducing complex steps;
 
From recognition to comprehension: Supporting not only routine behavior recognition but also understanding complex multi-target interactions;
 
From static to dynamic: Overcoming limitations of static rule configurations to enable autonomous scene parsing and dynamic adaptation;
 
Enhanced language and multimodal capabilities: Simplifying operations via natural language interaction; processing text, images and video to enable understanding and interaction with the world.
 

Different models

Debuted in 2023, Xinghan continues to evolve by combining multimodal intelligence and deep domain expertise. This development has led to three core series under Xinghan: Xinghan Vision Models (vision-centric intelligence), Xinghan Multimodal Models (multimodal-fusion capabilities), and Xinghan Language Models (language-driven interaction). This article examines the Vision and Multimodal Models more closely.
 

Xinghan Vision Models

The Xinghan Vision Models are featured in certain camera models in Dahua’s IPC and PTZ series. Since large models typically reside on servers, deploying Xinghan on edge devices requires drastically reducing model size through advanced training techniques, a process that can be likened to a person’s education.
 
“First, we enable the algorithm to undergo unsupervised training using hundreds of millions of unlabeled data samples, resulting in a massive pre-trained model that is extensive and diverse, broad but not precise – somewhat like our primary and secondary school curricula that cover all foundational subjects without delving deeply into any specific field,” said Xiangming Zhou, R&D Expert at Dahua.
 
He adds: “To address our specific business needs, we then employ supervised training with labeled task-specific data to develop our expert task model. This labeled training phase can be likened to university education – students focus on their majors, continuously refining professional knowledge while gradually forgetting many secondary school subjects irrelevant to their specialization. To meet camera deployment requirements, we further perform knowledge distillation, fine-tuning, and quantization on the expert task model, significantly reducing its parameter count. This ultimately yields an edge-side large model precisely tailored for specific business objectives and products.”
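Dahua has not published Xinghan’s internals, but the final compression step Zhou describes – knowledge distillation – has a well-known general form: train a small student model to match the softened output distribution of the large expert teacher. Below is a minimal, self-contained Python sketch of the temperature-scaled distillation loss; all logits and values are illustrative, not Dahua’s.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature softens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL divergence between the teacher's soft targets and the student's
    predictions -- the core objective of knowledge distillation."""
    p = softmax(teacher_logits, temperature)  # teacher "soft targets"
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A student that matches the teacher incurs zero loss;
# one that diverges is penalized.
teacher = [2.0, 0.5, -1.0]
aligned = distillation_loss(teacher, [2.0, 0.5, -1.0])
diverged = distillation_loss(teacher, [-1.0, 0.5, 2.0])
print(aligned < diverged)  # True
```

In practice this loss would be minimized over many training batches, after which fine-tuning and quantization shrink the student further for edge deployment.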
 
The Xinghan Vision Models make video analysis more accurate and intelligent, enabling various applications. One of them is Perimeter Protection, where detection distance is increased by 50 percent, detection accuracy still reaches 98 percent, and the false alarm rate is reduced by 92 percent. Building on the Xinghan Large-Scale AI Models, Perimeter Protection introduces the AI Rule Assist function, which automatically analyzes the scene and generates regional intrusion rule lines, making configuration easy and improving efficiency. Perimeter Protection also supports detection of more than 10 animal types, bringing more value to users.
 
Other applications include WizTracking, which keeps tracking the same person even when their posture changes or they are partially blocked by obstacles; and Crowd Map, which supports the detection of small targets at long distances and of up to 5,000 persons in large-scale scenarios. Last but not least, AI WDR leverages the capabilities of the Xinghan Large-Scale AI Models to automatically identify the scene in the image and determine whether to activate or deactivate WDR based on changes in the picture. This eliminates the need for manual adjustments, ensuring a clear image while reducing the user’s operational burden.
 

Xinghan Multimodal Models

Compared to unimodal models, which are confined to processing a single data type (for example text-only or image-only), the Xinghan Multimodal Models are artificial intelligence systems capable of processing multiple heterogeneous data types (such as text, images, and video) in parallel and integrating them deeply, empowering diverse applications such as WizSeek and text-defined alarms.
 
Leveraging Dahua’s Xinghan Multimodal Model technology, WizSeek transforms video retrieval. It aims to solve video retrieval pain points such as the lack of multi-condition search and over-reliance on preset target events. Suppose the user wants to find a man making a phone call near a car. With conventional metadata search, the user can only select attributes one by one, and behaviors such as “calling” cannot be retrieved at all. With WizSeek, the user simply types “a man making a phone call near a car” and locates the footage in a matter of seconds. WizSeek revolutionizes the video search experience, delivering unparalleled speed, precision, and efficiency when navigating vast amounts of video clips, while offering an exceptionally intuitive and streamlined user journey.
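Free-text video search of this kind is commonly built on a shared embedding space: text and video clips are encoded into vectors, and clips are ranked by similarity to the query vector. The article does not describe WizSeek’s implementation, so the following is a generic Python sketch with made-up four-dimensional embeddings standing in for real encoder outputs.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_embedding, clip_index, top_k=3):
    """Rank indexed clips by cosine similarity to the query embedding."""
    scored = [(cosine(query_embedding, emb), clip_id)
              for clip_id, emb in clip_index.items()]
    scored.sort(reverse=True)
    return [clip_id for _, clip_id in scored[:top_k]]

# Toy embeddings; in a real system these come from a multimodal encoder.
clip_index = {
    "cam1_0900": [0.9, 0.1, 0.0, 0.2],   # man on phone near a car
    "cam2_1015": [0.1, 0.8, 0.3, 0.0],   # empty parking lot
    "cam3_1130": [0.2, 0.1, 0.9, 0.1],   # cyclist on the road
}
query = [0.85, 0.15, 0.05, 0.25]  # hypothetical embed("a man making a phone call near a car")
print(search(query, clip_index, top_k=1))  # ['cam1_0900']
```

Because both the text query and the clips live in one vector space, multi-condition queries (“a man” + “phone call” + “near a car”) need no preset attribute menus.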
 
Text-defined alarms, meanwhile, enable custom arming through plain text descriptions: new algorithms can be created from prompt text, greatly lowering the development threshold. With conventional AI, for example, creating an algorithm for “human pushing a stroller” requires material collection, data annotation, on-device development and algorithm training – a process that takes about a month. With text-defined alarms, powered by the multimodal models’ capabilities, the user only needs to type “human pushing a stroller,” and a model is created and deployed in seconds. After creating a new text-defined alarm algorithm on a recorder (IVSS), the user can perform local training within the same device to optimize algorithm performance, saving significant time and labor costs; the optimized algorithm truly helps text-defined alarms realize “more use, more accuracy.” The Xinghan Multimodal Models are featured in Dahua products including NVR, IVSS and IVD.
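Conceptually, a text-defined alarm can be seen as a zero-shot classifier: the prompt is embedded once, each frame’s embedding is compared against it, and a threshold decides whether to fire. The sketch below illustrates this idea, plus a crude stand-in for the “more use, more accuracy” feedback loop; the class, threshold logic and embeddings are all hypothetical, not Dahua’s implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class TextDefinedAlarm:
    """Zero-shot alarm: embed the prompt once, then flag frames whose
    embedding is similar enough. User feedback nudges the threshold."""

    def __init__(self, prompt_embedding, threshold=0.7):
        self.prompt = prompt_embedding
        self.threshold = threshold

    def check(self, frame_embedding):
        return cosine(self.prompt, frame_embedding) >= self.threshold

    def feedback(self, frame_embedding, was_correct, step=0.05):
        # A confirmed false alarm raises the bar; a missed event lowers it.
        score = cosine(self.prompt, frame_embedding)
        if not was_correct and score >= self.threshold:
            self.threshold = min(1.0, self.threshold + step)
        elif not was_correct and score < self.threshold:
            self.threshold = max(0.0, self.threshold - step)

alarm = TextDefinedAlarm([1.0, 0.0, 0.5])  # hypothetical embed("human pushing a stroller")
print(alarm.check([0.9, 0.1, 0.6]))  # True: similar frame triggers
print(alarm.check([0.0, 1.0, 0.0]))  # False: unrelated frame ignored
```

Real on-device optimization would retrain or fine-tune the model on confirmed detections rather than merely shifting a threshold, but the feedback principle is the same.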
 

A leader in AI technologies

In closing, video surveillance has evolved from seeing a scene to understanding it. Dahua has clearly embraced this trend with Xinghan, which understands complex multi-target interactions, reduces false alarms and shortens deployment cycles, in the process helping users gain more security and business intelligence.
 
With Xinghan, Dahua shows the world what next-generation AI can do, and again proves itself a leader in advanced AI technologies.

https://www.asmag.com/showpost/35045.aspx?name=news