Introducing Vidu: China's Breakthrough in AI Video Generation Challenging Sora's Dominance

Written by
Taiwo Oluwole

A new contender has emerged in AI video generation. Vidu, China's answer to Sora, is making waves in the tech community. Developed jointly by Tsinghua University and the Beijing-based company Shengshu Technology, Vidu aims to redefine video creation with what its developers describe as "long duration, high consistency, and high dynamics."

The Birth of a New AI Video Giant: Vidu

Vidu was unveiled at the Future Artificial Intelligence Pioneer Forum, part of the 2024 Zhongguancun Forum. It can produce up to 16 seconds of video in a single generation, which makes it more than a fleeting moment in AI video generation: it is a milestone that competes directly with the global sensation, Sora.

Longer, More Complex Videos:

While most domestic video generation models in China have been capped at roughly 4-second outputs, Vidu has shattered this barrier. It also goes beyond simple camera movements, producing complex dynamic shots that move between long shots and close-ups and include intricate camera work such as focus pulls and smooth transitions.

Simulating Reality and Beyond:

Vidu's prowess doesn't stop at mimicking reality. It also dabbles in the fantastical, creating scenes that defy the physical world while maintaining a startling level of authenticity. From lifelike interactions to nuanced character expressions, Vidu's simulations are as impressive as they are versatile.

Understanding Chinese Elements:

As a homegrown model, Vidu boasts a unique edge—its deep comprehension of Chinese cultural elements. It skillfully weaves in symbols like pandas and dragons, elements that are not just aesthetically pleasing but also culturally significant, offering a depth of storytelling that resonates with its audience.

Vidu and Sora interpret the dragon differently. Vidu renders a virtual dragon placed in a real-world setting, while Sora depicts a real-world dragon and lion dance. Each model brings its own character to the details of the dragon's appearance.

U-ViT Architecture: A Step Ahead of DiT

Vidu's rapid ascent to prominence is credited to the team's long-standing expertise in Bayesian machine learning and multimodal large models. Its core technology, the U-ViT architecture, was proposed in September 2022, before the DiT architecture that Sora builds on, making it one of the earliest efforts to combine diffusion models with Transformer models.
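
To make the U-ViT idea more concrete, here is a rough illustration in Python. According to the published U-ViT paper, the design treats the timestep, any conditioning, and the noisy image patches all as tokens, and adds long skip connections between shallow and deep Transformer blocks, much like a U-Net. The sketch below only counts tokens and lists which blocks would be linked. The function names, input size, patch size, and block count are assumptions chosen for illustration, and nothing here reflects Vidu's actual, unpublished implementation.

```python
# Illustrative sketch only: not Vidu's or Shengshu Technology's code.
# All names and sizes below are hypothetical.


def patch_token_count(height, width, patch_size):
    """Number of tokens produced by cutting one input into square patches."""
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    return (height // patch_size) * (width // patch_size)


def long_skip_pairs(depth):
    """Pair each shallow block with the deep block that receives its output.

    Assumes an odd number of blocks: depth // 2 shallow blocks, one middle
    block, and depth // 2 deep blocks, mirrored like the two arms of a U.
    """
    if depth % 2 == 0:
        raise ValueError("this sketch assumes an odd number of blocks")
    half = depth // 2
    return [(i, depth - 1 - i) for i in range(half)]


# Hypothetical sizes: a 32x32 input, 2x2 patches, 13 Transformer blocks.
tokens = 1 + patch_token_count(32, 32, 2)  # one time token plus patch tokens
pairs = long_skip_pairs(13)

print(tokens)  # total sequence length fed to the Transformer
print(pairs)   # (shallow_block, deep_block) index pairs
```

With these assumed numbers, the input becomes 256 patch tokens plus one time token, and the first block is linked to the last, the second to the second-to-last, and so on, leaving one middle block unpaired.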

Shaping the Future of Video Generation:

Vidu is more than a one-trick pony; it's a versatile visual model capable of supporting a wide array of video content types and lengths. Its scalable architecture hints at future expansions into broader modalities, promising to push the boundaries of multimodal capabilities.

Who is Shengshu Technology?

Despite being a relative newcomer, Shengshu Technology has made significant strides in AI. Its team combines Tsinghua University researchers with seasoned professionals from tech giants like Alibaba, Tencent, and ByteDance, and the company has already attracted substantial investment from prominent financial backers.

Commercializing AI:

Shengshu Technology is not just about theoretical advancements; it is actively commercializing its models. By offering API access to B2B clients and developing niche application products, the company is paving the way for AI video generation to become a staple in creative industries.


As Vidu steps onto the global stage, it brings a new perspective on AI video generation. With its advanced capabilities and cultural insights, Vidu is poised to challenge the status quo and lead the charge in the next generation of AI-driven video creation.