- ByteDance and Alibaba are planning to place orders for Huawei’s new Ascend 950PR AI chip after testing confirmed improved CUDA compatibility and faster response times.
- The 950PR delivers 1 PFLOPS in FP8 and 2 PFLOPS in FP4, with 2.87x the compute performance of Nvidia’s H20, the most powerful chip Nvidia can legally sell in China.
- Huawei plans to ship 750,000 units in 2026, priced between $6,900 and $9,700 per card depending on memory configuration.
- ByteDance alone is reportedly planning to spend more than $5.6 billion on Huawei Ascend chips this year.
What Happened
Huawei’s latest AI chip, the Ascend 950PR, has won over two of China’s biggest technology companies. ByteDance and Alibaba are planning to place orders after customer testing confirmed that the chip is more compatible with Nvidia’s CUDA software ecosystem and responds faster than its predecessors, according to sources familiar with the matter cited by Reuters.
This marks a turning point for Huawei’s chip business. The Shenzhen-based company had previously struggled to persuade large private-sector technology firms to adopt its current flagship chip, the Ascend 910C, in significant quantities. The 910C suffered from software compatibility issues that made migration from Nvidia’s ecosystem difficult and time-consuming. The 950PR appears to have addressed the key complaints that kept major buyers on the sidelines, particularly around CUDA compatibility and response latency.
Why It Matters
U.S. export restrictions have blocked Nvidia from selling its most advanced AI chips to Chinese companies, leaving the H20 as the most powerful option legally available in China. The 950PR’s performance advantage over the H20 — nearly three times the compute — gives Chinese tech companies a domestic alternative that outperforms what they can legally import.
ByteDance’s reported plan to spend more than $5.6 billion (over 40 billion yuan) on Huawei Ascend chips in 2026 signals that China’s largest tech firms are ready to commit serious capital to domestic chip suppliers. This spending level would make ByteDance one of the largest single customers for any AI chip maker globally. For context, Meta spent approximately $37 billion on capital expenditures in 2025, a significant portion of which went to Nvidia GPU purchases.
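To put that budget in hardware terms, the short Python sketch below divides the reported spend by the two card prices quoted in this article. It is back-of-the-envelope arithmetic, not a reported figure: it assumes the entire budget goes to 950PR cards at the quoted prices, which the reporting does not claim, and the constant and function names are illustrative.

```python
# Figures quoted in the article. Applying 950PR card prices to ByteDance's
# whole Ascend budget is an illustrative assumption, not a reported fact.
BUDGET_USD = 5.6e9       # reported lower bound on ByteDance's 2026 Ascend spend
PRICE_LOW_USD = 6_900    # approximate price of the base card
PRICE_HIGH_USD = 9_700   # approximate price of the premium card


def cards_for_budget(budget_usd: float, price_usd: float) -> int:
    """Whole number of cards the budget covers at a given unit price."""
    return int(budget_usd // price_usd)


max_cards = cards_for_budget(BUDGET_USD, PRICE_LOW_USD)   # all base cards
min_cards = cards_for_budget(BUDGET_USD, PRICE_HIGH_USD)  # all premium cards
print(f"{min_cards:,} to {max_cards:,} cards")
```

Under those assumptions the budget corresponds to roughly 577,000 to 812,000 cards, a range that brackets Huawei’s 750,000-unit shipment target for the year. That suggests the reported spend may also cover other Ascend products, systems or services, or that the figures are rough.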
Technical Details
The Ascend 950PR delivers 1 petaflop of performance in FP8 precision and 2 petaflops in FP4, along with 2 terabytes per second of interconnect bandwidth. It achieves 2.87 times the compute performance of Nvidia’s H20 chip.
The chip uses HiBL 1.0, Huawei’s proprietary memory technology designed to balance cost and performance for inference and recommendation workloads. Pricing is set at approximately 50,000 yuan ($6,900) per card with traditional DDR memory, while a premium version with faster HBM will sell for around 70,000 yuan ($9,700).
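As a quick consistency check on those prices, the Python sketch below derives the exchange rate implied by each yuan and dollar pair and the relative premium of the HBM version. It uses only the approximate figures quoted above; the variable names are illustrative.

```python
# Prices as quoted in the article (approximate figures).
BASE_YUAN, BASE_USD = 50_000, 6_900        # card with DDR memory
PREMIUM_YUAN, PREMIUM_USD = 70_000, 9_700  # card with HBM

# Exchange rate implied by each yuan/dollar pair (yuan per U.S. dollar).
base_rate = BASE_YUAN / BASE_USD
premium_rate = PREMIUM_YUAN / PREMIUM_USD

# Relative price premium of the HBM version over the DDR version.
hbm_premium = (PREMIUM_YUAN - BASE_YUAN) / BASE_YUAN

print(f"implied rate: {base_rate:.2f} and {premium_rate:.2f} yuan per dollar")
print(f"HBM premium: {hbm_premium:.0%}")
```

Both pairs imply a rate of about 7.2 yuan per dollar, so the conversions are internally consistent, and the HBM version carries a premium of 40 percent over the base card.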
The critical improvement is CUDA migration compatibility. Through Huawei’s CANN Next software layer, the 950PR allows developers to port code written for Nvidia’s CUDA ecosystem with significantly less friction than previous Ascend chips required. Developers write code that resembles CUDA, but the execution is Ascend-optimized under the hood.
Who’s Affected
Chinese AI companies that have been constrained by U.S. export controls now have a viable high-performance domestic option. ByteDance, which operates TikTok’s recommendation systems and a growing suite of AI products, and Alibaba, which runs one of China’s largest cloud computing platforms, represent the highest-volume AI chip buyers in the country.
Nvidia faces the most direct impact. Even the limited H20 chips it could still sell into China now compete against a domestic alternative that benchmarks significantly higher. The 950PR’s CUDA-compatible software stack lowers the switching costs that had previously kept Chinese firms in Nvidia’s ecosystem. Nvidia’s China revenue has already declined since export restrictions tightened, and the 950PR threatens to accelerate that trend by offering both a performance advantage and local supply chain security.
What’s Next
Huawei is targeting 750,000 units shipped in 2026, with the bulk arriving in the second half of the year. Whether Huawei can hit that volume depends on manufacturing partner SMIC’s ability to produce advanced chips under its own set of U.S. sanctions constraints.
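For a sense of scale, the Python sketch below multiplies the shipment target by the two quoted card prices. It is a rough bound rather than a forecast: it assumes every unit sells at one of the two approximate prices cited in this article, and it ignores discounts, product mix and any shortfall in supply.

```python
# Shipment target and approximate card prices quoted in the article.
TARGET_UNITS = 750_000
PRICE_LOW_USD, PRICE_HIGH_USD = 6_900, 9_700

# Implied revenue if every unit sold at the base or the premium price.
revenue_low = TARGET_UNITS * PRICE_LOW_USD
revenue_high = TARGET_UNITS * PRICE_HIGH_USD

print(f"${revenue_low / 1e9:.3f}B to ${revenue_high / 1e9:.3f}B")
```

On those assumptions, hitting the target would mean roughly $5.2 billion to $7.3 billion in card sales.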
The key limitation is that CUDA compatibility is not the same as CUDA equivalence. Developers migrating large existing codebases may still encounter friction, and the long-term performance of the CANN Next software stack in production workloads has not been publicly benchmarked by independent third parties.