Nvidia’s Nemotron Cascade 2 30B Model Achieves 97.6% on HumanEval Benchmark

Reddit user ilintar has reported strong benchmark results for Nvidia’s Nemotron Cascade 2 30B-A3B model: 97.6% on the HumanEval coding benchmark and 88% on ClassEval. The results, posted to the LocalLLaMA subreddit, come from tests of mradermacher’s IQ4_XS quantized version of the model.

According to the post, the Nemotron Cascade 2 30B-A3B “is *not* based on the Qwen architecture despite a similar size, it’s a properly hybrid model based on Nemotron’s own arch.” The user noted that despite discussions around Nvidia’s Nemotron Super family of models, this particular model “has largely flown under the radar.”

The evaluation used HumanEval and ClassEval benchmarks, which the tester described as “quick to run and complicated enough for most small models to still have noticeable differences.” On HumanEval, the model’s 97.6% score reportedly left “both medium Qwen3.5 models in the rear window,” though specific comparison scores were not provided.
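For a sense of scale, the reported HumanEval percentage can be translated into a problem count. The sketch below assumes the run covered the standard 164-problem HumanEval set and that the score is a simple pass rate rounded to one decimal place; the post confirms neither detail, and the function names are illustrative only.

```python
# Assumption: the standard HumanEval set of 164 problems was used in full.
HUMANEVAL_PROBLEMS = 164


def pass_rate(solved: int, total: int) -> float:
    """Percentage of problems solved."""
    return 100.0 * solved / total


def solved_counts_matching(reported_pct: float, total: int, decimals: int = 1) -> list[int]:
    """Return every solved-count whose rounded pass rate equals the reported figure."""
    return [
        k
        for k in range(total + 1)
        if abs(round(pass_rate(k, total), decimals) - reported_pct) < 1e-9
    ]


if __name__ == "__main__":
    print(solved_counts_matching(97.6, HUMANEVAL_PROBLEMS))
```

Under those assumptions, 97.6% corresponds to 160 of 164 problems solved, leaving four failures.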

The poster said the tests were motivated by a move away from subjective evaluation: “I’ve been running some evals on local models lately since I’m kind of tired of the ‘vibe feels’ method of judging them.”

The poster plans further testing, writing, “I’m going to run some more tests on this model, but I feel it deserves a bit more attention.” No timeline was given for additional benchmark results.
