CADSmith: Multi-Agent CAD Generation Insights

CADSmith is a multi-agent pipeline that generates CadQuery CAD code from natural language descriptions, using two nested correction loops for iterative refinement.
Against a zero-shot baseline, CADSmith achieves 100% execution reliability (up from 95%), improves median IoU from 0.8085 to 0.9629, and reduces mean Chamfer Distance from 28.37 to 0.74.
The system combines exact geometric measurements from the OpenCASCADE kernel with visual assessment from a vision-language model judge.
It uses retrieval-augmented generation over API documentation rather than fine-tuning, allowing it to stay current as the CadQuery library evolves.

What Happened

Researchers Jesse Barkley, Rumi Loghmani, and Amir Barati Farimani have published CADSmith, a multi-agent system that generates CAD models from text descriptions. The system produces CadQuery code and then refines it through two nested feedback loops until the geometry matches the specification.

Existing text-to-CAD methods either generate models in a single pass without verifying geometry or rely on visual feedback alone, which cannot catch dimensional errors. CADSmith closes this gap by combining programmatic geometric validation with visual assessment. The paper was submitted to arXiv on March 27, 2026.

Why It Matters

Generating accurate 3D CAD models from natural language is a long-sought capability for manufacturing, product design, and engineering workflows. Current methods struggle with a fundamental tension: LLMs can produce plausible-looking code, but without geometric verification, the resulting shapes often have incorrect dimensions, missing features, or invalid topology.

CADSmith’s approach of grounding corrections in exact measurements rather than visual appearance alone addresses this directly. The mean Chamfer Distance, a standard measure of how closely two 3D shapes match (lower is better), dropped from 28.37 to 0.74, a reduction of about 97%. This level of geometric accuracy brings LLM-generated CAD closer to being usable in actual engineering contexts where dimensional precision is non-negotiable.
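To make the metric concrete, here is a minimal pure-Python sketch of a symmetric Chamfer distance between two point sets, followed by the arithmetic behind the reported reduction. The convention used (mean nearest-neighbour Euclidean distance in both directions, summed) is an assumption; the article does not say which variant the paper uses, and the function name and sample points are illustrative only.

```python
import math


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty 3D point sets.

    Assumed convention: mean nearest-neighbour Euclidean distance from A to B
    plus the same quantity from B to A. Other variants use squared distances.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


# Corners of a unit square, and the same square shifted by 0.1 along x.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 0.1, y, z) for x, y, z in square]

print(chamfer_distance(square, square))   # identical shapes give 0.0
print(chamfer_distance(square, shifted))  # roughly 0.2 (0.1 in each direction)

# Relative reduction for the figures quoted in the article: 28.37 -> 0.74.
reduction = (28.37 - 0.74) / 28.37
print(f"{reduction:.1%}")
```

A perfect match scores zero, and the score grows with every point that has no close counterpart in the other shape, which is why a drop from 28.37 to 0.74 indicates a much tighter geometric fit.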

The execution reliability improvement from 95% to 100% is also significant. In a production environment, a 5% failure rate means one in twenty generated scripts fails to run and produces no model at all, requiring manual intervention. Eliminating this failure mode makes the system more practical for batch processing of design specifications.

The retrieval-augmented generation (RAG) approach is also practical. Rather than fine-tuning a model on CadQuery’s API, the system retrieves relevant documentation at runtime. This means the system can adapt as the CadQuery library releases updates without requiring retraining.
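The retrieval step can be pictured with a small sketch. Everything here is an assumption made for illustration: the article does not describe CADSmith's retriever, so a simple keyword-overlap ranking stands in for whatever method the authors use, and the documentation snippets are paraphrased placeholders rather than text from the CadQuery docs.

```python
# Hypothetical, paraphrased API notes keyed by CadQuery method name.
API_DOCS = {
    "Workplane.box": "create a rectangular box solid with given length width height",
    "Workplane.hole": "drill a hole of given diameter through the solid",
    "Workplane.fillet": "round selected edges with a fillet of given radius",
}

STOPWORDS = {"a", "an", "the", "of", "with", "and"}


def tokens(text):
    """Lower-cased words of the text, minus a few common stopwords."""
    return set(text.lower().split()) - STOPWORDS


def retrieve(query, docs, k=2):
    """Return the names of the k snippets sharing the most words with the query."""
    query_words = tokens(query)
    ranked = sorted(
        docs.items(),
        key=lambda item: len(query_words & tokens(item[1])),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


query = "a box with a hole through the center"
hits = retrieve(query, API_DOCS)

# The retrieved notes are prepended to the generation prompt at runtime.
context = "\n".join(f"{name}: {API_DOCS[name]}" for name in hits)
prompt = f"Relevant API notes:\n{context}\n\nTask: {query}"
print(prompt)
```

Because the notes are looked up at request time, updating the documentation store is enough to reflect a new library release; no model weights change.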

Technical Details

CADSmith operates as a multi-agent pipeline with two nested correction loops. The inner loop resolves code execution errors — cases where the generated CadQuery code fails to run. The outer loop handles geometric validation, checking whether the code produces the correct shape.
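The control flow of two nested loops can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the `refine` function, its callback signatures, and the iteration limits are all hypothetical, and the toy "generator" below simply returns canned strings in place of an LLM.

```python
def refine(prompt, generate, execute, validate, max_outer=3, max_inner=3):
    """Nested correction loops (hypothetical sketch).

    generate(prompt, feedback) -> code string
    execute(code)              -> (shape, error); error is None on success
    validate(shape)            -> feedback string, or None if geometry is accepted
    Returns (code, accepted).
    """
    feedback = None
    code = None
    for _ in range(max_outer):            # outer loop: geometric validation
        shape = None
        for _ in range(max_inner):        # inner loop: execution errors
            code = generate(prompt, feedback)
            shape, error = execute(code)
            if error is None:
                break
            feedback = f"Execution error: {error}"
        else:
            return code, False            # never produced a runnable script
        feedback = validate(shape)
        if feedback is None:
            return code, True
    return code, False


# Toy stand-ins so the sketch runs without an LLM or a CAD kernel.
def fake_generate(prompt, feedback):
    if feedback is None:
        return "syntax error draft"
    if feedback.startswith("Execution error"):
        return "box 10 10 4"
    return "box 10 10 5"


def fake_execute(code):
    parts = code.split()
    if parts[0] != "box":
        return None, "unknown command"
    return tuple(float(v) for v in parts[1:]), None


def fake_validate(shape):
    target = (10.0, 10.0, 5.0)
    if shape == target:
        return None
    return f"Geometry mismatch: got {shape}, expected {target}"


result = refine("a 10 x 10 x 5 block", fake_generate, fake_execute, fake_validate)
print(result)
```

In this toy run the first draft fails to execute, the second runs but has the wrong height, and the third is accepted, which mirrors the division of labour between the inner and outer loops.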

The outer loop combines two types of feedback. First, it uses the OpenCASCADE geometric kernel to compute exact measurements: bounding box dimensions, volume, and solid validity. Second, an independent vision-language model serves as a judge, providing “holistic visual assessment” of the rendered shape. The authors describe this dual approach as providing “both the numerical precision and the high-level shape awareness needed to converge on the correct geometry.”
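As an illustration of the numerical half of that feedback, the sketch below turns measured values into a list of textual corrections. The dictionary keys, the 1% relative tolerance, and the message wording are assumptions made for this example; the measurements are passed in as plain numbers here, whereas in the real system they would come from the geometric kernel.

```python
def geometric_feedback(measured, target, rel_tol=0.01):
    """Compare measured geometry with the specification and list mismatches.

    Both arguments are dicts with hypothetical keys:
      "bbox"     -> (x, y, z) bounding box extents
      "volume"   -> solid volume
      "is_valid" -> validity flag (only read from `measured`)
    """
    issues = []
    if not measured["is_valid"]:
        issues.append("solid is not valid")
    for axis, got, want in zip("XYZ", measured["bbox"], target["bbox"]):
        if abs(got - want) > rel_tol * want:
            issues.append(f"bounding box {axis}: got {got}, expected {want}")
    if abs(measured["volume"] - target["volume"]) > rel_tol * target["volume"]:
        issues.append(
            f"volume: got {measured['volume']}, expected {target['volume']}"
        )
    return issues


target = {"bbox": (40.0, 20.0, 5.0), "volume": 4000.0}
measured = {"bbox": (40.0, 20.0, 4.0), "volume": 3200.0, "is_valid": True}

issues = geometric_feedback(measured, target)
for line in issues:
    print(line)
```

Feedback of this kind names the exact dimension that is off and by how much, which is information a rendered image alone cannot reliably convey.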

The system was evaluated on a custom benchmark of 100 prompts organized into three difficulty tiers (T1 through T3), with three ablation configurations. Compared with the zero-shot baseline, the full pipeline raised the execution rate from 95% to 100%, the median F1 score from 0.9707 to 0.9846, and the median Intersection over Union (IoU) from 0.8085 to 0.9629, and cut the mean Chamfer Distance from 28.37 to 0.74.
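For readers unfamiliar with the overlap metrics, the following sketch computes IoU and a set-based F1 score on two voxelized boxes. The voxel representation and the set-based F1 definition are assumptions for illustration; the article does not say how the paper discretizes shapes or defines F1.

```python
def box_voxels(nx, ny, nz):
    """Set of unit-voxel coordinates filling an nx x ny x nz box at the origin."""
    return {(x, y, z) for x in range(nx) for y in range(ny) for z in range(nz)}


def iou(a, b):
    """Intersection over Union of two voxel sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def f1(generated, target):
    """Set-based F1: harmonic mean of voxel precision and recall."""
    overlap = len(generated & target)
    if overlap == 0:
        return 0.0
    precision = overlap / len(generated)
    recall = overlap / len(target)
    return 2 * precision * recall / (precision + recall)


target = box_voxels(10, 10, 5)      # specified part: 500 voxels
generated = box_voxels(10, 10, 4)   # one layer too short: 400 voxels

print(iou(generated, target))           # 400 / 500 = 0.8
print(round(f1(generated, target), 4))  # 0.8889
```

Under these set-based definitions, F1 is never lower than IoU for the same pair of shapes, so IoU is the stricter of the two overlap measures.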

Who’s Affected

Mechanical engineers and product designers who want to accelerate early-stage design iteration are the primary beneficiaries. The ability to describe a part in plain language and receive geometrically validated CAD code could reduce the time spent on initial modeling.

Researchers working on code generation and 3D content creation will also find the nested correction loop architecture instructive. The combination of symbolic validation (exact measurements) with neural assessment (vision-language judging) represents a pattern that could apply to other domains where generated code must meet precise specifications.

What’s Next

The benchmark uses 100 custom prompts across three difficulty tiers, but the system’s performance on more complex real-world CAD tasks — assemblies with multiple interacting parts, tolerancing requirements, or manufacturing constraints — has not been tested. The reliance on CadQuery also limits the output to that specific library’s capabilities, which may not cover all CAD operations needed in production engineering workflows.

Integration with industry-standard CAD formats such as STEP and IGES for downstream manufacturing is another area that will need attention before the system can fit into existing engineering pipelines.
