How I Built a CAD Reconstruction Engine with OpenCASCADE and GPT-4o Vision
Abhishek Sharma
Software Developer
Every mechanical engineer I've spoken to has the same complaint: they spend hours manually recreating 3D models from legacy 2D drawings. Scanned blueprints from the 1980s, faded PDF exports from long-dead CAD systems, hand-drawn sketches from shop floors—all of them need to become manufacturing-grade 3D STEP files before any CNC machine can cut metal. With ForgeCadNeo, I set out to compress that 2-8 hour manual process into roughly 15 minutes of supervised automation.
This post is a deep technical walkthrough of how the system works: the AI-driven dimension extraction pipeline, the deterministic geometry engine, the validation layer that catches AI hallucinations, and the architectural decisions that make the whole thing production-ready.
The Architecture at 30,000 Feet
Before diving into code, here is the full pipeline visualized as a flow:
+------------------+     +-------------------+     +--------------------+
| Scanned PDF      |     | GPT-4o Vision     |     | Validation         |
| / Image Upload   |---->| Dimension         |---->| Layer              |
|                  |     | Extraction        |     | (Rule Engine)      |
+------------------+     +-------------------+     +--------------------+
                                                             |
                                                             v
+------------------+     +-------------------+     +--------------------+
| STEP File        |     | OpenCASCADE       |     | Feature Tree       |
| Export           |<----| B-Rep Kernel      |<----| Builder            |
| (.step/.stp)     |     | (pythonOCC)       |     | (Parametric)       |
+------------------+     +-------------------+     +--------------------+
                                   |
                                   v
                         +--------------------+
                         | Three.js           |
                         | Preview            |
                         | (STL Mesh View)    |
                         +--------------------+
The critical architectural insight is this: AI extracts parameters, but never generates geometry. GPT-4o Vision reads dimensions, tolerances, hole positions, and feature annotations from the drawing. But the actual 3D solid is built by OpenCASCADE's B-Rep kernel—a deterministic, mathematically precise engine that has been used in aerospace and automotive CAD for decades. This separation is not optional. It is the difference between a toy demo and a tool that produces files you can actually send to a machine shop.
Why B-Rep Over Mesh Geometry
This is the first question any 3D graphics developer asks, so let me address it upfront. There are two fundamental ways to represent 3D geometry:
Mesh (STL/OBJ): A set of connected triangles approximating a surface. Fast to render and universally supported for visualization, but fundamentally approximate. A cylinder is not truly round; it is a prism with enough faces to look round. You cannot recover exact design dimensions from a mesh, and CAM software cannot reliably generate precision toolpaths from one because it needs exact mathematical surface definitions.
B-Rep (STEP/IGES): Boundary Representation. Surfaces are defined by exact mathematical equations: a cylinder is defined by its axis, radius, and height. Edges are the intersections of these surfaces. This is what CAD kernels like OpenCASCADE, Parasolid, and ACIS produce. CAM toolpath generators, tolerance analysis tools, and FEA pre-processors all expect B-Rep data as input.
ForgeCadNeo generates B-Rep geometry internally and exports STEP files for manufacturing. For the browser preview, we tessellate the B-Rep into a mesh (STL) and render it with Three.js. The mesh is a throwaway visualization artifact; the STEP file is the deliverable.
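To put a number on "not truly round", here is a small sketch of the standard chord-error formula for a circle approximated by a regular polygon. The function names are mine, not part of ForgeCadNeo, and real tessellators use more elaborate deflection controls; this only illustrates why the mesh is acceptable for preview but not as a deliverable.

```python
import math


def max_chord_error(radius: float, segments: int) -> float:
    """Largest gap between a true circle and an inscribed regular polygon."""
    return radius * (1.0 - math.cos(math.pi / segments))


def segments_for_tolerance(radius: float, tolerance: float) -> int:
    """Smallest segment count whose chord error stays within `tolerance`."""
    if tolerance >= radius:
        return 3  # any polygon is already within tolerance
    return max(3, math.ceil(math.pi / math.acos(1.0 - tolerance / radius)))


if __name__ == "__main__":
    # Error for a 10 mm radius hole at a few tessellation densities.
    for n in (16, 64, 256):
        print(n, f"{max_chord_error(10.0, n):.4f} mm")
```

Even at 64 segments the polygon deviates from a 10 mm radius circle by roughly a hundredth of a millimetre, which is already on the order of a tight machining tolerance.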
GPT-4o Vision: Extracting Dimensions from Drawings
The most delicate part of the system is the vision extraction pipeline. Engineering drawings are dense with information: dimension lines, geometric dimensioning and tolerancing (GD&T) symbols, section views, detail views, notes, and title blocks. GPT-4o Vision needs to parse all of this and return structured data.
Here is the core extraction endpoint:
# api/routes/extraction.py
import base64
import json
from typing import Optional

from fastapi import APIRouter, Depends, File, UploadFile
from openai import AsyncOpenAI
from pydantic import BaseModel

router = APIRouter(prefix="/api/extraction", tags=["extraction"])


def get_openai_client() -> AsyncOpenAI:
    # Dependency provider (assumed; not part of the original listing).
    # AsyncOpenAI() reads OPENAI_API_KEY from the environment.
    return AsyncOpenAI()


class ExtractedDimension(BaseModel):
    feature_type: str                  # "hole", "slot", "pocket", "boss", "chamfer", "fillet"
    parameters: dict                   # varies by feature type
    position: Optional[dict] = None    # x, y, z offsets from datum
    tolerance: Optional[dict] = None   # upper/lower deviation
    confidence: float                  # 0.0 - 1.0


class ExtractionResult(BaseModel):
    stock_dimensions: dict             # overall bounding box: length, width, height
    features: list[ExtractedDimension]
    material: Optional[str] = None
    units: str                         # "mm" or "inch"
    view_count: int                    # how many orthographic views detected
    warnings: list[str]


@router.post("/analyze", response_model=ExtractionResult)
async def analyze_drawing(
    file: UploadFile = File(...),
    client: AsyncOpenAI = Depends(get_openai_client),
):
    image_data = await file.read()
    base64_image = base64.b64encode(image_data).decode("utf-8")

    # EXTRACTION_SYSTEM_PROMPT is defined in the next section.
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": EXTRACTION_SYSTEM_PROMPT,
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Extract all dimensions, features, and manufacturing data from this engineering drawing. Return structured JSON.",
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{base64_image}",
                            "detail": "high",
                        },
                    },
                ],
            },
        ],
        response_format={"type": "json_object"},
        temperature=0.1,  # low temperature for factual extraction
        max_tokens=4096,
    )

    raw_result = json.loads(response.choices[0].message.content)
    # validate_and_clean is defined elsewhere in the codebase (not shown here).
    validated = validate_and_clean(raw_result)
    return validated
Two things to note here. First, temperature=0.1 matters: we are not asking the model to be creative; we are asking it to read numbers off a drawing. Higher temperature introduces variance in numeric extraction that translates directly into dimensional errors in the final part. Second, "detail": "high" asks GPT-4o to process the image in its high-resolution mode instead of working from a low-resolution overview. Engineering drawings have small text (tolerances are often 6pt font on a scanned A3 sheet), and low-detail processing misses it.
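To see why resolution matters for 6pt text, here is a back-of-the-envelope sketch. The 300 DPI scan and the 2048-pixel cap on the long side are illustrative assumptions of mine, not measurements from ForgeCadNeo and not a statement of how the API resizes images; the helper names are hypothetical.

```python
def text_height_px(point_size: float, dpi: float, scale: float = 1.0) -> float:
    """Pixel height of text of `point_size` (1 pt = 1/72 inch) scanned at `dpi`,
    after the image has been resized by `scale`."""
    return point_size / 72.0 * dpi * scale


def fit_scale(width_px: int, height_px: int, max_side_px: int) -> float:
    """Scale factor that fits the longer side within `max_side_px` (never upscales)."""
    return min(1.0, max_side_px / max(width_px, height_px))


if __name__ == "__main__":
    # An A3 sheet (420 x 297 mm) scanned at 300 DPI is about 4961 x 3508 pixels.
    scale = fit_scale(4961, 3508, max_side_px=2048)
    print(f"6pt text at scan resolution: {text_height_px(6, 300):.1f} px")
    print(f"6pt text after downscaling:  {text_height_px(6, 300, scale):.1f} px")
```

Under these assumptions the tolerance text shrinks from about 25 pixels tall to about 10, which is why any further loss of resolution quickly makes digits and decimal points unreadable.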
The Extraction Prompt: Where the Magic Happens
Prompt engineering for dimension extraction is nothing like writing chatbot prompts. The system prompt is around 2,000 tokens and reads more like a specification document:
EXTRACTION_SYSTEM_PROMPT = """You are a mechanical engineering dimension extraction system.
You analyze 2D engineering drawings and extract structured manufacturing data.

RULES:
1. Only extract dimensions that are explicitly annotated with dimension lines.
   NEVER estimate or infer dimensions that are not labeled.
2. When a dimension is ambiguous, set confidence below 0.5 and add a warning.
3. All angular dimensions must be in degrees.
4. Identify the drawing's unit system from the title block. Default to mm if unclear.
5. For each feature, extract:
   - Feature type (hole, slot, pocket, boss, chamfer, fillet, counterbore, countersink)
   - All defining parameters (diameter, depth, width, length, angle, radius)
   - Position relative to a datum or edge (look for coordinate dimensions)
   - Tolerance if GD&T or +/- annotation is present
6. For holes: distinguish between through-holes (depth = "THRU") and blind holes.
7. For patterns: identify linear and circular patterns, extract count and spacing.
8. Stock dimensions: extract the overall bounding box from the main views.

OUTPUT FORMAT: Return a JSON object with this exact schema:
{
  "units": "mm" | "inch",
  "stock_dimensions": {"length": number, "width": number, "height": number},
  "material": string | null,
  "features": [
    {
      "feature_type": string,
      "parameters": {...},
      "position": {"x": number, "y": number, "z": number} | null,
      "tolerance": {"upper": number, "lower": number} | null,
      "confidence": number
    }
  ],
  "view_count": number,
  "warnings": [string]
}

CONFIDENCE SCORING:
- 1.0: Dimension is clearly readable with explicit annotation
- 0.7-0.9: Dimension is readable but partially obscured or has scan artifacts
- 0.5-0.7: Dimension is ambiguous, could be misread
- Below 0.5: Dimension is guessed or inferred (ALWAYS add a warning)
"""
The confidence scoring system is not cosmetic. It feeds directly into the validation layer that decides whether to proceed or ask the user for manual confirmation.
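As a rough sketch of what "feeds directly into the validation layer" can look like, the function below maps a confidence score to an action. The 0.6 review threshold mirrors the one used in the validation rules in this post; the ReviewAction enum and the 0.3 reject threshold are hypothetical additions for illustration only.

```python
from enum import Enum


class ReviewAction(Enum):
    ACCEPT = "accept"   # proceed without asking the user
    REVIEW = "review"   # proceed, but flag the feature for confirmation
    REJECT = "reject"   # do not build geometry from this value


def triage(confidence: float,
           review_below: float = 0.6,
           reject_below: float = 0.3) -> ReviewAction:
    """Map a model-reported confidence score to a review action."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence < reject_below:
        return ReviewAction.REJECT
    if confidence < review_below:
        return ReviewAction.REVIEW
    return ReviewAction.ACCEPT
```

Keeping the thresholds as parameters makes it easy to tune them per customer or per drawing quality without touching the rule logic.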
The Validation Layer: Catching AI Hallucinations
AI hallucinations in a chatbot are a minor annoyance. AI hallucinations in a CAD system produce parts that do not fit, waste material, or break tooling. The validation layer is a rule engine that applies physical constraints:
# core/validation.py
from dataclasses import dataclass

# Import path assumed from the file header of the extraction listing above.
from api.routes.extraction import ExtractionResult


@dataclass
class ValidationResult:
    is_valid: bool
    errors: list[str]
    warnings: list[str]
    auto_corrections: list[str]


def validate_extraction(result: ExtractionResult) -> ValidationResult:
    errors, warnings, corrections = [], [], []
    stock = result.stock_dimensions
    unit = result.units  # "mm" or "inch"; used in messages instead of hardcoding mm

    # Rule 1: No feature can exceed stock dimensions
    for i, feat in enumerate(result.features):
        if feat.feature_type == "hole":
            diameter = feat.parameters.get("diameter", 0)
            depth = feat.parameters.get("depth", 0)
            if diameter > min(stock["length"], stock["width"]):
                errors.append(
                    f"Feature {i}: Hole diameter ({diameter}{unit}) exceeds "
                    f"stock dimensions. Likely OCR misread."
                )
            # depth may be the string "THRU", so only compare numeric depths
            if isinstance(depth, (int, float)) and depth > stock["height"]:
                warnings.append(
                    f"Feature {i}: Hole depth ({depth}{unit}) exceeds stock height "
                    f"({stock['height']}{unit}). Treating as through-hole."
                )
                feat.parameters["depth"] = "THRU"
                corrections.append(f"Feature {i}: Auto-corrected to through-hole.")

    # Rule 2: No two features can occupy the same space
    positions = []
    for i, feat in enumerate(result.features):
        if feat.position:
            pos = (feat.position["x"], feat.position["y"])
            for j, prev_pos in positions:
                dist = ((pos[0] - prev_pos[0]) ** 2 + (pos[1] - prev_pos[1]) ** 2) ** 0.5
                # _get_min_clearance is a helper defined elsewhere (not shown here)
                min_clearance = _get_min_clearance(feat, result.features[j])
                if dist < min_clearance:
                    warnings.append(
                        f"Features {j} and {i} may overlap (distance: "
                        f"{dist:.2f}{unit}, min clearance: {min_clearance:.2f}{unit})."
                    )
            positions.append((i, pos))

    # Rule 3: Confidence threshold gating
    low_confidence = [
        (i, f) for i, f in enumerate(result.features) if f.confidence < 0.6
    ]
    for i, feat in low_confidence:
        warnings.append(
            f"Feature {i} ({feat.feature_type}): Low confidence "
            f"({feat.confidence:.1%}). Manual review recommended."
        )

    # Rule 4: Standard dimension sanity checks
    if stock["length"] <= 0 or stock["width"] <= 0 or stock["height"] <= 0:
        errors.append("Stock dimensions must be positive non-zero values.")

    return ValidationResult(
        is_valid=len(errors) == 0,
        errors=errors,
        warnings=warnings,
        auto_corrections=corrections,
    )
In production, about 8% of extractions trigger validation warnings, and roughly 2% produce hard errors that require user intervention. The most common hallucination is decimal point misplacement—GPT-4o reads "12.5mm" as "125mm" on low-resolution scans. The stock dimension boundary check catches this almost every time.
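Because the misread is usually a factor of ten, the boundary check can also suggest a likely correction instead of only rejecting the value. The helper below is a hypothetical sketch, not part of the validation code shown above; it only proposes candidates and leaves the decision to the user.

```python
def decimal_shift_candidates(value: float, limit: float,
                             shifts: tuple = (10, 100)) -> list[float]:
    """Suggest corrected values when `value` exceeds `limit`.

    Returns value/10, value/100, ... for each shift that brings the value
    back within the limit. An empty list means the value is already
    plausible or no simple decimal shift explains it.
    """
    if value <= limit:
        return []
    return [value / s for s in shifts if 0 < value / s <= limit]


if __name__ == "__main__":
    # A hole read as 125 mm on an 80 mm wide part: 12.5 mm is the likely intent.
    print(decimal_shift_candidates(125.0, limit=80.0))
```

Note the limitation this makes visible: a 10x error that still fits inside the stock (say 2.5 read as 25 on a 200 mm plate) passes the boundary check untouched, so the confidence gate and manual review still matter.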
OpenCASCADE Feature Tree Execution
Once extraction is validated, the feature tree builder converts the structured data into a sequence of B-Rep operations using pythonOCC (the Python binding for OpenCASCADE):
# core/geometry_engine.py
# pythonOCC exposes the OpenCASCADE modules under OCC.Core
from OCC.Core.BRepPrimAPI import (
    BRepPrimAPI_MakeBox,
    BRepPrimAPI_MakeCylinder,
)
from OCC.Core.BRepAlgoAPI import BRepAlgoAPI_Cut, BRepAlgoAPI_Fuse
from OCC.Core.BRepFilletAPI import BRepFilletAPI_MakeFillet, BRepFilletAPI_MakeChamfer
from OCC.Core.gp import gp_Pnt, gp_Ax2, gp_Dir
from OCC.Core.IFSelect import IFSelect_RetDone
from OCC.Core.STEPControl import STEPControl_Writer, STEPControl_AsIs
from OCC.Core.TopExp import TopExp_Explorer
from OCC.Core.TopAbs import TopAbs_EDGE
from OCC.Core.TopoDS import TopoDS_Shape

# Import path assumed from the file header of the extraction listing above.
from api.routes.extraction import ExtractionResult


class GeometryError(Exception):
    """Raised when a B-Rep operation fails."""


class ExportError(Exception):
    """Raised when the STEP writer reports a failure."""


def build_solid(extraction: ExtractionResult) -> TopoDS_Shape:
    stock = extraction.stock_dimensions

    # Step 1: Create stock block
    solid = BRepPrimAPI_MakeBox(
        stock["length"], stock["width"], stock["height"]
    ).Shape()

    # Step 2: Apply each feature as a boolean operation
    # (_apply_pocket, _apply_fillet, _apply_chamfer and _apply_boss follow
    # the same pattern as _apply_hole and are not shown here)
    for feature in extraction.features:
        if feature.feature_type == "hole":
            solid = _apply_hole(solid, feature, stock)
        elif feature.feature_type == "pocket":
            solid = _apply_pocket(solid, feature, stock)
        elif feature.feature_type == "fillet":
            solid = _apply_fillet(solid, feature)
        elif feature.feature_type == "chamfer":
            solid = _apply_chamfer(solid, feature)
        elif feature.feature_type == "boss":
            solid = _apply_boss(solid, feature)

    return solid


def _apply_hole(solid, feature, stock):
    params = feature.parameters
    pos = feature.position or {"x": 0, "y": 0, "z": stock["height"]}

    # Create cylinder for the hole, starting at the top face
    axis = gp_Ax2(
        gp_Pnt(pos["x"], pos["y"], stock["height"]),
        gp_Dir(0, 0, -1),  # drill direction: top-down
    )
    radius = params["diameter"] / 2.0
    depth = stock["height"] if params.get("depth") == "THRU" else params["depth"]
    cylinder = BRepPrimAPI_MakeCylinder(axis, radius, depth).Shape()

    # Boolean subtract: solid minus cylinder
    result = BRepAlgoAPI_Cut(solid, cylinder)
    if not result.IsDone():
        raise GeometryError(f"Boolean cut failed for hole at ({pos['x']}, {pos['y']})")
    return result.Shape()


def export_step(shape, filepath: str) -> str:
    writer = STEPControl_Writer()
    writer.Transfer(shape, STEPControl_AsIs)
    status = writer.Write(filepath)
    if status != IFSelect_RetDone:
        raise ExportError(f"STEP export failed with status {status}")
    return filepath
The feature tree is intentionally ordered. Stock creation comes first, then subtractive features (holes, pockets, slots), then additive features (bosses), then edge treatments (fillets, chamfers). This matches how a machinist thinks about the part and produces cleaner B-Rep topology than applying operations in arbitrary order.
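The build_solid listing above applies features in whatever order the extraction returned them, so the ordering has to be enforced before that loop runs. One way to do it, sketched under my own assumptions (the priority table and the Feature stand-in are not from the ForgeCadNeo codebase), is a stable sort on a per-type priority:

```python
from dataclasses import dataclass

# Lower number = applied earlier. Subtractive, then additive, then edge treatments.
FEATURE_PRIORITY = {
    "hole": 0, "pocket": 0, "slot": 0, "counterbore": 0, "countersink": 0,
    "boss": 1,
    "fillet": 2, "chamfer": 2,
}


@dataclass
class Feature:
    """Stand-in for ExtractedDimension with only the field this sketch needs."""
    feature_type: str


def order_features(features: list) -> list:
    """Return features sorted into machining order.

    sorted() is stable, so features of equal priority keep the order in
    which they were extracted. Unknown types are placed with the additive
    group (an arbitrary choice made for this sketch).
    """
    return sorted(features, key=lambda f: FEATURE_PRIORITY.get(f.feature_type, 1))


if __name__ == "__main__":
    raw = [Feature("fillet"), Feature("boss"), Feature("hole"), Feature("pocket")]
    print([f.feature_type for f in order_features(raw)])
```

Running edge treatments last matters most in practice: a fillet applied before a hole is cut can be partially destroyed by the later boolean, or can leave sliver faces behind.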
The FastAPI Geometry Endpoint
The geometry generation endpoint ties the extraction and kernel together, handling the full pipeline in a single request or as a two-step process for the interactive preview workflow:
# api/routes/geometry.py
from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException
from fastapi.responses import FileResponse
import tempfile, os, uuid

# get_tenant, check_credits, deduct_credit, tessellate_to_stl and the
# extraction/validation/geometry modules are imported from elsewhere in
# the codebase (not shown here).

router = APIRouter(prefix="/api/geometry", tags=["geometry"])


@router.post("/generate")
async def generate_geometry(
    extraction: ExtractionResult,
    background_tasks: BackgroundTasks,
    tenant_id: str = Depends(get_tenant),
):
    # Check credits (Stripe-based billing)
    credits = await check_credits(tenant_id)
    if credits < 1:
        raise HTTPException(402, "Insufficient credits. Please purchase more.")

    # Validate extraction data
    validation = validate_extraction(extraction)
    if not validation.is_valid:
        raise HTTPException(
            422,
            detail={
                "message": "Extraction validation failed",
                "errors": validation.errors,
                "warnings": validation.warnings,
            },
        )

    # Build geometry
    try:
        shape = build_solid(extraction)
    except GeometryError as e:
        raise HTTPException(500, f"Geometry generation failed: {e}") from e

    # Export STEP file
    job_id = str(uuid.uuid4())
    step_path = f"/tmp/forgecad/{tenant_id}/{job_id}.step"
    stl_path = f"/tmp/forgecad/{tenant_id}/{job_id}.stl"
    os.makedirs(os.path.dirname(step_path), exist_ok=True)
    export_step(shape, step_path)
    tessellate_to_stl(shape, stl_path)  # for Three.js preview

    # Deduct credit in background
    background_tasks.add_task(deduct_credit, tenant_id, job_id)

    return {
        "job_id": job_id,
        "step_url": f"/api/geometry/download/{job_id}?format=step",
        "stl_url": f"/api/geometry/download/{job_id}?format=stl",
        "warnings": validation.warnings,
        "auto_corrections": validation.auto_corrections,
    }
Three.js Preview Rendering
The browser preview loads the STL mesh (tessellated from the B-Rep) using Three.js with React Three Fiber. This gives users immediate visual feedback before they download the STEP file:
// components/CadPreview.tsx
import { Canvas, useLoader } from '@react-three/fiber';
import { OrbitControls, Stage, Grid } from '@react-three/drei';
import { STLLoader } from 'three/examples/jsm/loaders/STLLoader';
import * as THREE from 'three';

interface CadPreviewProps {
  stlUrl: string;
  warnings?: string[];
}

function StlModel({ url }: { url: string }) {
  const geometry = useLoader(STLLoader, url);

  // Center the geometry and compute normals for proper shading
  geometry.center();
  geometry.computeVertexNormals();

  return (
    <mesh geometry={geometry} castShadow receiveShadow>
      <meshStandardMaterial
        color="#b8860b"
        metalness={0.6}
        roughness={0.35}
        envMapIntensity={1.2}
      />
    </mesh>
  );
}

export function CadPreview({ stlUrl, warnings }: CadPreviewProps) {
  return (
    <div className="relative w-full h-[600px] bg-neutral-950 rounded-lg">
      <Canvas
        shadows
        camera={{ position: [150, 100, 150], fov: 45 }}
        gl={{ antialias: true, toneMapping: THREE.ACESFilmicToneMapping }}
      >
        <Stage environment="city" intensity={0.5}>
          <StlModel url={stlUrl} />
        </Stage>
        <OrbitControls
          enablePan={true}
          enableZoom={true}
          minDistance={50}
          maxDistance={500}
        />
        <Grid
          infiniteGrid
          fadeDistance={400}
          cellSize={10}
          cellColor="#333"
          sectionColor="#555"
        />
      </Canvas>
      {warnings && warnings.length > 0 && (
        <div className="absolute top-4 right-4 bg-yellow-900/80 p-3 rounded max-w-xs">
          <p className="text-yellow-200 text-sm font-semibold mb-1">Warnings</p>
          {warnings.map((w, i) => (
            <p key={i} className="text-yellow-100 text-xs">{w}</p>
          ))}
        </div>
      )}
    </div>
  );
}
Why Not Let AI Generate Geometry Directly?
This is the question I get most often. "Why not just ask GPT-4o to output the OpenCASCADE code directly?" I tried it. Here is why it fails:
1. Topological Consistency. B-Rep geometry must be topologically valid: in a manifold solid every edge is shared by exactly two faces, every face is bounded by closed loops of edges, and the outer shell must be watertight. LLMs generate code that looks plausible but produces invalid topology about 40% of the time. An invalid B-Rep cannot be reliably exported to STEP, cannot be used for toolpath generation, and cannot be used for simulation.
2. Numerical Precision. OpenCASCADE works with a default geometric tolerance of 1e-7 in model units (millimetres here). LLM-generated code routinely introduces floating-point errors through unnecessary intermediate calculations. A deterministic feature tree with validated input parameters does not have this problem.
3. Reproducibility. The same drawing, processed twice, must produce the same STEP file byte-for-byte. LLM-generated geometry varies between runs. For manufacturing traceability (ISO 9001 compliance), this is a non-starter.
4. Debugging. When a part fails quality inspection at the machine shop, I need to trace the error back to either a misread dimension or a geometry bug. With a deterministic feature tree, I can inspect each operation in sequence. With LLM-generated code, the debugging surface is the entire program.
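One practical caveat on the reproducibility point: a STEP file's HEADER section normally carries a timestamp in its FILE_NAME entry, so two exports of identical geometry can still differ in those bytes. A simple way to check that two runs produced the same geometry is to hash only the DATA section. The helper below is a sketch of that idea with a name I made up; it assumes the file is a standard ISO 10303-21 text file and that the kernel numbers entities identically between runs.

```python
import hashlib


def step_data_digest(step_text: str) -> str:
    """SHA-256 of the DATA section only, so header timestamps are ignored."""
    start = step_text.find("DATA;")
    if start == -1:
        raise ValueError("no DATA section found")
    end = step_text.find("ENDSEC;", start)
    if end == -1:
        raise ValueError("DATA section is not terminated by ENDSEC;")
    return hashlib.sha256(step_text[start:end].encode("utf-8")).hexdigest()


if __name__ == "__main__":
    header = "ISO-10303-21;\nHEADER;\nFILE_NAME('part','%s',(''),(''),'','','');\nENDSEC;\n"
    data = "DATA;\n#1=CARTESIAN_POINT('',(0.,0.,0.));\nENDSEC;\nEND-ISO-10303-21;\n"
    run_a = header % "2024-01-01T10:00:00" + data
    run_b = header % "2024-01-02T11:30:00" + data
    print(step_data_digest(run_a) == step_data_digest(run_b))
```

Storing this digest alongside the job record gives a cheap traceability check: re-running the same validated extraction should reproduce the same digest.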
The correct architecture is: AI for perception (reading drawings), deterministic kernel for geometry (building solids). This is the same principle behind self-driving cars—neural networks handle perception, but trajectory planning and control are deterministic systems.
Multi-Tenant Architecture and Credits
ForgeCadNeo is a multi-tenant SaaS with a credits-based billing system through Stripe. Each geometry generation costs one credit. The tenant isolation is straightforward: every API route receives a tenant_id from the JWT, and all database queries and file storage are scoped to that tenant. File storage uses the pattern /storage/{tenant_id}/{job_id}/ to ensure complete isolation.
Credits are tracked in a PostgreSQL table and decremented with a single conditional UPDATE, which prevents race conditions on concurrent requests:
UPDATE tenant_credits
SET balance = balance - 1,
    last_used_at = NOW()
WHERE tenant_id = $1
  AND balance > 0
RETURNING balance;
If the UPDATE returns no rows, the balance was already zero (or the tenant does not exist) and the request is rejected. The statement is atomic: PostgreSQL takes the row lock itself, so no application-level locking is needed and concurrent requests cannot drive the balance below zero.
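The same conditional-update pattern can be tried locally without a PostgreSQL server. The sketch below swaps in SQLite from the Python standard library purely for demonstration and uses the affected-row count instead of RETURNING; the function names and the demo tenant are made up.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory table with one tenant holding two credits."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tenant_credits ("
        "tenant_id TEXT PRIMARY KEY, balance INTEGER NOT NULL, last_used_at TEXT)"
    )
    conn.execute("INSERT INTO tenant_credits (tenant_id, balance) VALUES ('acme', 2)")
    conn.commit()
    return conn


def try_deduct_credit(conn: sqlite3.Connection, tenant_id: str) -> bool:
    """Deduct one credit if any remain. Returns False when nothing was updated."""
    cur = conn.execute(
        "UPDATE tenant_credits "
        "SET balance = balance - 1, last_used_at = CURRENT_TIMESTAMP "
        "WHERE tenant_id = ? AND balance > 0",
        (tenant_id,),
    )
    conn.commit()
    return cur.rowcount == 1


if __name__ == "__main__":
    db = make_demo_db()
    print([try_deduct_credit(db, "acme") for _ in range(3)])
```

The check and the decrement happen in one statement, so there is no window between "read the balance" and "write the new balance" for a second request to slip through.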
Performance and Results
In production, the pipeline processes a typical bracket or flange drawing (3-8 features) in under 90 seconds end-to-end:
- Image upload and preprocessing: ~2 seconds
- GPT-4o Vision extraction: ~15-25 seconds (depends on image complexity)
- Validation: <100ms
- OpenCASCADE geometry generation: ~3-8 seconds
- STEP + STL export: ~2-4 seconds
- Three.js preview load: ~1-2 seconds (client-side)
Compared to manual CAD reconstruction, the numbers speak for themselves. A senior CAD engineer doing this manually averages 2-8 hours per drawing depending on complexity. ForgeCadNeo handles the same drawings in approximately 15 minutes of total user time, including the review and correction step.
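As a quick sanity check on the stage figures listed above, here is the arithmetic in code. The ranges are copied from the list, with "<100ms" treated as 0 to 0.1 seconds; the dictionary keys are just labels I chose.

```python
# (best case, worst case) in seconds for each pipeline stage
STAGE_SECONDS = {
    "upload_and_preprocessing": (2.0, 2.0),
    "vision_extraction": (15.0, 25.0),
    "validation": (0.0, 0.1),
    "geometry_generation": (3.0, 8.0),
    "step_and_stl_export": (2.0, 4.0),
    "preview_load": (1.0, 2.0),
}


def total_range(stages: dict) -> tuple:
    """Sum the best-case and worst-case durations across all stages."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


if __name__ == "__main__":
    best, worst = total_range(STAGE_SECONDS)
    print(f"{best:.1f} s to {worst:.1f} s")
```

The listed stages add up to roughly 23 to 41 seconds, comfortably inside the 90-second figure, and the vision extraction call accounts for more than half of the total in either case.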
Lessons Learned
1. Treat AI output as untrusted user input. Every dimension that comes from GPT-4o goes through the same validation you would apply to user-submitted form data. Sanitize, bound-check, cross-reference.
2. Invest in the validation layer early. I spent more engineering time on validation rules than on the extraction prompt. The prompt gets you 90% accuracy. The validation layer gets you from 90% to 98%. The remaining 2% is the manual review step, and that is acceptable for this domain.
3. OpenCASCADE documentation is sparse. The pythonOCC community is small but helpful. I relied heavily on reading the C++ source code and translating. Budget extra time for the kernel integration.
4. Three.js is fine for preview, not for CAD. Users initially asked for measurement tools and section views in the browser preview. That path leads to building a full web CAD viewer, which is a separate product. Keep the preview simple: orbit, zoom, visual confirmation. Detailed inspection happens in the user's own CAD software after downloading the STEP file.
ForgeCadNeo demonstrates a pattern I expect to see more of in industrial AI applications: neural networks for perception, deterministic systems for precision. The AI does not need to be perfect. It needs to be good enough that, combined with a robust validation layer, the output is reliable. That is an engineering problem, not a machine learning problem, and that is exactly why it works.