Foreman AI Research

Benchmarks & Publications

Empirical evaluations and technical research from the Foreman AI team. We measure what matters: recall fidelity, extraction accuracy, and real-world construction plan intelligence.

Benchmark December 2024

Foreman AI Benchmark 001

Empirical Comparison of General-Purpose AI vs Domain-Tuned Plan Intelligence

This benchmark compares a state-of-the-art general-purpose LLM against Foreman AI, with both applied to real-world construction plan sets. The results show a consistent performance gap favoring domain-tuned intelligence.

Read benchmark →

Benchmark July 2026

Foreman AI vs. Perplexity Computer

Technical benchmark report: 136-Page Aspen Estate Plan Intelligence

A controlled evaluation on a complex, multi-wing, 136-page architectural plan set, covering tiny-text evidence, geometry and spatial reasoning, and builder-risk coordination. Efficiency-adjusted result: Foreman AI 109.75 / 115, Perplexity Computer 105.75 / 115.

Read benchmark →

Publication December 2025

A Large-Scale, Vector-Native Dataset for Construction Document Intelligence

Nearly one million pages of real construction plans for AI training and evaluation

This publication introduces a dataset of 7,550 construction plan sets totaling 954,892 pages. It is the first large-scale, vector-native corpus enabling meaningful model training and benchmarking on professional construction documents.

Read publication →

Stay informed

Get notified when new benchmarks are published.

We release benchmarks and technical publications as we validate new capabilities. Join the list to receive updates.
