Foreman AI Research

Benchmarks & Publications

Empirical evaluations and technical research from the Foreman AI team. We measure what matters: recall fidelity, extraction accuracy, and real-world construction plan intelligence.

Benchmark December 2024

Foreman AI Benchmark 001

Empirical Comparison of General-Purpose AI vs Domain-Tuned Plan Intelligence

This benchmark compares a state-of-the-art general-purpose LLM with Foreman AI on real-world construction plan sets. The results show a consistent performance gap in favor of the domain-tuned system.

Benchmark Coming Soon

Benchmark 002: Schedule Extraction Accuracy

Measuring schedule detection and data extraction across mechanical, electrical, and plumbing (MEP) plan sets

An upcoming benchmark evaluating schedule identification, column mapping, and cross-reference accuracy across equipment, door, and window schedules.

Get notified when new benchmarks are published.

We release benchmarks and technical publications as we validate new capabilities. Join the list to receive updates.