Empirical Comparison of General-Purpose AI vs Domain-Tuned Plan Intelligence
This benchmark compares a state-of-the-art general-purpose LLM against Foreman AI on real-world construction plan sets. The results show a consistent performance gap in favor of the domain-tuned system.
Benchmark 002: Schedule Extraction Accuracy
Measuring schedule detection and data extraction across MEP plan sets
An upcoming benchmark evaluating schedule identification, column mapping, and cross-reference accuracy across equipment, door, and window schedules.
Nearly one million pages of real construction plans for AI training and evaluation
This publication introduces a dataset of 7,550 construction plan sets totaling 954,892 pages. It is the first large-scale, vector-native corpus to enable meaningful model training and benchmarking on professional construction documents.