The questions we get asked first.
How accurate is AI document extraction in practice?
Depends on the document, the model, and the eval methodology — vendor marketing claims of 99%+ accuracy almost always evaporate on real-world inputs. We baseline accuracy on a representative sample of your actual documents before any production rollout. Typed PDFs (rate confirmations, structured invoices, EDI-adjacent forms) routinely hit 97–99% field-level accuracy with current frontier models. Photographed BOLs, handwritten field notes, and faxed documents land lower (high 80s to mid 90s depending on quality) and almost always need a human-in-the-loop step for fields where confidence drops below a configurable threshold. We publish the baseline and the threshold; you decide what's safe to route automatically.
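To make "field-level accuracy" concrete, here is a minimal sketch of how a baseline could be scored against a hand-labeled sample. The function names, the field names, and the normalization rule (collapse whitespace, ignore case) are illustrative assumptions, not a description of our production eval harness.

```python
from collections import defaultdict


def normalize(value):
    """Collapse whitespace and ignore case so trivial formatting differences are not scored as errors."""
    return "" if value is None else " ".join(str(value).split()).casefold()


def field_accuracy(predictions, ground_truth):
    """Return, per field, the share of documents whose extracted value matches the labeled value.

    predictions and ground_truth are parallel lists of dicts (one dict per document).
    A field missing from a prediction counts as a miss.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, truth in zip(predictions, ground_truth):
        for field, expected in truth.items():
            total[field] += 1
            if normalize(pred.get(field)) == normalize(expected):
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}


# Hypothetical two-document sample: labeled values vs. model output.
labeled = [
    {"invoice_number": "INV-1", "total": "100.00"},
    {"invoice_number": "INV-2", "total": "250.00"},
]
extracted = [
    {"invoice_number": "inv-1 ", "total": "100.00"},
    {"invoice_number": "INV-2", "total": "25.00"},
]
scores = field_accuracy(extracted, labeled)
```

Scoring per field rather than per document is what makes a threshold decision possible: a field that scores well on your sample can be routed automatically while a weaker one stays in review.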
Can we run this on our own AWS or do we have to use a vendor?
Your own AWS is the default. The extraction stack runs on Amazon Bedrock (Claude, Llama, or whichever frontier model fits the data-residency requirement) inside your account, with the orchestration in your VPC and the documents staying in your S3 buckets. We do the integration and the pipeline; you keep the cloud account, the IAM, and the audit logs. For wider Bedrock platform work (Codex, Connect, Quick), see /services/agentic.
What about handwritten documents — signed BOLs, field notes, inspector annotations?
Handwriting is where naive OCR falls over and where the engineering matters most. Our approach: a vision-capable LLM does the first pass and produces a structured extraction plus a confidence score per field. Anything below threshold (and signatures, always) lands in a human-in-the-loop queue with the source image side-by-side. The human approves, corrects, or flags. The model isn't allowed to silently get it wrong. For coatings inspectors writing field notes on a tablet, we usually replace handwriting with structured form capture upstream — easier than fixing it in extraction.
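The routing rule described above can be sketched in a few lines. This is an illustration under stated assumptions: the `ExtractedField` shape, the `ALWAYS_REVIEW` set, the field names, and the 0.90 default threshold are all hypothetical, and in a real build the threshold is the configurable value agreed during the baseline.

```python
from dataclasses import dataclass

# Hypothetical: field names that go to a human regardless of confidence.
ALWAYS_REVIEW = {"signature"}


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # model-reported score in [0, 1]


def route(fields, threshold=0.90):
    """Split extracted fields into (auto_approved, needs_review).

    A field goes to review if it is in ALWAYS_REVIEW or its confidence
    is below the threshold; everything else is approved automatically.
    """
    auto_approved, needs_review = [], []
    for field in fields:
        if field.name in ALWAYS_REVIEW or field.confidence < threshold:
            needs_review.append(field)
        else:
            auto_approved.append(field)
    return auto_approved, needs_review


# Hypothetical first-pass output for one photographed BOL.
fields = [
    ExtractedField("bol_number", "123", 0.98),
    ExtractedField("consignee", "Acme", 0.72),
    ExtractedField("signature", "present", 0.99),
]
auto_approved, needs_review = route(fields)
```

In this sample the signature lands in the review queue despite its high score, which is the point: some fields are reviewed by policy, not by confidence.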
How does this integrate with our existing system of record?
Documents don't matter until they land in the system the business actually runs on. We build against the public APIs of Procore, NetSuite, QuickBooks (Online and Desktop), Sage 300 CRE, Salesforce, ServiceNow, custom dispatch boards, and EDI 204/210/214/990 where applicable. Extracted data lands as structured records — invoices, daily logs, observations, custom-tool records — not as PDFs stapled to an RFI. See /integrations/procore for one worked example.
What does a document automation build typically cost?
The 14-Day Audit is a fixed fee in the low five figures. Build engagements are quoted against the audit's written plan; typical mid-market scope ranges from $80K to $400K depending on document volume, model and infrastructure choices, integration count, and whether a mobile capture app is in scope. Retainers run monthly with capped hours. Every engagement starts with a written quote and a fixed deliverable, so there is no time-and-materials creep.
Do you also build the field capture app, or only the extraction backend?
Both, depending on what your operation actually needs. The DocuPaint platform we built for industrial coatings is the canonical full-stack example: native tablet and mobile apps for inspectors to capture structured field data, the extraction and report-generation backend, and the integration layer that posts to the asset owner's system. React Native for the mobile side, Next.js or NestJS for the backend, Bedrock or Claude direct for the extraction. The audit covers which pieces you actually need vs. which ones you already have.