Model selection
Sonnet is the default. It handles the bulk of operational work — document extraction, multi-step agents, internal copilots — at a price point that lets you actually deploy at volume. Opus comes in for the workloads where reasoning depth genuinely changes the outcome (complex contract analysis, multi-step deduction over ambiguous source material, evals where every accuracy point compounds). Haiku is the right pick for high-volume classification or routing where latency and cost matter more than the marginal accuracy gain from a larger model. We pick per workload, not per fashion cycle.
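In practice that selection lives in configuration, not in scattered call sites. A minimal sketch of what it looks like, assuming the Anthropic Python SDK; the workload names and model ID strings are illustrative placeholders, so pin the versions your account actually runs:

```python
# Illustrative per-workload model routing. Workload names and model IDs
# are placeholders; pin the model versions you have actually evaluated.
from anthropic import Anthropic

MODEL_BY_WORKLOAD = {
    "contract_analysis": "claude-opus-4-20250514",      # reasoning depth changes the outcome
    "document_extraction": "claude-sonnet-4-20250514",  # bulk operational work
    "routing_classifier": "claude-3-5-haiku-20241022",  # latency and cost dominate
}

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def run(workload: str, prompt: str) -> str:
    response = client.messages.create(
        model=MODEL_BY_WORKLOAD[workload],
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```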
Deployment surfaces
Anthropic API direct is the simplest path and what we default to for new builds. AWS Bedrock is the primary path when the client needs data residency inside their AWS account, has a Bedrock commitment to spend down, or runs the rest of the stack on AWS. Google Vertex AI is supported but less common in our work; we’ll use it when the workload genuinely lives inside Google Workspace or BigQuery. The integration shape is consistent across surfaces; the differences are quota management, model availability, and IAM.
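To make the "consistent integration shape" concrete: the Anthropic Python SDK ships separate client classes for each surface, and the messages call looks the same on all three. The region, project, and model identifier values below are placeholders; Bedrock and Vertex each use their own model ID formats.

```python
# Same messages.create call shape on every surface; what changes is auth,
# region/project wiring, and the model identifier format. Values are placeholders.
from anthropic import Anthropic, AnthropicBedrock, AnthropicVertex

direct = Anthropic()  # ANTHROPIC_API_KEY from the environment

bedrock = AnthropicBedrock(
    aws_region="us-east-1",  # credentials resolved via the standard AWS chain
)

vertex = AnthropicVertex(
    region="us-east5",
    project_id="your-gcp-project",  # placeholder project ID
)


def ask(client, model: str, prompt: str) -> str:
    msg = client.messages.create(
        model=model,  # Bedrock and Vertex expect their surface-specific model IDs
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text
```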
MCP server architecture
We build MCP servers when an agent needs to call into the same internal systems repeatedly from different surfaces (chat, agent runtime, Claude Code), and you’d otherwise be reimplementing the tool interface three times. We don’t build MCP servers for one-off integrations where a direct function call is fine. The decision lives in the audit, not in a tech-stack-by-default reflex.
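When the audit does call for one, the server itself is small. A sketch using the MCP Python SDK's FastMCP helper; the server name, tool, and the internal system it wraps are hypothetical stand-ins for whatever your agents actually need to reach:

```python
# Minimal MCP server sketch: one tool, reusable from chat, an agent runtime,
# or Claude Code without reimplementing the integration per surface.
# The tool name and the internal API it wraps are hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-systems")


@mcp.tool()
def lookup_customer(account_id: str) -> str:
    """Fetch a customer record from the internal CRM (stubbed here)."""
    # A real server would call your internal API; stubbed for the sketch.
    return f"Customer record for account {account_id}"


if __name__ == "__main__":
    mcp.run()  # stdio transport by default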
Eval harness and observability
Every prompt-driven workflow ships with an eval set — typically 30 to 100 representative inputs with expected outputs (or expected shape, for free-form generation). The suite runs on every prompt change, every model-version bump, and every change to the surrounding code. Observability layers on top — Sentry releases tied to deploys, structured logs into your existing aggregator, runbooks for every on-call scenario we can predict. It's the same operational governance substrate we run every production engagement on, and the same approach we apply to our broader agentic AI work.
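A sketch of that eval loop, assuming a JSONL case file and an exact-match grader; the file path, pass threshold, and grading function are illustrative choices that vary per workload (free-form outputs get a shape check or rubric grader instead):

```python
# Sketch of the eval run triggered on every prompt change or model bump.
# Case file path, pass threshold, and substring grader are illustrative assumptions.
import json
from anthropic import Anthropic

client = Anthropic()
MODEL = "claude-sonnet-4-20250514"  # pin the exact version under evaluation


def grade(expected: str, actual: str) -> bool:
    # Simplest possible grader; swap in a shape check or rubric for free-form output.
    return expected.strip().lower() in actual.strip().lower()


def run_evals(path: str = "evals/cases.jsonl", threshold: float = 0.95) -> None:
    cases = [json.loads(line) for line in open(path)]
    passed = 0
    for case in cases:
        resp = client.messages.create(
            model=MODEL,
            max_tokens=1024,
            messages=[{"role": "user", "content": case["input"]}],
        )
        if grade(case["expected"], resp.content[0].text):
            passed += 1
    rate = passed / len(cases)
    print(f"{passed}/{len(cases)} passed ({rate:.0%})")
    assert rate >= threshold, f"Pass rate {rate:.0%} below threshold {threshold:.0%}"


if __name__ == "__main__":
    run_evals()
```

Wire the same script into CI so a prompt tweak or model bump cannot merge without the suite passing.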