The gap between demos and production
Everyone has seen the impressive ChatGPT demos. But deploying generative AI into production workflows — where reliability, cost, and latency matter — is a fundamentally different challenge.
Over the past two years, I’ve led the adoption of AI and generative AI solutions across our engineering organization. Here’s what I’ve learned.
Start with the workflow, not the model
The most common mistake I see teams make is starting with “let’s use GPT-4” and then looking for a problem to solve. The right approach is the opposite:
- Map the manual workflow — What are humans doing today? Where are the bottlenecks?
- Identify the repetitive, high-volume tasks — These are your best candidates
- Prototype with the simplest model that works — You’d be surprised how often a smaller model outperforms a larger one for narrow tasks
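To make that last point concrete, here is a minimal sketch of a harness for comparing candidate models on a narrow task before committing to the largest one. The `call_model` function is a placeholder, not a real provider API; swap in whatever SDK you actually use.

```python
def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real model call (hosted API, local model, etc.)."""
    return f"[{model}] response to: {prompt}"

def evaluate(model: str, cases: list[tuple[str, str]]) -> float:
    """Fraction of test cases where the output contains the expected answer.

    A substring check is the crudest possible grader; for real evals,
    use exact-match, rubric, or model-graded scoring as appropriate.
    """
    hits = 0
    for prompt, expected in cases:
        if expected.lower() in call_model(model, prompt).lower():
            hits += 1
    return hits / len(cases)

# A tiny eval set for a narrow classification task.
cases = [("Classify sentiment: 'great product'", "positive")]
for model in ["small-model", "large-model"]:
    print(model, evaluate(model, cases))
```

Even a ten-case eval like this, run against two or three model sizes, usually settles the "do we need the big model?" question in an afternoon.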
The agentic pipeline pattern
One of the most impactful architectures I’ve built is the agentic multi-model pipeline. Instead of one monolithic LLM call, you chain specialized steps:
- Prompt generation — A small model that transforms business requirements into optimized prompts
- Primary generation — The main model does the heavy lifting
- Automated QA — A vision model or classifier validates the output
- Retry loop — Failed QA triggers regeneration with feedback
This pattern reduces cost, improves quality, and gives you observability at every step.
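The four steps above can be sketched as a single function. All three model calls are stubbed with placeholders here; the QA step uses a trivial length check where a real pipeline would call a classifier or vision model.

```python
def generate_prompt(requirements: str) -> str:
    """Step 1: a small model would turn requirements into an optimized prompt.
    Stubbed for illustration."""
    return f"Write marketing copy that satisfies: {requirements}"

def generate(prompt: str) -> str:
    """Step 2: primary generation (stubbed)."""
    return f"DRAFT: {prompt}"

def passes_qa(output: str) -> tuple[bool, str]:
    """Step 3: automated QA. A length check stands in for a real
    classifier or vision model; the second value is feedback for retries."""
    if len(output) < 20:
        return False, "output too short"
    return True, ""

def pipeline(requirements: str, max_retries: int = 2) -> str:
    """Steps 1-4 chained, with QA feedback fed back into regeneration."""
    prompt = generate_prompt(requirements)
    feedback = ""
    for _ in range(max_retries + 1):
        output = generate(prompt + (f"\nFix: {feedback}" if feedback else ""))
        ok, feedback = passes_qa(output)
        if ok:
            return output
    raise RuntimeError(f"QA failed after {max_retries + 1} attempts: {feedback}")
```

Because each step is a plain function with its own inputs and outputs, you can log, cache, swap, or A/B-test any stage independently, which is where the observability and cost wins come from.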
What actually matters
After building several AI-powered products, I've found that a few things matter far more than model choice:
- Observability — Log every LLM call, every token count, every latency measurement. You can’t improve what you can’t measure.
- Graceful degradation — When the AI fails (and it will), what’s the fallback? Build it from day one.
- Human-in-the-loop — For high-stakes outputs, always have a review step. Automate the 80%, let humans handle the edge cases.
- Cost management — Monitor your API spend like you monitor your cloud costs. It can spiral fast.
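The first two points can be combined in a single wrapper: log every call's latency and rough token counts, and fall back to a non-AI default when the call fails. The whitespace-split token count is a deliberate crudeness; in practice, read the usage data your provider returns. `summarize` is a stand-in for a real model call.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm")

def observed(fallback=None):
    """Log latency and rough token counts for every LLM call, and return
    `fallback` instead of raising when the underlying call fails."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(prompt: str, *args, **kwargs):
            start = time.perf_counter()
            try:
                out = fn(prompt, *args, **kwargs)
                log.info("%s ok in=%d out=%d tokens %.0fms", fn.__name__,
                         len(prompt.split()), len(out.split()),
                         (time.perf_counter() - start) * 1000)
                return out
            except Exception:
                log.exception("%s failed; using fallback", fn.__name__)
                if fallback is None:
                    raise
                return fallback
        return inner
    return wrap

@observed(fallback="We'll get back to you shortly.")
def summarize(prompt: str) -> str:
    """Stand-in for a real model call."""
    return "Summary: " + prompt[:40]
```

Ship those structured logs to the same place as your other telemetry, and cost tracking falls out almost for free: token counts times your per-token price, aggregated per endpoint.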
Looking ahead
The field moves incredibly fast. What I’m most excited about right now:
- Smaller, specialized models — The trend toward efficient, task-specific models is the right direction for production use
- Better tooling — The observability and evaluation ecosystem is maturing rapidly
- Multimodal pipelines — Combining text, image, and code generation in single workflows
The key is to stay hands-on. Build POCs, test hypotheses, and form your own opinions. The gap between what’s hyped and what’s useful is still wide — and that’s where the real engineering happens.