Aneta Kahleová

Insights

Notes from the studio

Notes on production AI — context engineering, agentic orchestration, and what it takes to ship bots that do not hallucinate their way into trouble. Written by someone who has done it at scale.

June 25, 2026

Eval as an Input, Not a Dashboard: Building Self-Healing LLM Systems

Most teams treat evaluation as a scoreboard — a number you glance at and feel good or bad about. The frontier idea is to wire the eval back into the system as an input that rewrites it: traces go to independent judges, fixes get proposed automatically, and one human approves. It's AI all the way down, with a single gate that isn't.

June 24, 2026

Intent First: How a Bot Decides What You're Actually Asking

Before a bot can answer well, it has to know what kind of question it's facing. Skip that step and chitchat gets a database query, 'last month' goes to SQL untranslated, and every request runs the same dumb path. The fix is a cheap, decisive first move: classify the intent, extract the entities, resolve the time.
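
Those three moves can be sketched in a few lines. This is a toy illustration, not the production classifier: the intent labels, the keyword rules, and the names `ParsedRequest`, `parse`, and `resolve_last_month` are all assumptions made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ParsedRequest:
    intent: str                          # hypothetical labels: "chitchat" or "data_query"
    entities: list[str]
    date_range: tuple[date, date] | None


def resolve_last_month(today: date) -> tuple[date, date]:
    """Turn the phrase 'last month' into a concrete first and last day."""
    first_of_this_month = today.replace(day=1)
    last_day = first_of_this_month - timedelta(days=1)
    return last_day.replace(day=1), last_day


def parse(question: str, today: date) -> ParsedRequest:
    text = question.lower()
    # Toy keyword rule standing in for a real intent classifier.
    if any(phrase in text for phrase in ("hello", "thanks", "how are you")):
        return ParsedRequest("chitchat", [], None)
    # Toy entity list; a real system would draw these from its schema or glossary.
    entities = [word for word in ("invoices", "employees") if word in text]
    # Resolve relative time before anything reaches the SQL layer.
    date_range = resolve_last_month(today) if "last month" in text else None
    return ParsedRequest("data_query", entities, date_range)
```

The point of the sketch is the ordering: chitchat exits before any query work happens, and the relative date is already a pair of concrete days by the time a query is written.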

June 23, 2026

The Knowledge Hourglass: Deflate Everything, Then Inflate on Demand

Your knowledge base is enormous and the model's context window is tiny. The trick isn't a bigger window — it's an hourglass: compress everything ahead of time, then expand only what one question actually needs.

June 22, 2026

Maximum Recall: Why Your Retriever Should Cast Three Nets, Not One

You compressed your knowledge into clean indexes. Now comes the moment everything hinges on: given a question, can the bot actually find the handful of tables it needs among a thousand? Miss the right one and nothing downstream can save you — so at the retrieval step, recall beats precision.

June 21, 2026

Bootstrapping the Glossary: Let the Model Draft Its Own Domain Dictionary

Hand-writing the association layer is the right move for your first dozen tables. It does not scale to a thousand — or to a new client schema every month. The answer is to let the model draft the glossary from real data, then have a human curate it.

June 20, 2026

Associations: Teaching a Bot to Speak Your Domain

A user asks for 'employees.' Your database calls them H_OSOBA. No model bridges that gap on its own. The fix is the highest-leverage, least glamorous artifact in the whole system: a living glossary that maps human language to your schema.
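
In its simplest form, such a glossary is a lookup from everyday terms to schema names. The sketch below assumes a plain dictionary; only the employees-to-H_OSOBA pairing comes from the example above, while the extra synonym "staff" and the function name `tables_for` are invented for illustration.

```python
import re

# Maps each table name to the human terms that should lead to it.
# "staff" is an assumed synonym, added only to show a multi-term entry.
GLOSSARY = {
    "H_OSOBA": ["employee", "employees", "staff"],
}


def tables_for(question: str, glossary: dict[str, list[str]] = GLOSSARY) -> list[str]:
    """Return every table whose glossary terms appear as words in the question."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    return [table for table, terms in glossary.items() if words & set(terms)]
```

A real association layer would carry columns, code values, and multi-word phrases too, but the shape is the same: human language in, schema names out.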

June 19, 2026

Beyond Synonyms: The Knowledge Graph That Teaches a Bot What Things Mean

An association layer teaches a bot your vocabulary — 'employee' means this table. But some questions aren't about words; they're about relationships and business concepts no single table spells out. That's where a flat dictionary ends and a knowledge graph begins.

June 18, 2026

Controlled Autonomy: Build Your Agent a Leash, Not a Cage

Hand an agent total freedom and it wanders off and wrecks the task. Lock it in rigid if-else and it can't handle anything real. The production sweet spot is a fixed pipeline with autonomous pockets — rails the agent can't leave, and real decisions inside them.

June 17, 2026

The Bot That Gets Better Every Week: Closing the Feedback Loop

A bot that doesn't learn from use is frozen at launch quality — and launch quality is the worst it should ever be. The difference between a bot that goes stale and one that sharpens every week is a feedback loop that turns each interaction into signal.

June 16, 2026

The Judge: Why a Good Bot Never Trusts Its First Answer

A query that runs cleanly can still be wrong. The fix isn't a smarter model — it's a loop: generate, probe the data, let a judge decide if the answer makes sense, and feed concrete failures back until it does.
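
The shape of that loop fits in a dozen lines. This is a minimal sketch under stated assumptions: `generate`, `probe`, and `judge` are hypothetical callables you would supply (typically a model call, a database call, and a second model call), and the attempt limit is an arbitrary choice for the example.

```python
from typing import Any, Callable, Optional, Tuple


def answer_with_judge(
    question: str,
    generate: Callable[[str, Optional[str]], str],            # (question, feedback) -> query
    probe: Callable[[str], Any],                               # query -> rows or a data sample
    judge: Callable[[str, str, Any], Tuple[bool, Optional[str]]],  # -> (ok, feedback)
    max_attempts: int = 3,
) -> Optional[str]:
    """Generate a query, check it against the data, and retry with concrete feedback."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        query = generate(question, feedback)
        rows = probe(query)
        ok, feedback = judge(question, query, rows)
        if ok:
            return query
    # No attempt satisfied the judge; the caller decides what to do next.
    return None
```

Returning `None` instead of the last attempt is a deliberate choice in this sketch: a query the judge rejected should not reach the user as if it had passed.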

June 14, 2026

Designing for Uncertainty: When a Good Bot Asks, Defaults, or Bows Out

The judge loop is about being right. This is about the prior question: what does a bot do when it isn't sure? The worst ones barrel ahead and answer anyway. A good one has a repertoire — clarify, default, hedge, or hand off — and knows which to reach for.

June 13, 2026

Lazy Resolution and Right-Sized Models: Spend Intelligence Where It Counts

Most of what an agent does is trivial. Resolving a code, matching a value, extracting an entity — none of it needs your most expensive model. Defer the lookups, route the grunt work to cheap fast models, and save the genius for the genuinely hard part.

June 11, 2026

Shipping Behind Regulated Walls: On-Prem, Hybrid, and SQL You Can Trust

The best customers for a data bot — hospitals, banks — are the ones who legally can't let data leave the building. That constraint shapes everything: where the model runs, what the query layer is allowed to do, and how you prove the whole thing behaves.

June 6, 2026

RAG vs. Fine-Tuning: How to Pick (and Why It's Usually RAG)

The most common question we get on a scoping call: should the bot be fine-tuned on our data, or use retrieval? Here's how we actually decide.

June 5, 2026

Why Chatbots Hallucinate — and How We Engineer It Out

A confident wrong answer is worse than no answer. Hallucination isn't magic: it's a set of failure modes you can design against.