June 22, 2026

Maximum Recall: Why Your Retriever Should Cast Three Nets, Not One

Once you've done the hard work of compressing a sprawling knowledge base into clean indexes, you face the moment on which the whole system hinges. A question arrives. Somewhere in a thousand tables are the three or four the bot actually needs. Can it find them?

This step — retrieval — is uniquely unforgiving, because its failures are unrecoverable. Every later stage can fix an imperfect input: the generator can ignore an irrelevant table, the judge can reject a bad query. But nothing downstream can use a table that was never retrieved. A false positive at this stage is a minor tax. A false negative is fatal. That asymmetry should drive the entire design, and it leads to one conclusion: at the retrieval step, optimize for recall, not precision.

Any single net has holes

The tempting approach is to pick the best retrieval method and rely on it. The problem is that every method, on its own, is blind to something.

Pure semantic (vector) search is wonderful at paraphrase — it'll connect "headcount" to a table about employees even with no shared words — but it can stumble on exact technical terms and domain jargon, the very things that don't embed cleanly. Pure keyword or model-driven selection is sharp on explicit matches and domain knowledge but misses the user who describes a thing without ever naming it. Whichever single net you cast, there's a class of questions that slips through it.

You don't fix this by finding a better net. You fix it by casting several at once and pooling the catch.

Three nets, in parallel

A strong retriever runs several independent methods concurrently and unions their results. Three complementary nets cover most of the gaps:

Model batching. You can't show a model a thousand tables at once, so you split them into batches — fifty or so each — and run the batches in parallel, asking the model to pick the relevant tables from each, given the question, the extracted keywords, and each table's human name and associations. This is the net that understands context: it tolerates typos and word inflection, and it knows domain links a keyword match would miss — that "employee," "worker," and "contract" all point at the same cluster of tables. Batching is what makes it scale to a huge schema without overflowing the context.
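To make the batching concrete, here is a minimal sketch in Python. Everything named here is an assumption for illustration: the table shape (a dict with "name" and "associations"), the pick_relevant callable that stands in for the actual model call, and the stub used to exercise it. The filter that drops names the model returned from outside its batch is also my addition, not something the approach requires.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def select_with_model(question, keywords, tables, pick_relevant,
                      batch_size=50, max_workers=8):
    """Run a (hypothetical) model selector over every batch in parallel.

    `pick_relevant(question, keywords, batch)` is assumed to return an
    iterable of table names chosen from `batch`.
    """
    def run_batch(batch):
        allowed = {t["name"] for t in batch}
        picked = pick_relevant(question, keywords, batch)
        # Keep only names that were actually in this batch.
        return allowed.intersection(picked)

    selected = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for names in pool.map(run_batch, chunked(tables, batch_size)):
            selected |= names  # union across batches
    return selected


def stub_pick_relevant(question, keywords, batch):
    """Stand-in for the model: picks tables whose associations mention a keyword."""
    wanted = {k.lower() for k in keywords}
    return [t["name"] for t in batch
            if wanted & {a.lower() for a in t["associations"]}]
```

In a real system the stub would be replaced by a prompt to a small, fast model. The batch size of fifty follows the text, and the worker count is an arbitrary placeholder.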

Semantic search. Embed not just the raw question but everything you extracted from it — the intent, the entity texts, their normalized forms, the keywords — and search the vector index for tables whose names and associations are similar in meaning. Keep a similarity threshold to stay relevant, but guarantee a floor of candidates so a hard question never comes back empty. This is the net that catches paraphrase and fuzzy meaning.
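The threshold-plus-floor rule is easy to get subtly wrong, so here is a small sketch of it. It assumes the embeddings already exist as plain lists of floats, scores each table by its best match against any of the query vectors (question, intent, entities, keywords), and uses placeholder values for the threshold and the floor. A real deployment would use a vector index rather than this brute-force loop.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_candidates(query_vectors, table_vectors,
                        threshold=0.75, min_candidates=3):
    """Return table names ranked by similarity, best first.

    query_vectors: list of embeddings, one per extracted text.
    table_vectors: dict mapping table name -> embedding.
    """
    if not query_vectors:
        return []
    # A table's score is its best similarity to any query vector.
    scores = {name: max(cosine(q, vec) for q in query_vectors)
              for name, vec in table_vectors.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    above = [name for name in ranked if scores[name] >= threshold]
    # The floor: if too few tables clear the threshold, return the top
    # `min_candidates` anyway so a hard question never comes back empty.
    if len(above) >= min_candidates:
        return above
    return ranked[:min_candidates]
```

Taking the maximum over query vectors is one choice among several. Averaging would also work, but it tends to dilute a strong match on a single entity or keyword.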

Association search. Match the question's keywords specifically against the hand-written and auto-generated glossary. This is the net tuned to your domain's private vocabulary — the internal nickname, the acronym, the term that means something only here.
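As a rough illustration, association search can be as plain as a normalized dictionary lookup. The glossary shape (term mapped to a list of table names) and the example terms and tables are hypothetical, and exact matching after lowercasing is a simplifying assumption.

```python
def association_candidates(keywords, glossary):
    """Look up each keyword in a glossary mapping term -> table names.

    Matching is exact after trimming and lowercasing both sides.
    """
    normalized = {term.strip().lower(): tables
                  for term, tables in glossary.items()}
    hits = set()
    for keyword in keywords:
        hits.update(normalized.get(keyword.strip().lower(), ()))
    return hits


# Hypothetical glossary entries, for illustration only.
glossary = {
    "FTE": ["hr_employees"],
    "churn": ["crm_cancellations", "billing_subscriptions"],
}
candidates = association_candidates(["fte", " Churn "], glossary)
```

The point of keeping this net separate is that the acronym lookup succeeds even when the embedding of the term lands nowhere near the table's description.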

And the entity step from earlier pays a dividend: the candidate tables that named-entity recognition already nominated drop straight into the pool. Once all of these have run, you have four independent opinions on which tables matter: the three nets plus the entity step.
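One way to wire the pieces together is sketched below. It assumes each net is exposed as a callable that takes the question and returns table names, and that the entity step's candidates arrive as a plain list. The function name, the "entities" label, and the idea of recording which method found each table are illustrative choices, not part of the approach as described.

```python
from concurrent.futures import ThreadPoolExecutor


def retrieve(question, nets, entity_candidates=()):
    """Run every net concurrently and pool the results.

    nets: dict mapping a net's name -> callable(question) -> iterable of
          table names (hypothetical interface).
    Returns a dict mapping table name -> set of methods that surfaced it.
    """
    found_by = {}
    with ThreadPoolExecutor(max_workers=max(1, len(nets))) as pool:
        futures = {name: pool.submit(fn, question)
                   for name, fn in nets.items()}
        for name, future in futures.items():
            for table in future.result():
                found_by.setdefault(table, set()).add(name)
    # Tables nominated earlier by entity recognition join the same pool.
    for table in entity_candidates:
        found_by.setdefault(table, set()).add("entities")
    return found_by
```

Keeping the provenance costs almost nothing and makes it easy to see later which net rescued a table the others missed.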

Union, not intersection

Here's the decision that defines the whole strategy: you union the results, not intersect them. You keep every table any method surfaced, not only the ones they agreed on.

This feels wrong if your instinct is precision — surely the tables all three methods picked are the safest? But that instinct optimizes the wrong thing. Intersection throws away exactly the catches that justify having multiple nets: the table only semantic search found, the one only the glossary knew. Each method exists because it sees something the others don't, and intersection discards that contribution by design.
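A toy example makes the difference visible. The table names and the three result sets are invented for illustration.

```python
def pool_union(*results):
    """Keep every table that any method surfaced."""
    return set().union(*results)


def pool_intersection(*results):
    """Keep only tables that every method agreed on."""
    sets = [set(r) for r in results]
    return set.intersection(*sets) if sets else set()


# Invented outputs from three methods for the same question.
model_hits = {"orders", "customers"}
semantic_hits = {"orders", "refunds"}
glossary_hits = {"orders", "sku_map"}

wide = pool_union(model_hits, semantic_hits, glossary_hits)
narrow = pool_intersection(model_hits, semantic_hits, glossary_hits)
```

Here the union holds four tables and the intersection holds only "orders". If the question was really about refunds, the intersection has already lost the one table that could answer it.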

Union is the right call because the costs are wildly asymmetric. An extra, irrelevant candidate costs you a little context downstream — and the generator and judge are built to prune it anyway. A missing relevant table costs you the answer: an unanswerable question, or worse, a confident query against the wrong data. When one kind of error is a minor tax and the other is fatal, you bias hard toward catching everything. Better five candidates with one spare than four that quietly dropped the one that mattered.

Recall here, precision later

Step back and this is a general principle of retrieval-augmented systems, not a quirk of databases. The retrieval stage and the reasoning stage have different jobs. Retrieval's job is to not miss — to guarantee the right material is in the room. Reasoning's job is to select — to pick the right material from what's there. Conflate them, and you get a retriever that's too cautious, dropping good candidates to look precise, and starving the reasoning stage of options it can't recover.

So you split the responsibility cleanly. Cast wide at retrieval with a union of complementary methods, accept the extra candidates, and let the later stages (generation, sampling, the judge) do the precise pruning they're good at. Hybrid retrieval that combines lexical, semantic, and domain signals beats any single method not because any one signal is stronger on its own, but because their blind spots rarely coincide.

The cost worry mostly evaporates in practice. Batching and parallelism keep even a thousand-table sweep fast, and the work is the kind of low-intelligence matching you hand to cheap, quick models rather than your most expensive one. You pay a little more at the cheapest stage of the pipeline to make the expensive stages possible — which is exactly the trade you want.

Get retrieval right and everything after it has a chance. Get it wrong — miss the one table — and the smartest model in the world will write you a flawless query against data that can't answer the question.


Is your bot occasionally just... missing the right data, then confidently answering anyway? That's almost always a recall problem at retrieval, not a reasoning problem. Let's look at how many nets yours is casting.