June 20, 2026
Associations: Teaching a Bot to Speak Your Domain
Here is a question that breaks more knowledge bots than any model limitation: a user asks "how many employees do we have?" and your database stores people in a table called H_OSOBA. Nothing in those seven characters tells a model that this is where employees live. The column for their title is TITUL_PRED_KOD. The lookup that turns a code into "Engineer" is H_C_TITUL_PRED. To a language model — to any reader who didn't build this schema — these are opaque.
Teams reach for the obvious fix first: surely the database has human-readable names somewhere? It usually does, and they're usually useless — auto-generated, abbreviated, or wrong. The real domain knowledge — that "personnel," "staff," "headcount," and "employees" all point at H_OSOBA — doesn't live in the schema at all. It lives in the heads of the three people who've worked there longest.
The job is to get that knowledge out of their heads and in front of the model. The artifact that does it is what we call an association layer, and it is the single highest-leverage thing you can build.
What an association actually is
An association is a small, human-authored translation between your domain's language and your schema's names. It comes in three flavors, increasing in richness.
- Aliases: the words real users say. For H_OSOBA: employee, person, staff, personnel, worker, headcount. Bare keywords, the vocabulary of the people asking questions.
- Hints: a sentence of plain-language context. "This table holds the master record for every employee." Not a keyword, an explanation: the kind of thing you'd tell a new hire on day one.
- Column notes: targeted glosses for the cryptic fields. TITUL_PRED_KOD: the code for the title before a name, e.g. Ing., Mgr. This is where most of the cryptic-schema pain actually lives, and where a single line of text saves the model from guessing.
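To make the three flavors concrete, here is one way to hold them as a single record per table. This is a minimal Python sketch; the `Association` class and its field names are our own illustration, not a required format.

```python
from dataclasses import dataclass, field


@dataclass
class Association:
    """One table's glossary entry (hypothetical shape): aliases, a hint, column notes."""
    table: str                                                   # physical name, e.g. "H_OSOBA"
    aliases: list[str] = field(default_factory=list)             # words real users say
    hint: str = ""                                               # one sentence for a new hire
    column_notes: dict[str, str] = field(default_factory=dict)   # column -> short gloss


osoba = Association(
    table="H_OSOBA",
    aliases=["employee", "person", "staff", "personnel", "worker", "headcount"],
    hint="This table holds the master record for every employee.",
    column_notes={
        "TITUL_PRED_KOD": "the code for the title before a name, e.g. Ing., Mgr.",
    },
)
```

Keeping all three flavors on one record makes the table the unit an expert edits and the unit the prompt builder injects.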
None of this teaches the model new reasoning. It teaches the model your vocabulary. And vocabulary, it turns out, is almost the entire battle.
If you've ever seen Atlassian's Rovo underline an acronym in Confluence and explain that "SB" means "Stylebot," you've seen an association layer at work. It's the same move: scan for the domain's private language and make it legible. We just do it for a database instead of a wiki.
Why this is knowledge management, not configuration
It's tempting to file associations under "config" and move on. That undersells what's happening. An association layer is an act of knowledge capture: you are taking tacit institutional knowledge — undocumented, tribal, walking-out-the-door-when-someone-quits — and encoding it into a durable, queryable form.
That reframing changes how you treat it. A glossary that captures institutional knowledge deserves to be owned, versioned, and maintained like the asset it is — not buried in a code comment. The best association layers are editable by a domain expert who can't write a line of SQL, because the person who knows that "headcount" means H_OSOBA is rarely the person who wrote the query engine.
You inject it — you don't retrain on it
The mechanism is deliberately boring, and that's a feature. Associations are delivered through in-context learning: at the moment the bot is deciding which tables a question touches, you inject the relevant associations straight into the prompt. No fine-tuning, no training run, no model artifact to version.
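How "relevant" gets decided is left open here. One simple option, assumed purely for illustration, is to match the question's words against each table's aliases and inject only the tables that hit. The `relevant_tables` function and the aliases given to H_C_TITUL_PRED are hypothetical; embedding search, or injecting the whole glossary, could sit in the same slot.

```python
import re


def relevant_tables(question: str, glossary: dict[str, list[str]]) -> list[str]:
    """Return tables whose aliases appear as whole words in the question.

    Naive lowercase word matching with no stemming: a stand-in for
    whatever retrieval you actually use.
    """
    words = set(re.findall(r"[a-z]+", question.lower()))
    return [table for table, aliases in glossary.items()
            if any(alias.lower() in words for alias in aliases)]


# Hypothetical glossary: table name -> aliases. Plurals are listed explicitly
# because the matcher does no stemming.
glossary = {
    "H_OSOBA": ["employee", "employees", "staff", "personnel", "worker", "headcount"],
    "H_C_TITUL_PRED": ["title", "degree", "engineer"],
}
print(relevant_tables("How many employees do we have?", glossary))
```

The missing stemming is a limitation of this sketch, not of the approach: "engineering" does not match "engineer" here, which is exactly the kind of gap the glossary (or a better matcher) has to close.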
Instead of showing the model a bare table name, you show it a briefed one:
Table: H_OSOBA — Aliases: employee, staff, personnel, worker — Notes: master record for every employee — Columns: TITUL_PRED_KOD (title before the name, e.g. Ing., Mgr.), H_OSO_PXID (primary key)
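Producing that briefed line from stored associations is a few lines of string assembly. A sketch, assuming the values are already loaded as plain Python data; `render_brief` is a hypothetical name, and it separates fields with ` | ` where the example above uses a different separator.

```python
def render_brief(table: str, aliases: list[str], notes: str,
                 columns: dict[str, str]) -> str:
    """Render one table's associations as a single prompt line.

    Empty sections are skipped, so a table with no associations yet
    still renders as just its name.
    """
    parts = [f"Table: {table}"]
    if aliases:
        parts.append("Aliases: " + ", ".join(aliases))
    if notes:
        parts.append("Notes: " + notes)
    if columns:
        parts.append("Columns: " + ", ".join(
            f"{col} ({gloss})" for col, gloss in columns.items()))
    return " | ".join(parts)


line = render_brief(
    "H_OSOBA",
    ["employee", "staff", "personnel", "worker"],
    "master record for every employee",
    {"TITUL_PRED_KOD": "title before the name, e.g. Ing., Mgr.",
     "H_OSO_PXID": "primary key"},
)
print(line)
```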
The difference in table-selection accuracy is not subtle. And because it's injection rather than training, a domain expert can improve the bot at 4pm and the bot is smarter at 4:01 — no deploy, no retrain, no waiting.
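The 4:01 update works because the glossary is read at prompt-build time rather than baked into anything. A minimal sketch, assuming the glossary sits in a JSON file on disk: `GlossaryFile` is a hypothetical helper that re-reads the file whenever its modification time or size changes, so an expert's saved edit reaches the very next question.

```python
import json
import os


class GlossaryFile:
    """Serve a JSON glossary, re-reading it whenever the file on disk changes."""

    def __init__(self, path: str):
        self.path = path
        self._stamp = None   # (mtime_ns, size) of the version last loaded
        self._data = {}

    def get(self) -> dict:
        st = os.stat(self.path)
        stamp = (st.st_mtime_ns, st.st_size)
        if stamp != self._stamp:          # file edited since the last read
            with open(self.path, encoding="utf-8") as f:
                self._data = json.load(f)
            self._stamp = stamp
        return self._data
```

Call `get()` each time a prompt is built; when nothing has changed it costs one `stat` call and returns the cached copy.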
One detail worth stealing: feed the associations into both phases. The bot uses them when it picks which tables are relevant, and again when it writes the actual query. The same glossary that helps it find H_OSOBA also reminds it that titles join through H_C_TITUL_PRED rather than being free text.
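The two phases differ only in what they ask and how much of the glossary they carry. A sketch of the two prompt builders, assuming plain-text prompts and leaving the model call out entirely; `selection_prompt`, `query_prompt`, the brief strings, and the prompt wording are all hypothetical.

```python
def selection_prompt(question: str, briefs: dict[str, str]) -> str:
    """Phase 1: show every briefed table and ask which ones the question needs."""
    catalog = "\n".join(briefs.values())
    return (f"Tables:\n{catalog}\n\n"
            f"Question: {question}\n"
            "List the tables needed to answer it.")


def query_prompt(question: str, briefs: dict[str, str], chosen: list[str]) -> str:
    """Phase 2: re-inject the briefs, but only for the tables phase 1 chose."""
    context = "\n".join(briefs[t] for t in chosen if t in briefs)
    return (f"Tables:\n{context}\n\n"
            f"Question: {question}\n"
            "Write one SQL query that answers it.")


# Hypothetical briefs; the H_OSOBA column note carries the join hint.
briefs = {
    "H_OSOBA": "Table: H_OSOBA | Aliases: employee, staff, personnel | "
               "Notes: master record for every employee | "
               "Columns: TITUL_PRED_KOD (title code; join to H_C_TITUL_PRED)",
    "H_C_TITUL_PRED": "Table: H_C_TITUL_PRED | Notes: lookup that turns a "
                      "title code into a readable title such as Ing.",
}
```

Phase 2 repeats only the chosen tables' briefs, which keeps the prompt short while still carrying the hint that titles are a lookup, not free text.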
Build it backwards, from the questions you must answer
The instinct is to write associations for every table in the schema. Don't. That's weeks of work, most of it wasted on tables no one ever asks about. Build the glossary backwards, from the questions the bot has to handle.
- Take a question you must support. "How many employees have an engineering degree?"
- Find the correct answer's tables. Write or borrow the query that answers it; note exactly which tables and columns it touches.
- Write associations for precisely those. Give H_OSOBA and H_C_TITUL_PRED the aliases and notes that would have led the bot straight to them.
- Repeat, driven by failure. When a real question lands on the wrong tables, that's your next association. The telemetry tells you where the vocabulary gap is; you don't have to guess.
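The failure-driven step can be as simple as diffing the tables each reference query needed against the tables the bot actually picked. A sketch, assuming your logs can be reduced to that pair per question; `vocabulary_gaps` and the three sample cases are made-up illustrations, not real telemetry.

```python
from collections import Counter


def vocabulary_gaps(cases: list[dict]) -> Counter:
    """Count, per table, how often the bot missed a table the reference query needed.

    Each case: {"question": str, "expected": set[str], "selected": set[str]}.
    The most frequent misses are the next associations worth writing.
    """
    gaps = Counter()
    for case in cases:
        for table in case["expected"] - case["selected"]:
            gaps[table] += 1
    return gaps


# Illustrative cases only.
cases = [
    {"question": "How many employees do we have?",
     "expected": {"H_OSOBA"}, "selected": {"H_OSOBA"}},
    {"question": "How many employees have an engineering degree?",
     "expected": {"H_OSOBA", "H_C_TITUL_PRED"}, "selected": {"H_OSOBA"}},
    {"question": "Which staff hold the title Ing.?",
     "expected": {"H_OSOBA", "H_C_TITUL_PRED"}, "selected": set()},
]
print(vocabulary_gaps(cases).most_common())
```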
This turns an open-ended documentation project into a tight, demand-driven loop. You write the glossary the bot actually needs, in the order it needs it.
Make it survive a database reset
A hard-won lesson: development databases get dropped and rebuilt constantly, and any knowledge you typed directly into them dies with them. So associations don't live in the database as their source of truth. They live in a versioned file — JSON is plenty — that sits in source control next to the code.
A small migration step reads that file and seeds the association store on every fresh build: drop all, insert all, done. Your institutional knowledge is now as durable as your code, survives every reset, and travels with the repository.
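A sketch of that seed step, assuming a JSON array with one object per table and a SQLite table as the association store. The file layout, the `association` table, and `seed_associations` are hypothetical stand-ins for whatever your stack uses; the string `GLOSSARY_JSON` plays the role of the versioned file's contents.

```python
import json
import sqlite3

# Stand-in for the contents of the versioned glossary file.
GLOSSARY_JSON = """
[
  {"table": "H_OSOBA",
   "aliases": ["employee", "staff", "personnel", "headcount"],
   "hint": "Master record for every employee.",
   "columns": {"TITUL_PRED_KOD": "title before the name, e.g. Ing., Mgr."}}
]
"""


def seed_associations(conn: sqlite3.Connection, entries: list[dict]) -> int:
    """Drop all stored associations and insert every entry from the file."""
    conn.execute("""CREATE TABLE IF NOT EXISTS association (
                        table_name   TEXT PRIMARY KEY,
                        aliases      TEXT NOT NULL,
                        hint         TEXT NOT NULL,
                        column_notes TEXT NOT NULL)""")
    with conn:  # commit the delete and the inserts together, or roll both back
        conn.execute("DELETE FROM association")                    # drop all
        conn.executemany("INSERT INTO association VALUES (?, ?, ?, ?)",  # insert all
                         [(e["table"], json.dumps(e.get("aliases", [])),
                           e.get("hint", ""), json.dumps(e.get("columns", {})))
                          for e in entries])
    return len(entries)


conn = sqlite3.connect(":memory:")
entries = json.loads(GLOSSARY_JSON)
seed_associations(conn, entries)
seed_associations(conn, entries)  # re-running is safe: still one row per table
```

Because the delete and the inserts share a transaction, a malformed file (a duplicate table, or an entry missing its `table` key) leaves the previous glossary in place rather than an empty store.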
This also solves the portability worry that stops a lot of teams. "We're building associations for a demo database, but the client runs a different one — isn't this throwaway work?" The data is per-deployment, yes. The mechanism — the file format, the loader, the injection point — is universal. You build it once, and for each new client you populate a new glossary (by hand at first, later semi-automatically from their documentation). The hard part is the plumbing, and the plumbing ports perfectly.
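The port is only safe if each client's glossary actually matches that client's schema. One check a loader could run at seed time, sketched here as our own assumption: list the glossary tables the target database doesn't have. SQLite's `sqlite_master` stands in for whatever catalog the client's database exposes (for example `information_schema.tables`), and `unknown_tables` is a hypothetical name.

```python
import sqlite3


def unknown_tables(conn: sqlite3.Connection, glossary_tables: list[str]) -> list[str]:
    """Return glossary tables that don't exist in this deployment's database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    # Compare case-insensitively, since catalogs differ in how they fold identifiers.
    present = {name.upper() for (name,) in rows}
    return [t for t in glossary_tables if t.upper() not in present]


# A client database that has H_OSOBA but no title lookup table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE H_OSOBA (H_OSO_PXID INTEGER PRIMARY KEY, TITUL_PRED_KOD TEXT)")
print(unknown_tables(conn, ["H_OSOBA", "H_C_TITUL_PRED"]))
```

A non-empty result is a prompt for the person populating that client's glossary, not necessarily an error: the table may simply live under another name there.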
The unglamorous thing that makes everything else work
Retrieval strategies, vector search, clever orchestration — they get the attention. But none of them can find H_OSOBA from the word "employee" if nothing in the system has ever connected the two. The association layer is what makes the rest of the machine possible. It's almost a cheat: a few hundred lines of carefully written vocabulary will do more for your bot's accuracy than a great deal of model tuning.
Write it backwards, keep it in source control, hand the keys to a domain expert, and inject it everywhere the model has to understand what your users mean. It is the least glamorous artifact in the build. It is also the one we'd refuse to ship without.
Sitting on a schema only three people fully understand? That's exactly the knowledge an association layer captures. Book a scoping call and we'll map your first set of associations against the questions you most need answered.
