I was reviewing a marketing drip campaign that Claude had built for Task Board - automated email sequences, onboarding flows, the whole thing. It looked good. The logic was solid, the copy was decent, the scheduling made sense. Then I actually read the content.
Channels. Everywhere. "Create your first channel." "Invite your team to a channel." "Organise tasks by channel."
Task Board doesn't have channels. It has boards. It's called Task Board. The word "board" is literally in the name of the product.
I flagged it: "Looking through the drip job, I can see multiple references to channel instead of board. Can you fix this, and also tell me why you keep using channel?"
The response was honest, which I appreciated. Claude explained that its training data is heavily weighted towards tools like Slack, Trello (which does use boards, ironically), Discord, and other collaboration platforms where "channel" is the dominant organisational metaphor. When generating content about a task management tool, the word "channel" kept surfacing because that's what statistically follows phrases like "organise your work in" and "invite your team to" in the training corpus.
Fair enough. But it kept happening.
The Training Data Tug-of-War
This wasn't a one-off mistake. Over multiple sessions across several weeks, "channel" kept creeping back in. I'd correct it, Claude would fix it, we'd move on. Next session, fresh context, and suddenly we're back to channels again. It was like playing whack-a-mole with vocabulary.
The underlying problem is genuinely interesting once you stop being annoyed by it. Language models generate text by predicting the most likely next token based on patterns in their training data. When the training data overwhelmingly associates task management tools with "channels" (thanks to the dominance of Slack in that space), the model has a statistical bias towards that word. Your project-specific terminology is fighting against millions of training examples, and sometimes the training data wins.
This is not a bug. It's a fundamental characteristic of how these models work. And once you understand that, you start noticing it everywhere.
It's Not Just Channels
The channel/board confusion was the most obvious example, but the same pattern showed up in subtler ways across all our projects.
In TestPlan, Claude would occasionally refer to "test suites" when our product uses "test runs". Different concept, different terminology, but "test suite" is far more common in the training data because that's what most testing frameworks call a collection of tests.
When writing copy for The Code Zone, it would slip into American English spelling - "organize" instead of "organise", "color" instead of "colour". Not because I hadn't specified British English, but because the vast majority of programming-related content in the training data is American English. The statistical weight just pulls in that direction.
Variable naming was another one. Claude would default to generic conventions - userId, channelId, workspace - rather than domain-specific names that match our codebase. Our Task Board code uses boardId, swimlaneId, workflowStepId. These are specific to our domain, and the AI had to be repeatedly reminded to use them.
It's the same root cause every time. The model's "memory" of what words typically go together is fighting against the specific reality of your project.
Why Correcting It Once Doesn't Stick
Here's the thing that frustrated me most: corrections within a session worked perfectly. I'd say "use board, not channel" and Claude would immediately comply for the rest of that conversation. But the next session? Clean slate. Back to channels.
This is the stateless nature of AI conversations. Each session starts fresh. The model doesn't remember that we had this argument yesterday, or the day before, or the day before that. It's like training a new employee every morning who has the same excellent skills but absolutely no memory of yesterday's work.
For terminology specifically, this is maddening. Because the correction is simple - just use the right word - but it needs to be made every single time. And when you're deep into a complex implementation task, the last thing you want to be doing is proofreading for vocabulary mistakes.
Project Memory: The Actual Solution
The fix, when we found it, was embarrassingly straightforward. We added terminology rules to our project context documents.
In the CLAUDE.md file for Task Board, we added a section that explicitly lists our domain vocabulary: "Boards, not channels. Swimlanes, not columns. Workflow steps, not stages. Cards, not tickets." Simple, declarative, impossible to misinterpret.
For TestPlan: "Test runs, not test suites. Test cases, not test scripts. Features, not modules."
For The Code Zone: "Always British English spelling. Students, not users. Sessions, not classes. Tutors, not teachers."
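Concretely, the terminology section of a CLAUDE.md might look something like this (a sketch only; the headings and layout are illustrative, though the rules themselves are the ones quoted above):

```markdown
# Task Board - Project Context

## Terminology (read this first)
Always use our domain vocabulary. These are hard rules, not preferences:

- Boards, never channels
- Swimlanes, never columns
- Workflow steps, never stages
- Cards, never tickets

Use British English spelling throughout ("organise", "colour").

## Naming conventions
- Identifiers follow the domain: `boardId`, `swimlaneId`, `workflowStepId`
- Avoid generic names like `channelId` or `workspace`
```

Declarative statements with explicit prohibitions, placed at the top of the file, are the pattern that worked for us.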
The effect was immediate and dramatic. With the terminology explicitly stated in the context document that Claude reads at the start of every session, the channel problem virtually disappeared. Not completely - occasionally, deep into a long stretch of generated copy, the statistical bias still wins - but the error rate dropped from "constantly" to "rarely".
The Deeper Lesson: Context Documents Are Not Optional
This experience taught me something important about working with AI that I don't think gets discussed enough. The AI's training data creates a gravitational pull towards generic patterns. Your project-specific requirements are the exception, not the rule, from the model's perspective. And exceptions need to be explicitly stated, every single time.
I started thinking of context documents as the AI equivalent of a company style guide. Nobody expects a new copywriter to know your brand voice on day one. You give them guidelines. You point out common mistakes. You create reference documents they can consult. AI needs exactly the same thing, except it needs it at the start of every single session because it can't remember yesterday.
The projects where we invested most heavily in context documentation - detailed terminology guides, architecture decisions, naming conventions, common patterns - were consistently the projects where AI output required the least correction. The correlation was direct and obvious in hindsight.
Fighting Training Data Bias: A Practical Checklist
For anyone hitting similar problems, here's what actually worked for us:
Document your terminology explicitly. Don't assume the AI will infer your naming conventions from the codebase. State them directly. "We call X this, not that." Be specific and comprehensive.
Include negative examples. It's not enough to say "use board". Say "use board, never channel". The explicit prohibition is more effective than the implicit expectation, because it directly addresses the bias the model has.
Put terminology at the top of your context document. The earlier it appears in the context, the more weight it carries in the model's attention. Don't bury your naming conventions at the bottom of a long document.
Review generated content specifically for terminology drift. Make it a conscious step in your review process. Not just "does the code work?" but "does the code use our language?" This matters more than you'd think, because wrong terminology in code comments, variable names, and user-facing strings creates confusion for everyone who reads it later.
Update your context document when you find new drift patterns. Every time you catch the AI using the wrong term, add the correction to the project context. Think of it as building an immune system - each correction makes the next session better.
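The review step in the checklist above is easy to automate. Here's a minimal sketch of a terminology linter that scans generated text for banned terms and suggests the preferred ones - the vocabulary map below is hypothetical, matching the Task Board rules described earlier, and you'd swap in your own:

```python
"""Tiny terminology-drift linter for generated content."""
import re

# Hypothetical vocabulary map for Task Board: banned term -> preferred term.
BANNED = {
    "channel": "board",
    "column": "swimlane",
    "stage": "workflow step",
    "ticket": "card",
}

def find_drift(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, banned_term, preferred_term) for each hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for banned, preferred in BANNED.items():
            # Whole-word, case-insensitive match; the optional "s"
            # catches plurals like "Channels" as well.
            if re.search(rf"\b{banned}s?\b", line, re.IGNORECASE):
                hits.append((lineno, banned, preferred))
    return hits

if __name__ == "__main__":
    sample = "Create your first channel.\nDrag the card to a new stage."
    for lineno, banned, preferred in find_drift(sample):
        print(f"line {lineno}: '{banned}' -> use '{preferred}'")
```

Run it over marketing copy, code comments, or user-facing strings before anything ships; each new drift pattern you catch becomes one more entry in both the linter and the context document.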
The Bigger Picture
Training data bias in terminology might sound like a minor annoyance. And individually, each instance is. But collectively, it reveals something fundamental about working with AI: the model doesn't know your project. It knows projects like yours. And the difference between "your project" and "projects like yours" is exactly where the bugs, the wrong terminology, and the misguided patterns live.
Every project has its own vocabulary, its own conventions, its own way of describing things. This domain-specific knowledge is part of what makes an experienced team member valuable - they don't just know how to code, they know how to talk about the product in a way that's consistent and precise.
AI doesn't have that. But it can be given it, explicitly, at the start of every session. The teams that figure this out early will spend far less time correcting vocabulary and far more time building features.
And their marketing emails will definitely say "board" instead of "channel".