
Context Window Exhaustion: AI Development's Biggest Pain Point

Paul Allington · 21 April 2026 · 8 min read

"This session is being continued from a previous conversation that ran out of context."

I've seen that message so many times I could recite it from memory. It appears at the top of a fresh session, followed by a summary of what was supposedly discussed in the previous conversation, and then you carry on as if nothing happened.

Except something did happen. You lost context. You lost nuance. You lost the accumulated understanding that was built up over potentially hours of back-and-forth conversation. And now you're working with an AI that has a summary of what you were doing rather than the actual memory of doing it.

This is, without exaggeration, the single biggest practical frustration of AI-assisted development. Not hallucinations. Not wrong code. Not terminology drift. Context window exhaustion. It affects every major feature, every ambitious session, and every complex project. And almost nobody talks about it.

How Bad Is It, Actually?

Let me give you some numbers from our actual development work.

The mega website rebuild session - redesigning and rebuilding the entire thecodeguy.co.uk site - generated 1,074 assistant messages and hit the context limit four times. That's four continuations, four summary handoffs, four moments where accumulated context was compressed into a paragraph.

The GoSignal rebuild went through four continuations. The Conversations feature spec consumed multiple context windows just for the planning phase, before any code was written. Building the MCP server for Task Board hit the limit multiple times. Every substantial feature across every project had the same pattern.

This isn't occasional. It's the norm. If you're using AI to build anything more complex than a simple CRUD endpoint, you will hit context limits. Repeatedly.

What Actually Gets Lost

The continuation summary is better than nothing. It captures the broad strokes - what files were modified, what the current task is, what approach was agreed upon. But it misses the things that actually matter for continuity.

Design decisions and their reasoning. In the original conversation, you might have spent twenty messages discussing whether to use a modal or a slide-over panel, eventually deciding on a modal for specific UX reasons. The continuation summary says "using modals for this feature". The why is gone. So when the AI in the new session encounters a situation where a slide-over might make more sense, it doesn't know about the reasoning that already ruled it out. You end up re-having the same discussion.

Rejected approaches. Half the value of a long conversation is the dead ends you explored and abandoned. "We tried X, it didn't work because of Y, so we went with Z." The continuation summary captures Z but not the journey. In the new session, the AI might suggest X again because it doesn't know it was already tried and failed.

Accumulated understanding of the codebase. Over the course of a long session, the AI builds up an understanding of how your files relate to each other, where the quirks are, what patterns your codebase follows. After a continuation, that understanding resets to whatever the summary captured - which is always less than what was actually learned.

The tone and calibration of the collaboration. By message 200, the AI has calibrated to your communication style. It knows you want concise explanations. It knows you'll push back if it over-engineers something. It knows your preference for pragmatism over purity. After a continuation, it resets to its defaults. The collaboration feels slightly off until it recalibrates.

The "File Content Exceeds Maximum Allowed Tokens" Problem

There's a related problem that's worth discussing separately because it affects code architecture in a way that shouldn't be driven by tooling limitations.

Large Blazor components - and in our codebase, some components were genuinely large because they handled complex multi-step forms or real-time data displays - would sometimes trigger a "file content exceeds maximum allowed tokens" error when Claude tried to read them. The AI literally could not process the file.

The solution? Break the component into smaller ones.

Now, breaking large components into smaller ones is often good design. Component decomposition is a well-established principle. But I'd be lying if I said the motivation was always clean architecture. Sometimes the motivation was explicitly "this file is too big for the AI to read, so we need to split it up."

That's not good design. That's working around tooling limitations. And the distinction matters, because the optimal split for readability and the optimal split for AI context consumption aren't always the same thing. Sometimes a component is large because it manages a genuinely complex, interconnected piece of state, and splitting it up creates artificial boundaries that make the code harder to reason about.

We did it anyway, because the alternative was an AI that couldn't work with the file at all. But it left a bad taste.

The Practical Impact on Workflow

Context window exhaustion doesn't just waste time on re-explanations. It fundamentally changes how you plan your work.

You start thinking in terms of "what can I complete within a single context window?" rather than "what's the right scope for this task?" You break features into smaller chunks not because smaller chunks are better design, but because you know the conversation will die if you try to tackle the whole thing at once.

You develop a habit of over-documenting mid-session, writing down decisions and progress in external files, because you know the context might get wiped. That's time spent on documentation that wouldn't be necessary if the AI could just remember what you discussed an hour ago.

You start front-loading the important decisions in a session, trying to get architectural agreement early before the context fills up with implementation detail. This is actually a decent practice regardless, but the urgency is artificial - you're not front-loading because it's good process, you're front-loading because you're racing a timer.

What We Actually Do About It

I'll be honest - there's no perfect solution. Context limits are a fundamental constraint of current AI technology, not a configuration problem you can fix. But here's what helps:

Explicit context preservation. When you can feel a session getting long, take a moment to write down the key decisions, the current state, and the next steps in a file the AI can read in the next session. Don't rely on the automatic continuation summary. Write your own. You know what matters better than an automated summariser does.

Project memory documents. I've written about this before, but it bears repeating here: comprehensive CLAUDE.md files reduce the context burden in every session. If the AI doesn't need to learn your architecture, terminology, and conventions from scratch each time, it has more context budget for the actual work.
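As a rough sketch of what that looks like in practice, a CLAUDE.md might cover architecture, conventions, and terminology up front. The section contents below are illustrative placeholders, not our actual file:

```markdown
# CLAUDE.md

## Architecture
- Blazor front end; components live under /Components, one feature per folder

## Conventions
- Prefer pragmatic, minimal solutions; push back on over-engineering
- Modals for focused interactions; document any exception and why

## Terminology
- "Task Board" = the internal project tracker; "GoSignal" = the signals product
```

The point isn't the specific sections; it's that anything the AI would otherwise have to rediscover each session is written down once.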

Smaller, focused sessions. Instead of one marathon session that hits the context limit three times, run three focused sessions with clear handoff documents between them. Each session starts fresh with full context capacity. The overhead of writing handoff notes is less than the overhead of losing context mid-task.

Structured handoff files. When I know a task will span multiple sessions, I create a handoff file at the end of each session: what's been done, what's remaining, what decisions were made and why, what approaches were tried and rejected. It's ten minutes of writing that saves thirty minutes of re-explanation in the next session.
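A handoff file doesn't need to be elaborate. A minimal structure might look like the following - the headings are a suggestion and the contents are invented for illustration:

```markdown
# Handoff: conversations-feature, session 3

## Done
- Message threading model agreed and implemented

## Remaining
- Read receipts; pagination on the thread list

## Decisions (and why)
- Modal over slide-over for the composer: keeps focus, matches existing UX

## Tried and rejected
- One group per thread for real-time updates: too many groups at scale
```

The "tried and rejected" section is the one the automatic continuation summary most reliably loses, and the one that saves the most re-discussion.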

Component and file sizing. We now think about file size as an AI constraint alongside readability and maintainability. If a file is approaching the token limit, we consider splitting it proactively rather than hitting the wall mid-session.
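To make that proactive rather than reactive, a small script can flag files likely to exceed the read limit before a session hits the wall. This is a sketch, not a real tool: the four-characters-per-token heuristic and the 25,000-token budget are assumptions, not the actual limits of any particular AI tool.

```python
import os

# Rough heuristic: ~4 characters per token for English text and code.
# Real tokenizer counts will differ, so treat results as an early warning.
CHARS_PER_TOKEN = 4
TOKEN_BUDGET = 25_000  # hypothetical per-file read limit


def estimate_tokens(path):
    """Estimate a file's token count from its size on disk."""
    return os.path.getsize(path) // CHARS_PER_TOKEN


def files_over_budget(root, extensions=(".razor", ".cs")):
    """Yield (path, estimated_tokens) for files that risk exceeding the budget."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                tokens = estimate_tokens(path)
                if tokens > TOKEN_BUDGET:
                    yield path, tokens
```

Run against the repo root, anything it flags is a candidate for splitting on your own schedule, at boundaries you choose, rather than mid-session under duress.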

The Uncomfortable Truth

Here's what I keep coming back to: context window exhaustion is a fundamental limitation that we're all just working around. The workarounds are good. They help. They make AI development practical despite the limitation. But they're workarounds, not solutions.

The actual solution is larger context windows, better memory mechanisms, or persistent session state that carries forward between conversations. These are coming - every model release brings larger context windows, and various forms of AI memory are being developed. But we're not there yet.

In the meantime, we're restructuring our code, our workflow, and our planning around a tooling limitation. And we should be honest about that. When someone asks "how do you build complex features with AI?", the honest answer includes "we spend a meaningful amount of time managing context" alongside "it's incredibly productive."

Both things are true simultaneously. AI development is genuinely faster and more capable than traditional development for most tasks. And AI development requires a significant overhead of context management that traditional development doesn't. The net result is still overwhelmingly positive. But pretending the overhead doesn't exist would be dishonest, and I've been trying to be honest throughout this entire series.

Context windows will get bigger. Memory will get better. Someday I'll look back on this post the way I look back on complaining about 640K of RAM. But today, in 2026, context window exhaustion is the single biggest practical challenge of building real software with AI. Plan for it, build workflows around it, and don't feel bad when your fourth continuation of the day starts with that familiar message.

We're all seeing it. We're just not talking about it.

Want to talk?

If you're on a similar AI journey or want to discuss what I've learned, get in touch.


paul@thecodeguy.co.uk