Have you ever spent hours in a Claude session, realized you couldn't reuse that session for the next phase of the work, and still wanted to preserve the context? What if you could split the task across two agents, without having designed that capability into a preprompt up front?
I want to walk you through a small piece of social engineering between me and two Claude Code agents. It came in the form of a collaboration protocol defined in a markdown file. The file itself matters less than how it got written: the agents helped me write it while they worked on their role-specific tasks.
Here’s how it came together.
Context was running out!
I could see I was running myself into a corner with the complexity of the prompts I was providing.

I was rewriting the opening of a workshop deck I’ve been building. (It’s an impress.js presentation, the kind where slides float around a 2D canvas instead of just paging.) The old opener had a “what if your terminal could think?” hook. I wanted a different opener, one that named a real thesis: prompts are an asset, and recording and iterating them builds a personal library that compounds over time.
Impress.js is beautiful. My workshop content is hard to understand in a single shot, and a beautiful presentation framework gave me an opportunity to present the ideas in a way that would keep the audience engaged.

My objective in the session was to brainstorm different approaches for expressing the ideas presented in the draft of my presentation.
I had lots of bad "first-draft" clips of material in the presentation. I'd discovered they were bad through occasionally humiliating practice with people who were nice enough to be a test audience. There were some bruised slides that needed to be renovated.
This copy-drafting activity naturally consumed a lot of context. I was asking the LLM to generate 4-5 different versions of the same content. Then I'd move on to other sections of the presentation and repeat. I'd spend most of an afternoon brainstorming language with Claude Code. By the end, the model's context was full of voice coaching: tone rules, banned phrasings, half a dozen near-final drafts. Rinse and repeat; my gas tank of available context was rapidly trending toward empty.
I was going to start nudging the agent to edit the presentation HTML, but my /context size suggested I wasn't going to get very far. The agent would need to compact before much longer, and the session summary it generated would lose important rules. It would end up like a screenshot of a GIF saved as a JPEG. A mutant.
You can probably see where this is going. The same agent that had spent four hours iterating on word choice was about to make numerous precise edits to a 2,000-line HTML file. Editing source is a different kind of work than picking metaphors. An agent doing code manipulation needs a lot of free context and extremely pithy context statements. The goal for the agent should be exclusive: place the chosen content in the correct slides without breaking the on-screen presentation. But here I was: I needed to switch tasks to a very different memory-management workflow.
What can we do about a foreseeable failure? How can we preserve the brainstorm context?
The states I was trying to cultivate
I find it helps to describe the end states you want to create and the bad end states you’d like to prevent.
States I wanted to produce:
- A clean context window for the editing work
- The brainstorming context is preserved
- A backing repository that was never mid-drift: never in a state where a slide had been updated while the governing content-creation docs were stale. We needed to update the content-creation docs with every edit.
- An auditable decision log explaining why each change happened
States I wanted to avoid:
- A saturated agent making source edits
- Brainstorming work lost the moment I closed the chat
- Edits to a slide before the syllabus and teacher’s guide were updated to match the content
- A change in the repo with no recorded reason. Future-me does not enjoy reconstructing intent by inspecting diffs.
Almost everything in these lists is about context. What if we could have multiple agents leveraging shared context?
What you’d actually need to try this
If you wanted to reproduce this experience, you’d need four things:
- Roles: Each agent announces what it is on its first message. (“Acting as Drafter.” “Acting as Editor.”)
- Role-specific turf, split by directory. Each agent gets its own directory, and crossing turf without permission is a violation. Each role definition spells out what behavior is approved and what is disallowed. We tell the agents to be good by writing only in their own directory; we tell them that bad agents write in other agents' directories; we tell them: "don't be a bad agent!" (Evolved developers will use sandboxing.) Turf is the mechanism that makes the agents distinct.
- A unit of handoff. The agents produce one markdown file per atomic edit, with a known structure: what the current text looks like, what they want to change it to, and why. The exact structure isn't significant. What matters is that handoffs are small, named, and reviewable (see the sketch after this list).
- A human bus. The two agents can’t talk to each other. The human is the message bus. You paste a one-line “HANDOFF: 3 drafts proposed” from one agent into the other.
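To make the unit of handoff concrete, here's a sketch of what one of those per-edit draft files could look like. The file name and field names are my illustration, not the exact format from the repo:

```
<!-- .drafts/slide-01-opener-library-thesis.md (hypothetical example) -->
---
status: proposed   # proposed -> approved -> applied (or blocked)
target: presentation/index.html
---

## Current text
"What if your terminal could think?"

## Proposed text
"Prompts are an asset. Record them, iterate on them, and they compound
into a personal library."

## Why
The old hook was a gimmick. The new opener names the workshop thesis.
```

Any format with those three sections would work. Each file is one small, reviewable decision.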
Obviously, if I had come up with this idea in advance of the session, I could have proposed a shared file system location for managing the handoffs. This example is a summary of how to back yourself out of a corner when context has gotten low.
The lifecycle, the verification rules, the conflict resolution all grew out of these four primitives as I used them. I didn’t design them up front. They were a mushroom growing on some scaffolding.

Agents don’t self-approve
There are five possible artifact states in this newly developed collaboration protocol:
- Handoff
- Proposed
- Approved
- Applied
- Blocked
I’m the Operator. I am the man-in-the-middle! Neither agent can mark its own work as done. The Drafter Agent writes a proposal and sets its status to proposed. Only the operator can move a proposal to approved. The Editor Agent will not touch a draft that isn’t approved. After the Editor agent applies it, it sets the status to applied and records the commit hash that landed the change.
There’s a fifth status, blocked. It means the Editor agent opened a draft, looked at the live source file, and noticed the source had shifted since the Drafter agent wrote the proposal. It could be that a different draft already changed adjacent text. Rather than apply something that no longer fits, the Editor agent sets the status to blocked with a note. The Drafter agent has to re-read the file and revise.
The Drafter agent discovers that its work is now stale, but it doesn’t decide whether to ship. The operator will need to intervene.
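Written into the protocol file, the lifecycle fits in a few lines. This is my paraphrase of the shape it took, not the verbatim file:

```
## Status lifecycle
handoff   - the one-line announcement the Operator carries between agents
proposed  - set by the Drafter when a draft is written
approved  - set by the Operator only; the Editor ignores anything else
applied   - set by the Editor after the edit lands, with the commit hash
blocked   - set by the Editor when the source has drifted; back to the Drafter
```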
The agents helped me write the protocol
I didn’t write the protocol before the session. I drafted a first version defining the concepts of turf, the draft format, and the draft status lifecycle, and put it in a file called v1-edit-protocol.md. Both agents read it at the start of every turn. As I used it, the protocol evolved.
The Editor agent, on its third or fourth applied draft, noticed it was hitting an ambiguous case: a draft of proposed content would result in a layout change to a slide. The Editor agent had no way to tell whether the layout would actually look right when rendered.
The Editor Agent observed, “I can’t verify this without rendering the deck. Do you want me to flag layout-changing drafts in pre-review?”
That interaction with the operator resulted in the creation of the visual-layout-verification rule, which is now enshrined in the protocol file.
The Drafter agent noticed that drift in its source documents wasn’t being reliably discovered. I was at fault: sometimes I’d remember to check whether a slide change needed a corresponding update in the syllabus, and sometimes I’d forget. It proposed a stricter formulation: every draft has to declare what it searched for and discovered in each source document during an edit. If a result differed from previous versions, it implied that an outside party (likely the operator) had tampered with the source materials.
I carried each of these rule proposals between agents. The other agent would read the new rule, push back if it didn’t make sense, and we’d land on something both agents could follow. Then I’d update the protocol file, and both agents would pick up the new version on their next read.
You can dynamically give multiple agents a mechanism to interact, and the interaction mechanism can be evolved by agents.
You might not need a multi-agent framework, a router, an orchestrator, or a fancy preprompt that anticipates every situation.
You might be able to get away with a shared markdown file, a turf table, and a willingness to let the agents discover where the contract needs to grow.

Use Case: Rendering & Observation
Here’s the layout-verification rule the Editor proposed, in the form it eventually took.
When a draft changes anything about how a slide is positioned, the Editor agent has to render the deck, click through the affected region, and write a one-sentence observation into the draft’s rationale. Something like: “Verified rendering at the local server. The 500-unit gap between row 1 and row 2 produced visible overlap. Adjusted to a 1,000-unit gap and reverified.”
This observation exists because impress.js layouts can fail visually in ways that agent-driven tests can’t catch. For example, two slides can be at technically valid coordinates and still overlap on screen. I’ve only been able to discover these bugs by being a human reviewer in the loop. Writing tests that validate human usability of a canvas UI is Very Hard.
The Editor agent didn’t try to solve the layout problem with the observation. It recognized that a known layout failure condition had been identified, flagged the gap to me, and we collaborated to co-write a rule that pushed verification to the Editor agent. I hadn’t defined a way to embed this validation activity into the Editor agent. It discovered it needed to be able to check its own work, and since it can’t change its own rules, it worked with me to create new role definitions that closed the gap. The protocol grew exactly where it needed to.
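The enshrined rule itself is short. A paraphrase of roughly how it reads in the protocol file:

```
## Visual layout verification
If a draft changes a slide's position, scale, or rotation, the Editor must:
1. Render the deck on the local server
2. Click through the affected region
3. Write a one-sentence observation into the draft's rationale
A draft applied without this observation is invalid.
```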
When two agents want the same file

Conflicts are rare because the turf table prevents most of them. The agents only write in their own directories. For the cases the table doesn’t cover:
- Editor wins for `presentation/` (the source files)
- Drafter wins for `.drafts/` (proposals)
- Anything else is mine to resolve
If I edit a file directly outside the protocol — patch a typo, fix a broken link, whatever — I announce it with a HANDOFF line so both agents re-read the file before their next operation.
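Written down, the turf table is tiny. Mine looked roughly like this (the paths are from this project; yours will differ):

```
## Turf
| Path           | Owner    | On conflict            |
|----------------|----------|------------------------|
| presentation/  | Editor   | Editor wins            |
| .drafts/       | Drafter  | Drafter wins           |
| anything else  | Operator | human resolves by hand |
```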
What this looks like on disk
If you cloned the repo right now and poked at it, you’d see the artifacts of the protocol:
```
# All the proposals, one per atomic edit
$ ls .drafts/ | wc -l
67

# Every Editor commit that landed a change
$ git log --oneline --grep "^content:"
1090b0f content: rewrite Topic 9 Explain to use relative symlink pattern
a48bb30 content: rewrite Topic 1 Tell to lead with library thesis

# Every Drafter commit (proposals, never source files)
$ git log --oneline --grep "^drafts:"

# Anything currently kicked back to the Drafter
$ grep -l "status: blocked" .drafts/*.md

# Trace any applied draft to the commit that landed it
$ grep -l "applied_commit: 1090b0f" .drafts/*.md
.drafts/slide-09-explain-symlink-rewrite.md
```
Each draft links to its git commit hash. Each commit links back to its draft in the message body. The commit history becomes a ledger of decisions, with rationale.
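The linkage is one line in each direction: the draft carries an applied_commit field, and the commit message names the draft. A commit body might look like this (reconstructed for illustration, not copied from the repo):

```
content: rewrite Topic 1 Tell to lead with library thesis

Applies: .drafts/slide-01-tell-library-thesis.md
```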
The key takeaways from this weird experiment in ad-hoc collaboration-protocol design:
The default mental model for most people’s agent work is one human and one agent in one conversation.
When an LLM conversation gets too long or too saturated, your options are all bad: start over, compact to a session summary and lose detail about what has been done, or push through the compressed session and accept degraded LLM output. Every path is lossy and degrades over time.
This context degradation is why browser-based use of LLMs is a dead end: it keeps builders tethered to the ground. You need agents that interact with version-controlled artifacts that help establish and maintain context.
Your mental model should be: interactions with LLMs produce bounded, artifact-driven workflows.
Spin up multiple agents. Define and distribute turf amongst them. Let them read a shared file at the start of every turn. Operate as the message bus by copying messages between the agents. And (this part feels weird) when the protocol has a gap, ask the agents to help you fill it. Agents notice the gaps faster than you do.
After you establish an operating pattern that works, evolve the agents toward writing the prompts you’re pasting into a shared file. Remove yourself from the loop and focus on approving or rejecting change requests from the agents.
You’re collaborating with agents who are also collaborating with each other, through a contract you collaboratively maintain. The contract can be a markdown file that evolves every day. The agents can collaborate with the operator to add a new section to the contract.
The foundational rule is: Every agent reads the collaboration contract before acting.










