Agent Customization
Choosing the Right Layer
Not every project needs all three layers. Here is a rough decision guide:
You want the agent to...
│
├── ...understand your codebase conventions
│ └── Layer 1: add an instruction file
│ (copilot-instructions.md / CLAUDE.md / AGENTS.md)
│
├── ...call external APIs or run code
│ └── Layer 2: define tools
│ (function calling / tool use / skill.md)
│
├── ...connect to many external services reliably
│ └── Layer 3: expose an MCP server
│
└── ...delegate to specialised sub-agents
└── Layer 3: use an agent SDK with handoffs
Instruction files are free, commit-tracked, and platform-portable. Before reaching for tools or protocols, ask whether a well-written CLAUDE.md or copilot-instructions.md already gives the agent enough context. In many projects it does.
An Open Question: Is Configuration Enough?
Everything covered so far operates above the model weights. You write files, define schemas, and wire up servers — but the underlying model is untouched. For many teams, that is the right call: fast to iterate, easy to version-control, no GPU budget required.
But a provocative argument is gathering momentum. As frontier models flatten their general-purpose capability curves — the era of reliable 10× reasoning jumps per generation may be behind us — the remaining step-changes are increasingly found in domain-specialised intelligence: models whose weights encode an organisation’s proprietary data, decision logic, and institutional vocabulary.
This is the territory of Layer 0: fine-tuning, continued pre-training, and reinforcement learning on internal datasets. It is a fundamentally different kind of customisation — not shaping how an agent uses a model, but shaping the model itself.
A Mistral AI analysis published in MIT Technology Review (March 2026) frames this shift as an architectural imperative, and proposes three organising principles worth sitting with as open questions:
Treat AI as infrastructure, not an experiment. Ad hoc fine-tuning runs produce brittle, one-off pipelines that break when base models are updated. Durable customisation is version-controlled, reproducible, and decoupled from the underlying weights — so the “adaptation logic” survives model upgrades. Open question: how do today’s instruction files and skill definitions age when the base model underneath them changes significantly?
Retain control of your data and your model. Vendor lock-in at the configuration layer (API keys, hosted embeddings, proprietary tool registries) is one risk; lock-in at the weight layer is another order of magnitude more consequential. Organisations that control their training pipelines can enforce data residency, dictate update cycles, and keep the “digital nervous system” in house. Open question: does the current MCP / tool-use ecosystem reduce or increase structural dependency on a small number of AI providers?
Design for continuous adaptation. A fine-tuned model is not a finished artifact — it decays as the world changes. Competitive advantage compounds for organisations that build ModelOps: automated drift detection, event-driven retraining, incremental updates. Open question: what is the equivalent of ModelOps at the configuration layer? How do you detect when an instruction file has gone stale?
The article that prompted these questions — Shifting to AI Model Customization Is an Architectural Imperative (MIT Technology Review, March 2026) — is sponsored content produced by Mistral AI, not MIT Technology Review’s editorial staff. The argument is substantive, but the source has commercial skin in the game: Mistral sells domain-specific fine-tuning through its Forge platform. Read it as a well-argued vendor perspective, not a neutral analysis.
Whether weight-level customisation eventually subsumes configuration-level customisation, or whether the two layers remain complementary indefinitely, is genuinely open. For most teams today the practical answer is still: start with a good instruction file. But knowing the deeper layer exists — and understanding the trade-offs between controlling behaviour and controlling weights — is the difference between a tactical choice and a strategic one.
API Documentation Reference
| Platform | Resource | URL |
|---|---|---|
| GitHub Copilot | Custom instructions | docs.github.com |
| GitHub Copilot | Coding agent setup | docs.github.com |
| Anthropic Claude | CLAUDE.md & memory | docs.anthropic.com |
| Anthropic Claude | Tool use | docs.anthropic.com |
| Anthropic Claude | MCP quickstart | docs.anthropic.com |
| OpenAI | Function calling | platform.openai.com |
| OpenAI | Agents SDK | openai.github.io |
| OpenAI | Orchestration & handoffs | platform.openai.com |
| MCP | Specification & SDKs | modelcontextprotocol.io |
| MCP | Server registry | mcp.so |
Further Reading
- Building effective agents — Anthropic’s engineering blog post on the patterns that actually work in production agentic systems.
- How can we develop transformative tools for thought? — Matuschak & Nielsen on the broader question of what AI tooling should aspire to.
- OpenAI Swarm (archived) — The educational predecessor to the Agents SDK; the code is simple enough to read in an afternoon and clarifies the core handoff concept.
- LangGraph documentation — Graph-based agent orchestration from LangChain; especially useful when you need cyclic, stateful workflows.
- AutoGen documentation — Microsoft Research’s framework for conversational multi-agent systems.