June 30, 2026·Engineering·6 min read

We Told an AI Agent to Bypass the Design System. It Couldn't.

By Conan McNicholl

Every team using AI coding agents has hit this: you have a perfectly good Button component — accessible, themed, tested — and the agent ignores it and builds its own. A raw <button> with an inline hex color, pixel-pushed to look like yours. It passes review because it looks right. Six months later you have nine purples, four button implementations, and a design system nobody trusts.

The agent isn't being malicious. It just has no contract with your codebase. Your backend has one — if an agent invents an API endpoint that isn't in the spec, the code doesn't compile, a test fails, something pushes back. Your UI has nothing like that. Any color, any element, any spacing is syntactically valid. So the agent's training data wins, and its training data is the entire internet's worth of one-off buttons.

We built that missing contract, and a gate that enforces it at the exact moment the agent writes the file — not in code review three hours later. Then we did the obvious test: we ordered an agent, in writing, to ignore the design system. Here's what happened.

The Setup

A small demo app with its own design system, acme-ui: a Button with variants, a handful of components, and design tokens like --acme-color-accent: #7c3aed. One config file declares the contract:

Raw color values are an error — use a token.
Raw <button> and <input> elements are an error — use Button and TextField from @acme/ui.
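
To make those two rules concrete, here is one way they could be written down as data. This is a hypothetical sketch in TypeScript, not the actual Fragments config schema: the field names (rawColors, rawElements, severity, use) and the replacementFor helper are invented for illustration.

```typescript
// Hypothetical contract shape; NOT the real Fragments config schema.
type Severity = "error" | "warn" | "off";

interface ElementRule {
  severity: Severity;
  // Component to use instead, and the package it is imported from.
  use: { component: string; from: string };
}

interface Contract {
  rawColors: { severity: Severity };
  rawElements: Record<string, ElementRule>;
}

const contract: Contract = {
  // Raw color values are an error: use a token.
  rawColors: { severity: "error" },
  // Raw <button> and <input> are errors: use the acme-ui components.
  rawElements: {
    button: { severity: "error", use: { component: "Button", from: "@acme/ui" } },
    input: { severity: "error", use: { component: "TextField", from: "@acme/ui" } },
  },
};

// Look up the replacement for a raw element, if the contract names one.
function replacementFor(tag: string): string | undefined {
  const rule = contract.rawElements[tag];
  return rule && rule.severity === "error"
    ? `${rule.use.component} from ${rule.use.from}`
    : undefined;
}
```

The point of keeping the replacement next to the rule is that a denial can name the fix, not just the violation.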

Then one command installs the gate:

npx @fragments-sdk/cli hook install --agent claude --mode blocking

That registers a pre-write check in the agent's own tool pipeline. Before any file write lands on disk, the same rule engine that runs in CI evaluates the new content — in memory, locally, in well under a second. If the write would fail CI, the write is denied and the agent is told exactly why, including what to use instead. If anything goes wrong with the check itself, it fails open: the gate can slow an agent down, never a human.
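
The deny-or-allow decision, including the fail-open behavior, can be sketched in a few lines. Everything below is an assumption for illustration: gateWrite, Violation, and Decision are hypothetical names, the check parameter stands in for the rule engine shared with CI, and timeouts are not modeled.

```typescript
// Sketch of a fail-open pre-write gate. All names here are hypothetical.
interface Violation {
  message: string; // what is wrong
  fix: string; // what to use instead
}

type Decision = { allow: true } | { allow: false; reason: string };

// `check` stands in for the rule engine that also runs in CI.
function gateWrite(
  content: string,
  check: (content: string) => Violation[],
): Decision {
  let violations: Violation[];
  try {
    violations = check(content);
  } catch {
    // Fail open: a broken check must never block the write.
    return { allow: true };
  }
  if (violations.length === 0) return { allow: true };
  // Each deny reason carries both the problem and the suggested fix.
  const reason = violations.map((v) => `${v.message} ${v.fix}`).join("\n");
  return { allow: false, reason: `${reason}\nCI will fail on this.` };
}
```

Note the asymmetry: only a check that ran successfully and found violations can deny a write. Every other path lets it through.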

The Explicit Bypass Order

The prompt was deliberately adversarial. Paraphrasing: implement the delete-account button, do NOT use the design system, use a raw button element with whatever hex color you like.

The session ran unattended. Here's the actual escalation:

Attempt 1 — raw <button> with a raw hex. Denied before the file hit disk. The agent didn't get a vague lecture; it got the contract:

Fragments contract: Raw color #dc2626 on inline `background`.
Use a design token instead. Swap to `var(--acme-color-danger)`.
CI will fail on this.

Attempt 2 — keep the raw <button>, swap the hex for the token. The agent tried to split the difference. Denied again — the element rule caught it: a raw <button> isn't the contract either, Button from @acme/ui is.

Attempt 3 — <Button variant="danger">. Clean. The write went through, the build passed, and the agent's own compliance check came back green.
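
The three attempts can be replayed against simplified stand-ins for the two rules. This is a sketch only: the regular expressions and the denyReasons function are assumptions made for illustration, a real engine would parse the file instead of pattern-matching it, and whether the real gate reports one reason per denial or all of them at once is not something this sketch tries to reproduce.

```typescript
// Simplified, regex-based stand-ins for the two rules. These patterns are
// assumptions for illustration, not how the real engine works.
const RAW_HEX = /#[0-9a-fA-F]{3,8}\b/;
const RAW_BUTTON = /<button[\s>]/; // lowercase tag only, so <Button> passes

function denyReasons(source: string): string[] {
  const reasons: string[] = [];
  const hex = source.match(RAW_HEX);
  if (hex) reasons.push(`Raw color ${hex[0]}: use a design token instead.`);
  if (RAW_BUTTON.test(source)) {
    reasons.push("Raw <button>: use Button from @acme/ui.");
  }
  return reasons;
}

const attempts = [
  // Attempt 1: raw element and raw hex.
  `<button style={{ background: "#dc2626" }}>Delete account</button>`,
  // Attempt 2: raw element, token instead of hex.
  `<button style={{ background: "var(--acme-color-danger)" }}>Delete account</button>`,
  // Attempt 3: the design-system component.
  `<Button variant="danger">Delete account</Button>`,
];

// Number of deny reasons per attempt; zero means the write would go through.
const outcome = attempts.map((a) => denyReasons(a).length);
```

In this sketch the first attempt trips both rules, the second trips only the element rule, and the third trips none.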

The most interesting part isn't the denial — it's what the agent didn't do. It didn't try to obfuscate the color, didn't try to uninstall the gate, didn't loop forever. It read the deny reason, understood that the user's instruction conflicted with an enforced project contract, shipped the compliant version, and explained the tradeoff in its summary. Agents are remarkably good at following rules that actually push back.

It's Not One Gate — It's Layers

We ran five unattended sessions with increasing pressure, and the pattern across them is the real story:

An innocent task ("add a delete button") produced the canonical <Button variant="danger"> on the first try — the generated context steered the agent before any enforcement was needed.
A pressured prompt with project instructions present — the agent refused the bypass on its own and negotiated back to the design system. Zero denials needed.
The same pressure with instructions stripped — the advisory layer (context injected at write time, no blocking) was enough: the agent dropped the raw hex and used the component variant backed by the matching token.
A smaller, faster model — blew through the advice, hit two hard denials, then shipped the canonical component.
The explicit bypass order — two hard denials, then the canonical component, as above.

That's the design: persuasion first, enforcement last. Most of the time the contract never has to say no, because the agent was steered correctly at generation time. The deny exists for the cases that slip through — smaller models, long sessions where context decays, or a user actively pushing the wrong way. And because the gate runs the same engine as CI, a denial is never arbitrary: anything it blocks locally would have failed the build anyway. It's not a second opinion. It's the same opinion, earlier.
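
The difference between the advisory and blocking layers comes down to what happens to the same set of violations. A minimal sketch, assuming hypothetical names (Mode, Outcome, respond) that are not part of any documented Fragments API:

```typescript
type Mode = "advisory" | "blocking";

interface Outcome {
  allowed: boolean;
  // Text shown to the agent: advice in advisory mode, deny reason when blocked.
  feedback: string | null;
}

function respond(mode: Mode, violations: string[]): Outcome {
  // Nothing to say: the write goes through silently in either mode.
  if (violations.length === 0) return { allowed: true, feedback: null };
  const feedback = violations.join("\n");
  // Advisory: let the write through, but tell the agent what CI would say.
  if (mode === "advisory") return { allowed: true, feedback };
  // Blocking: same message, but the write is denied.
  return { allowed: false, feedback };
}
```

Both modes deliver identical feedback. Blocking only changes whether the write lands, which is why the two layers never disagree about what is wrong.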

The Gate Knows Your Tokens

A deny that just says "don't use raw colors" sends the agent off to guess. This gate resolves your actual token set, so it can say what to do instead:

Exact match: the hex the agent wrote is one of your tokens → "Swap to var(--acme-color-danger)."
Near miss: the agent writes #7c3acc — a color that doesn't exist in your system but is one shade off your accent → "Closest token is --acme-color-accent-hover (#6d28d9); use a design token instead."

The nearest-match comparison works across notations — hex, rgb(), hsl(), and oklch() all normalize to the same color space before comparing — so a raw hex still matches a token your team authored in oklch(). Near-miss suggestions are deliberately hints, never auto-fixes: silently snapping a color the developer chose is worse than asking.
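
Here is a minimal sketch of that lookup, under stated assumptions. It handles only 6-digit hex and rgb() and omits hsl() and oklch(). It compares colors by plain Euclidean distance in sRGB, which is our stand-in and not necessarily the metric the gate uses. The danger token is authored in rgb() purely to show cross-notation matching. The function names (toRgb, distance, suggest) are hypothetical.

```typescript
type RGB = [number, number, number];

// Token values taken from the examples in this post. Writing `danger` in
// rgb() notation is an assumption made to demonstrate normalization.
const tokens: Record<string, string> = {
  "--acme-color-accent": "#7c3aed",
  "--acme-color-accent-hover": "#6d28d9",
  "--acme-color-danger": "rgb(220, 38, 38)",
};

// Normalize a 6-digit hex or rgb() string to sRGB channels.
// hsl() and oklch() would need their own conversions and are omitted.
function toRgb(value: string): RGB | null {
  const v = value.trim();
  const hex = /^#([0-9a-f]{6})$/i.exec(v);
  if (hex) {
    const n = parseInt(hex[1], 16);
    return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
  }
  const rgb = /^rgb\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)$/i.exec(v);
  return rgb ? [Number(rgb[1]), Number(rgb[2]), Number(rgb[3])] : null;
}

// Plain Euclidean distance in sRGB; an assumption, not the tool's metric.
function distance(a: RGB, b: RGB): number {
  const dr = a[0] - b[0];
  const dg = a[1] - b[1];
  const db = a[2] - b[2];
  return Math.sqrt(dr * dr + dg * dg + db * db);
}

// Return the closest token and whether it is an exact match.
function suggest(raw: string): { token: string; exact: boolean } | null {
  const target = toRgb(raw);
  if (!target) return null;
  let best: { token: string; d: number } | null = null;
  for (const token of Object.keys(tokens)) {
    const rgb = toRgb(tokens[token]);
    if (!rgb) continue;
    const d = distance(target, rgb);
    if (best === null || d < best.d) best = { token, d };
  }
  return best ? { token: best.token, exact: best.d === 0 } : null;
}
```

With these tokens, suggest("#dc2626") is an exact match for --acme-color-danger even though the token was written in rgb(), and suggest("#7c3acc") returns --acme-color-accent-hover as a non-exact hint. The exact flag is what separates a confident "swap to" from a "closest token is" suggestion.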

This is the difference between a linter and a contract. A linter tells you you're wrong. A contract tells you what right looks like, in your system's own vocabulary, at the moment you can still cheaply change course.

Try It

The hook ships in the Fragments CLI today:

npm install -D @fragments-sdk/cli
npx fragments init
npx fragments hook install --agent claude --mode blocking

Advisory mode (context injection, no blocking) is the default if you want to start gently. The same contract also runs in CI via fragments check, so local denials and build failures always agree.

AI agents rebuild components you already have because nothing in your codebase can tell them no. Fragments compiles your design system into a contract and enforces it at the moment of writing — so the agent that was ordered to ship a rogue button shipped your Button instead.

FAQ

What happens if the check itself breaks?

The gate fails open. Any error in the check — config missing, engine crash, timeout — results in the write going through untouched. Enforcement is only ever as strict as CI, and never stricter than a working check can justify.

Does this slow the agent down?

The check runs in memory on the single file being written, with no network calls, in well under a second. In practice the loop is faster overall: a denial at write time costs one retry; the same violation caught in CI costs a full review round-trip.

Will the agent just fight the gate forever?

We haven't seen it. Across our adversarial runs, agents read the deny reason and self-corrected within one or two attempts — including refusing an explicit instruction to bypass the design system, and explaining why. A denial that names the fix ("Swap to var(--acme-color-danger)") converges; a vague rejection is what causes thrashing.

Does it only work with one coding agent?

Blocking interception is installed per agent harness. Agents whose harnesses don't yet support pre-write hooks get always-on rules files instead — honest, clearly-labeled static guidance rather than fake interception. CI enforcement is agent-agnostic either way.

Conan McNicholl, Founder
