Designers have something AI doesn't: taste.
You know what "the right blue" looks like. You can feel when a page's spacing is too tight. You can look at two Glassmorphism implementations side by side and tell which one is elegant and which one is cheap — without needing to explain why. That judgment comes from years of doing this. It doesn't live in language. It lives in you.
And that's exactly where the trouble starts.
Tell AI "Glassmorphism style" and it gives you a result. Say the same thing tomorrow and you get something different. Both technically qualify, just as three designers interpreting "brutalism" will give you three completely different outcomes. Professional vocabulary solves the problem of communication efficiency. It doesn't solve the problem of consistency.
This is the most underestimated challenge in AI-assisted design: not that AI isn't smart enough, but that it has its own interpretation of every single word you use.
There's a real engineering problem underneath all of this. We can't ask designers to babysit every AI output — refining step by step like a film director. That's too slow; it defeats the purpose entirely. But we also can't accept AI delivering its daily interpretation of a style keyword. That doesn't scale, and it doesn't survive team collaboration.
What we actually need: whoever triggers the AI, whenever they trigger it, the output reflects this team's professional standard — not AI's current read on a style word.
The answer isn't finding more precise vocabulary. It's translating your aesthetic judgment into constraints AI simply cannot misread. And that requires working across seven dimensions at once.
Design Tokens — The DNA of Taste
Here's a concrete example of how this translation works.
Say your product needs a visual feel that's "restrained, professional, but not cold." Any designer gets this immediately. AI doesn't. "Restrained" — how restrained? "Professional" — which industry? These words are starting points for human understanding, not specifications.
You have to translate that feeling into parameters:
Accent color: CTAs and key states only · never exceeds 10% of page area
Card padding: 24px (consistent throughout)
Related elements: 8px gap · Unrelated elements: 24px+
Line height: 1.6
Border radius: --radius-md (8px), not hard corners
Once you do this, something important happens: your aesthetic judgment becomes a constraint AI literally cannot misinterpret. And the same logic applies to any style vocabulary. "Glassmorphism" stops being a word in your system and becomes a specification:
Backdrop blur: 16px
Border: 1px solid rgba(255, 255, 255, 0.2)
Shadow: 0 4px 24px rgba(0, 0, 0, 0.08)
Usage: floating cards and modals only
This is what Design Tokens are really for — not just as an organizational tool for design systems, but as the parametric encoding of taste.
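To make "parametric encoding" concrete, here is a minimal sketch that stores the values listed above as data and renders them as CSS custom properties. Only `--radius-md` and the values themselves come from the text; every other token name and the `to_css_variables` helper are hypothetical, chosen for illustration.

```python
# Values are the ones listed in the text above.
# ASSUMPTION: all token names except "radius-md" are made up for this sketch.
TOKENS = {
    "card-padding": "24px",
    "gap-related": "8px",
    "gap-unrelated-min": "24px",
    "line-height": "1.6",
    "radius-md": "8px",
    "glass-blur": "16px",
    "glass-border": "1px solid rgba(255, 255, 255, 0.2)",
    "glass-shadow": "0 4px 24px rgba(0, 0, 0, 0.08)",
}


def to_css_variables(tokens: dict[str, str], selector: str = ":root") -> str:
    """Render a flat token dict as a CSS rule of custom properties."""
    lines = [f"  --{name}: {value};" for name, value in tokens.items()]
    return f"{selector} {{\n" + "\n".join(lines) + "\n}"


if __name__ == "__main__":
    print(to_css_variables(TOKENS))
```

The point of keeping tokens as plain data is that the same source can feed the stylesheet, the prompt, and the QA checks, so there is only one place where "the right blur" is defined.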
Interaction States — Making Components Actually Alive
Visual consistency solved, the next problem surfaces quickly: AI-generated components are frozen.
You see a beautiful button. But it only exists in that one moment. What happens when a user hovers? What state does it enter on form submit? How does it communicate a failure? AI doesn't know — because you didn't tell it.
My rule: any component must define its complete lifecycle at the time it's created.
Base states: Default, Hover, Active, Disabled, Focus.
Business states: Loading, Error, Empty.
But in agentic system design, there's a whole additional category that traditional specs don't account for at all:
Waiting for Human · Agent paused, awaiting approval
Agent Failed · Human intervention required
Completed by Agent · Output ready for human review
These aren't just visual decisions. They're the moments where users decide whether to trust the system. A poorly designed "Waiting for Human" state can silently stall an entire workflow — the user doesn't know the system needs them, the agent doesn't know where the user went.
The biggest gap between production-grade design and a sketch is whether components have actually been activated.
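One way to enforce the "complete lifecycle" rule is to treat the state list as data and check each component against it. The sketch below is an assumption about how that could look, not a prescribed implementation: the `State` enum mirrors the states named above, while `missing_states` and its `agentic` flag are hypothetical.

```python
from enum import Enum


class State(Enum):
    # Base states
    DEFAULT = "default"
    HOVER = "hover"
    ACTIVE = "active"
    DISABLED = "disabled"
    FOCUS = "focus"
    # Business states
    LOADING = "loading"
    ERROR = "error"
    EMPTY = "empty"
    # Agentic states
    WAITING_FOR_HUMAN = "waiting_for_human"
    AGENT_FAILED = "agent_failed"
    COMPLETED_BY_AGENT = "completed_by_agent"


BASE_STATES = {State.DEFAULT, State.HOVER, State.ACTIVE, State.DISABLED, State.FOCUS}
BUSINESS_STATES = {State.LOADING, State.ERROR, State.EMPTY}
AGENTIC_STATES = {State.WAITING_FOR_HUMAN, State.AGENT_FAILED, State.COMPLETED_BY_AGENT}


def missing_states(defined: set[State], agentic: bool = False) -> set[State]:
    """Return the required states a component has not defined yet."""
    required = BASE_STATES | BUSINESS_STATES
    if agentic:
        required = required | AGENTIC_STATES
    return required - defined


if __name__ == "__main__":
    # A button that only has its default and hover look is not finished.
    button = {State.DEFAULT, State.HOVER}
    print(sorted(s.value for s in missing_states(button)))
```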
Spacing System — Rhythm Is Not "Close Enough"
Here's something AI does that quietly drives designers crazy: it's very good at approximately right.
14px and 16px look close enough. 20px and 24px, same thing. But to a user staring at this interface for eight hours a day, that accumulated approximation turns into a vague visual fatigue they can't name — and can't stop feeling.
The fix is simple: give AI a sequence it cannot deviate from.
Any value not in this sequence is wrong. Regenerate.
This isn't constraining creativity — it's establishing rhythm. Music needs a beat; page layout needs internal cadence. With this constraint in place, AI-generated pages naturally develop the kind of breathing room you see in professional work. Not because AI got smarter, but because it no longer has the opportunity to make this particular mistake.
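A check like this is easy to automate. The sketch below scans spacing declarations in generated CSS and reports any pixel value that is off the scale. The scale shown is a placeholder assumption for illustration, not this article's sequence; swap in your team's own values. The regular expressions only cover `padding`, `margin`, and `gap` declarations written in `px`.

```python
import re

# ASSUMPTION: illustrative scale only. Replace with your team's actual sequence.
SPACING_SCALE_PX = (8, 16, 24, 32, 48, 64)

# Matches padding / margin / gap declarations, including variants like margin-top.
SPACING_DECL = re.compile(r"\b(?:padding|margin|gap)(?:-[a-z]+)*\s*:\s*([^;}]+)")
PX_VALUE = re.compile(r"(\d+(?:\.\d+)?)px")


def off_scale_values(css: str, scale=SPACING_SCALE_PX) -> list[float]:
    """Return every px value in a spacing declaration that is not on the scale."""
    bad = []
    for declaration_value in SPACING_DECL.findall(css):
        for raw in PX_VALUE.findall(declaration_value):
            value = float(raw)
            if value not in scale:
                bad.append(value)
    return bad


if __name__ == "__main__":
    generated = ".card { padding: 24px 14px; gap: 8px; margin-top: 20px; font-size: 15px; }"
    # font-size is ignored here; it belongs to the type scale, not the spacing scale.
    print(off_scale_values(generated))
```

A non-empty result is the "wrong, regenerate" signal: the output goes back to the model instead of forward to a reviewer.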
Typography Scale — Don't Let AI Decide What Matters
Typography is the part AI most reliably scrambles without guardrails.
Left unconstrained, it mixes sizes freely — 15px here, 17px somewhere else, 22px wherever. Each number is defensible in isolation. Together, they're a mess. No hierarchy, no emphasis, nothing to anchor the eye.
The deeper issue: type hierarchy determines which information is more important. When you let AI choose font sizes freely, you're letting AI decide your information priorities. That judgment isn't one to outsource.
H1 36px · weight 700 · leading 1.20
H2 28px · weight 600 · leading 1.30
H3 22px · weight 600 · leading 1.40
Body 16px · weight 400 · leading 1.80
Small 14px · weight 400 · leading 1.60
Caption 12px · weight 400 · leading 1.50
AI selects from the scale. It doesn't improvise. And don't only define size — an H1 at line-height 1.2 feels completely different from the same H1 at 1.6. Leave these undefined and you get something different every single time.
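"Selects, doesn't improvise" can be expressed as a lookup that has no fallback. The sketch below encodes the scale above as data; the `TypeStyle` class and the `resolve` and `is_on_scale` helpers are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeStyle:
    size_px: int
    weight: int
    leading: float


# Values copied from the scale above.
TYPE_SCALE = {
    "H1": TypeStyle(36, 700, 1.20),
    "H2": TypeStyle(28, 600, 1.30),
    "H3": TypeStyle(22, 600, 1.40),
    "Body": TypeStyle(16, 400, 1.80),
    "Small": TypeStyle(14, 400, 1.60),
    "Caption": TypeStyle(12, 400, 1.50),
}


def resolve(role: str) -> TypeStyle:
    """Look up a role on the scale; unknown roles are an error, never a guess."""
    try:
        return TYPE_SCALE[role]
    except KeyError:
        raise ValueError(
            f"'{role}' is not on the type scale; choose from {sorted(TYPE_SCALE)}"
        ) from None


def is_on_scale(size_px: int, weight: int, leading: float) -> bool:
    """True only if size, weight, and leading together match one defined style."""
    return TypeStyle(size_px, weight, leading) in TYPE_SCALE.values()


if __name__ == "__main__":
    print(resolve("H1"))
    print(is_on_scale(36, 700, 1.6))  # right size, wrong leading
```

Checking the full triple rather than the size alone is deliberate: a 36px heading with the wrong line height is still off the scale.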
Content Guidelines — UX Writing Is Part of the System
With the visual layer mostly handled, there's one more dimension that's easy to overlook: copy.
AI is genuinely good at generating text. The problem is that the text it generates tends to be too safe — polite, vague, low in action orientation. "Please click here to proceed with your operation" should not exist in any real product.
✓ "Save Changes"
✗ "Click to save your changes"
✓ "Password must be 8+ characters with 1 number"
✗ "Invalid password format"
In agentic systems, copy faces a new challenge: how do you explain what the AI is doing right now? The balance to strike is being transparent enough to build trust without getting so technical that you lose everyone. My principle: write it the way you'd explain it to a smart colleague who doesn't know the system.
Standardized Prompts — Making Specs Actually Execute
The first five dimensions build the spec. But specs are documents. Documents don't enforce themselves.
Here's the failure mode to avoid: even if you've defined every token, state, spacing rule, type scale, and content guideline — if those constraints aren't brought into each AI request, AI reverts to its own interpretation. Every time.
The fix is a standardized prompt template. I treat it like a contract: every generation request automatically includes it.
Components must include all interaction states.
Copy follows the Content Guidelines.
When in doubt: ask first, don't invent.
Without this template, the spec is a wish written in a document. With it, the spec becomes actual AI behavior.
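In practice, "automatically includes it" means nobody types the contract by hand; a small wrapper prepends it to every request. Here is a minimal sketch of that idea. The three rules are the ones quoted above; the `build_prompt` function, the section labels, and the way tokens are serialized are assumptions for illustration.

```python
# Rules quoted from the template above. A real contract would carry the
# full spec (tokens, spacing, type scale, states), not just these lines.
CONTRACT_RULES = [
    "Components must include all interaction states.",
    "Copy follows the Content Guidelines.",
    "When in doubt: ask first, don't invent.",
]


def build_prompt(request: str, tokens: dict[str, str], rules=CONTRACT_RULES) -> str:
    """Wrap a free-form request in the team's design contract."""
    token_lines = "\n".join(f"- {name}: {value}" for name, value in tokens.items())
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    return (
        "DESIGN CONTRACT (applies to every request)\n"
        f"Tokens:\n{token_lines}\n"
        f"Rules:\n{rule_lines}\n\n"
        f"REQUEST\n{request.strip()}"
    )


if __name__ == "__main__":
    print(build_prompt("Build a settings card.", {"radius-md": "8px"}))
```

Whoever triggers the generation only writes the request line; the contract travels with it whether they remember the spec or not.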
Design QA Agent — From Visual Audit to Behavioral Audit
The last checkpoint — and the most commonly misunderstood one.
Most people think of QA as a manual checklist: designer sits down, goes through colors, spacing, fonts, item by item. In an AI-first workflow, this has a fatal flaw: it uses the most valuable human judgment on the least judgment-intensive work.
I split QA into two layers.
Layer 1: Design QA Agent (automated)
Everything with a right answer goes to the agent:
├── Is this spacing a multiple of the 8px grid?
├── Is this font size on the type scale?
└── Does this button label exceed three words?
Structural completeness
├── Does every component define all required states?
├── Are Hover and Error states present?
├── Are agentic states (Processing / Waiting / Failed) covered?
└── Are responsive breakpoints handled?
The agent runs these checks faster and more consistently than a human reviewer, and it doesn't skip items when it gets tired. Non-compliant outputs get kicked back for regeneration before they ever reach a person.
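Because every Layer 1 question has a yes-or-no answer, the checks reduce to a few lines of code each. The sketch below shows a few of them run against a hypothetical `ComponentSpec`; the class, its fields, and `run_layer1` are assumptions about how generated output might be summarized, not a real tool's API. Responsive breakpoints and agentic states are left out to keep it short.

```python
from dataclasses import dataclass, field

TYPE_SIZES_PX = {36, 28, 22, 16, 14, 12}  # sizes from the type scale
REQUIRED_STATES = {
    "default", "hover", "active", "disabled", "focus",  # base states
    "loading", "error", "empty",                        # business states
}
MAX_LABEL_WORDS = 3


@dataclass
class ComponentSpec:
    # ASSUMPTION: a hypothetical summary of one generated component.
    name: str
    spacing_px: list[int]
    font_sizes_px: list[int]
    button_labels: list[str]
    states: set[str] = field(default_factory=set)


def run_layer1(spec: ComponentSpec) -> list[str]:
    """Return human-readable violations; an empty list means the spec passes."""
    violations = []
    for value in spec.spacing_px:
        if value % 8 != 0:
            violations.append(f"{spec.name}: spacing {value}px is not a multiple of 8px")
    for size in spec.font_sizes_px:
        if size not in TYPE_SIZES_PX:
            violations.append(f"{spec.name}: font size {size}px is not on the type scale")
    for label in spec.button_labels:
        if len(label.split()) > MAX_LABEL_WORDS:
            violations.append(f"{spec.name}: button label '{label}' exceeds three words")
    missing = sorted(REQUIRED_STATES - spec.states)
    if missing:
        violations.append(f"{spec.name}: missing states {missing}")
    return violations


if __name__ == "__main__":
    draft = ComponentSpec(
        name="SaveButton",
        spacing_px=[8, 20],
        font_sizes_px=[16, 17],
        button_labels=["Save Changes", "Click to save your changes"],
        states={"default", "hover"},
    )
    for violation in run_layer1(draft):
        print(violation)
```

Any non-empty result sends the output back for regeneration, so only clean drafts reach a human.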
Layer 2: Human Review (judgment)
Only outputs that pass Layer 1 reach this stage. At this point, the human isn't checking colors and spacing — the agent already handled that. Human review is reserved for what agents genuinely cannot do:
├── Is the visual feedback for this state clear enough?
└── Is there something subtly off that a user would notice?
Intent verification
├── Is this actually what I wanted?
├── Does this interaction match user mental models?
└── Does this copy say what it's supposed to?
Agentic behavior audit
├── What should the agent do in this state — and did it?
├── Did it do anything it shouldn't have?
├── Are human-in-the-loop intervention points obvious?
└── If the agent fails, does the user know what to do?
Together, these two layers upgrade QA from "manual checklist" to a real quality system. Without Layer 1, human attention gets consumed by routine checks and real judgment never gets used. Without Layer 2, no one is guarding design intent or taste.