AI Reference
How To Build And Design Skills
This page is a practical playbook for building reliable AI skills that trigger correctly, run predictable workflows, and stay maintainable as your team scales.
Last updated: 2026-03-11
1. Start with 2-3 concrete outcomes
What to do
Define what the user wants to accomplish in plain language before writing the skill file. Keep scope narrow at first.
Why it matters
A focused scope makes triggering cleaner, instructions shorter, and testing easier. Broad skills often underperform because intent is unclear.
Example
Good: "Set up sprint tasks in Linear from a project brief." Bad: "Help with project management."
Common mistake: Trying to solve every workflow in v1.
2. Write description text for real user phrasing
What to do
In frontmatter, state what the skill does, when to use it, and include realistic trigger phrases users would actually type.
Why it matters
Triggering quality is mostly determined by description clarity. If the description is vague, the skill is either ignored or over-triggered.
Example
Use phrases like "create tickets", "plan sprint", "generate handoff" instead of generic text like "helps with tasks."
Common mistake: Using internal architecture jargon instead of user language.
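Put together, a description written this way might look like the following sketch (the skill name and exact phrases are illustrative, not prescriptive):

```yaml
---
name: sprint-planner
description: >
  Creates and organizes sprint tasks in Linear from a project brief.
  Use when the user says "create tickets", "plan sprint", or
  "generate handoff", or pastes a brief and asks for sprint setup.
---
```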
3. Use a strict, predictable skill structure
What to do
Keep folder naming in kebab-case, keep SKILL.md exact, and split large details into references/ and tooling into scripts/.
Why it matters
Predictable structure reduces upload/config failures and improves maintainability when teams scale skill count.
Example
Use customer-onboarding-skill/SKILL.md with references/api-errors.md and scripts/validate_input.py.
Common mistake: Scattering instructions across random files or misnaming SKILL.md.
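Because the layout rules are mechanical, they can be enforced with a deterministic script. This sketch (folder and file names assumed from the example above) checks the naming rules before upload:

```python
import re
from pathlib import Path

def validate_skill_folder(root: str) -> list[str]:
    """Return a list of structure problems; an empty list means the layout is valid."""
    problems = []
    root_path = Path(root)
    # Folder name must be kebab-case: lowercase words separated by hyphens.
    if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", root_path.name):
        problems.append(f"folder name '{root_path.name}' is not kebab-case")
    # The main instruction file must be named exactly SKILL.md.
    if not (root_path / "SKILL.md").is_file():
        problems.append("missing SKILL.md at the skill root")
    # Deep docs and tooling belong in references/ and scripts/ if present.
    for extra in root_path.iterdir():
        if extra.is_dir() and extra.name not in {"references", "scripts"}:
            problems.append(f"unexpected directory '{extra.name}/'")
    return problems
```

Running a check like this in CI catches misnamed files before they cause upload or configuration failures.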
4. Make every step explicit and executable
What to do
Write instructions as ordered steps with clear inputs, expected output, and completion checks for each phase.
Why it matters
Ambiguous steps create inconsistent behavior. Explicit steps produce stable outcomes across sessions and users.
Example
Step: "Call create_subscription with plan_id and customer_id, then verify subscription status is active."
Common mistake: Using vague instructions like "handle payment setup properly."
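The subscription step above can be sketched as code to show the shape of an explicit step: stated inputs, one call, a completion check. The client object and its create_subscription method are hypothetical, standing in for whatever tool the skill actually calls:

```python
def run_subscription_step(client, plan_id: str, customer_id: str) -> str:
    """Explicit step: stated inputs, one call, and a completion check."""
    # Input check: both identifiers must be present before the call.
    if not plan_id or not customer_id:
        raise ValueError("plan_id and customer_id are required inputs")
    # Action: exactly one tool call with named parameters.
    subscription = client.create_subscription(plan_id=plan_id, customer_id=customer_id)
    # Completion check: verify the expected output before moving on.
    if subscription["status"] != "active":
        raise RuntimeError(f"subscription not active: {subscription['status']}")
    return subscription["id"]
```

Each phase of a skill can follow this same input/action/check shape, which is what makes outcomes reproducible across sessions.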
5. Validate before side effects
What to do
Add preflight checks before create/update/delete actions. Prefer deterministic script-based checks for critical rules.
Why it matters
Validation gates prevent invalid writes and reduce expensive retries in production systems.
Example
Before creating a deployment ticket: verify service name exists, owner is assigned, and deadline is not in the past.
Common mistake: Only validating after failures happen.
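The deployment-ticket preflight above is the kind of rule that belongs in a script (for example, a scripts/validate_input.py). This sketch assumes illustrative field names; an empty result means the write may proceed:

```python
from datetime import date

def preflight_deployment_ticket(ticket: dict, known_services: set) -> list:
    """Deterministic preflight checks; an empty list means the write may proceed."""
    errors = []
    # The target service must already exist before a ticket references it.
    if ticket.get("service") not in known_services:
        errors.append(f"unknown service: {ticket.get('service')!r}")
    # Every deployment ticket needs an assigned owner.
    if not ticket.get("owner"):
        errors.append("owner is not assigned")
    # The deadline must not already be in the past.
    deadline = ticket.get("deadline")
    if deadline is None or deadline < date.today():
        errors.append("deadline is missing or in the past")
    return errors
```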
6. Include error handling and recovery paths
What to do
Document common failure modes, likely causes, and exact remediation steps users or the skill should take.
Why it matters
Most real-world breakage comes from auth, connectivity, or data shape issues. Recovery guidance keeps workflows usable.
Example
If API returns 401: refresh token, retry once, then ask user to reconnect integration if still unauthorized.
Common mistake: Assuming happy-path infrastructure with no retries or fallback.
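The 401 recovery path above can be sketched as follows. The call_api, refresh_token, and notify_user callables are assumed to be supplied by the surrounding skill; they are illustrative, not a real SDK:

```python
def fetch_with_auth_recovery(call_api, refresh_token, notify_user):
    """Sketch of the 401 recovery path: refresh once, retry once, then escalate."""
    response = call_api()
    if response["status"] == 401:
        # Likely cause: expired credentials. Remediation: refresh and retry once.
        refresh_token()
        response = call_api()
        if response["status"] == 401:
            # Still unauthorized: stop retrying and hand control back to the user.
            notify_user("Please reconnect the integration; authorization failed twice.")
            return None
    return response
```

Capping retries at one keeps the recovery path predictable instead of looping silently on a dead credential.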
7. Keep core instructions short and defer deep docs
What to do
Put only decision-critical instructions in SKILL.md; link detailed specs, error catalogs, and templates in references/.
Why it matters
Smaller core instructions reduce context load and keep model attention on the active workflow.
Example
In SKILL.md: "For pagination edge cases, consult references/pagination-playbook.md."
Common mistake: Dumping full API manuals inside the main skill file.
8. Test triggering, function, and performance separately
What to do
Run tests in three buckets: trigger accuracy, output correctness, and baseline-vs-skill efficiency.
Why it matters
A skill can produce correct outputs but still fail in production if it triggers poorly or wastes tokens/tool calls.
Example
Track: trigger hit-rate on paraphrases, failed calls per run, and total tool calls compared to no-skill baseline.
Common mistake: Only doing one manual demo and calling it done.
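The trigger-accuracy bucket can be scored with a small helper like this sketch; the test phrases would come from your own paraphrase and non-trigger cases:

```python
def trigger_hit_rate(results: list) -> float:
    """Fraction of test phrases where triggering matched expectation.

    Each result is a (phrase, should_trigger, did_trigger) tuple.
    """
    if not results:
        return 0.0
    correct = sum(1 for _, expected, actual in results if expected == actual)
    return correct / len(results)
```

Tracking this number across skill revisions shows whether a description change actually improved triggering rather than just feeling better.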
9. Choose one orchestration pattern per workflow
What to do
Pick an explicit pattern (sequential steps, multi-service phases, refinement loop, or context-based routing) for each use case.
Why it matters
Pattern clarity prevents instruction conflicts and makes troubleshooting much faster.
Example
Design handoff: Figma export -> Drive upload -> Linear task creation -> Slack notification.
Common mistake: Mixing multiple patterns in one unstructured instruction block.
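The sequential design-handoff pattern above can be sketched as a single pipeline where each phase consumes the previous phase's output. The step functions are hypothetical placeholders for the real tool calls:

```python
def run_design_handoff(export_figma, upload_to_drive, create_linear_task, notify_slack):
    """One explicit sequential pattern: each phase consumes the previous output."""
    asset = export_figma()                  # Phase 1: Figma export
    file_url = upload_to_drive(asset)       # Phase 2: Drive upload
    task_id = create_linear_task(file_url)  # Phase 3: Linear task creation
    notify_slack(task_id)                   # Phase 4: Slack notification
    return task_id
```

Keeping the whole workflow in one linear shape like this makes it obvious where a failed run stopped and which phase to debug.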
10. Treat skills as living assets with versioned iteration
What to do
Use under-triggering, over-triggering, and correction frequency as feedback signals. Update description/instructions and version metadata.
Why it matters
Most skills improve through operational feedback, not first-draft perfection.
Example
If users keep manually enabling a skill, add missing trigger terms and clarify scope boundaries.
Common mistake: Never revisiting frontmatter after launch.
Starter Template
Use this minimal template as a clean starting point, then iterate based on real workflow feedback.
---
name: customer-onboarding
description: End-to-end onboarding for new e-commerce customers. Use when user says "onboard customer", "create subscription", or "set up billing".
metadata:
  version: 1.0.0
  category: onboarding
---
# Customer Onboarding
## Step 1: Validate Inputs
- Confirm customer name, email, and plan are present.
- Stop and ask for missing fields before any write.
## Step 2: Create Customer
- Call create_customer.
- Verify returned customer_id is non-empty.
## Step 3: Create Subscription
- Call create_subscription with customer_id and plan_id.
- Confirm status is active.
## Common Issues
- If auth fails: refresh credentials, retry once, then ask user to reconnect integration.
Validation Checklist
- 1. Scope is limited to 2-3 concrete workflows.
- 2. Frontmatter description includes both WHAT and WHEN.
- 3. Trigger phrases reflect real user wording, not internal jargon.
- 4. Instructions are step-based with validation gates before side effects.
- 5. Common errors are documented with exact fixes.
- 6. references/ is used for deep docs instead of overloading SKILL.md.
- 7. Triggering tests include paraphrases and explicit non-trigger cases.
- 8. Functional tests verify outputs, tool call success, and edge-case handling.
- 9. Performance is compared against a baseline workflow without the skill.
- 10. Metadata version is updated after meaningful behavior changes.
Skill Marketplaces and References
Use these resources to study current skill patterns, discover publishable formats, and benchmark how other teams package reusable skill workflows.
skills.sh
Discover reusable skills, study real implementations, and publish your own skill workflows.
MCP Market
Browse MCP servers and skill listings to learn patterns and distribution approaches.
AI Templates
Explore practical AI templates to adapt into your own skill design and packaging style.
lube AI Skills + Playbooks
Study lube’s published AI skills, prompts, and reusable playbooks you can adapt and share.
Prompt Engineering Guide
Reference prompting techniques for writing more reliable skill instructions and safeguards.
Full Claude Guide
For deeper context and latest official details, review the full Claude guide on building and structuring skills.