Skill Conductor: a full skill lifecycle for Claude Code
Claude Code uses skills, but who creates them? Skill Conductor turns Claude Code into a self-improving system: an agent that writes skills for other agents.
META: an agent that teaches agents
Skill Conductor is a meta-skill. Instead of performing user tasks, it creates, tests, evaluates, and packages skills for other agents.
Full lifecycle: draft → test → review → improve → repeat
Six modes of operation:
- CREATE — a new skill from scratch
- IMPROVE — fix an existing skill
- VALIDATE — test quality & triggering
- REVIEW — quality gate for third-party skills
- OPTIMIZE — automated description optimization
- PACKAGE — packaging for distribution
Mode 1: CREATE — TDD for skills
A Test-Driven Development approach: fail first, then fix.
Step 1: Capture Intent
What specific task should this skill handle?
What would a user say to trigger it?
What should NOT trigger it?
Example:
- Task: "Extract data from design files for developer handoff"
- Trigger: "analyze this Figma file", "need design specs"
- NOT trigger: Sketch files, Adobe XD, general design feedback
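The captured intent can be held in a small structured spec. Here is an illustrative sketch; the `intent` schema and the `description_stub` helper are hypothetical, not part of Skill Conductor:

```python
# Hypothetical intent spec for the Figma example above; the field names are illustrative.
intent = {
    "task": "Extract data from design files for developer handoff",
    "should_trigger": ["analyze this Figma file", "need design specs"],
    "should_not_trigger": ["Sketch files", "Adobe XD", "general design feedback"],
}

def description_stub(spec: dict) -> str:
    """Derive a first-draft description from the intent spec."""
    return (
        f"{spec['task']}. "
        f"Use when user says: {', '.join(repr(t) for t in spec['should_trigger'])}. "
        f"Do NOT use for: {', '.join(spec['should_not_trigger'])}."
    )

print(description_stub(intent))
```

The stub only seeds the description; Mode 5 below then tunes the wording against real trigger queries.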
Step 2: Baseline (TDD RED)
Verify agent fails WITHOUT the skill:
- Take one scenario
- Run in clean session
- Document failure
If the agent already handles the task, the skill isn't needed.
Step 3: Architecture
Pick a pattern from references/patterns.md:
Pattern: Sequential workflow → Use when: clear step-by-step process
Pattern: Iterative refinement → Use when: output improves with cycles
Pattern: Context-aware selection → Use when: same goal, different tools by context
Pattern: Domain intelligence → Use when: specialized knowledge beyond tool access
Pattern: Multi-MCP coordination → Use when: workflow spans multiple services
Step 4: Scaffold
uv run scripts/init_skill.py <skill-name> --path <output-dir> [--resources scripts,references,assets]
Structure:
skill-name/
├── SKILL.md # required — the brain
├── scripts/ # deterministic operations (executed, not loaded)
├── references/ # detailed docs (loaded on demand)
└── assets/ # templates, images (never loaded)
SKILL.md: the skill's brain
The frontmatter is critically important:
---
name: kebab-case-name
description: >
[Purpose in one sentence]. Use when [triggers].
Do NOT use for [negative triggers].
---
GOLDEN RULE: the description determines triggering. Without the right description, the skill will never fire.
Good description:
# ✅ Purpose + triggers, no process
description: Analyze Figma design files for developer handoff. Use when user uploads .fig files or asks for "design specs". Do NOT use for Sketch or Adobe XD.
Bad description:
# ❌ Process in description (agent skips body)
description: Exports Figma assets, generates specs, creates Linear tasks, posts to Slack.
Body structure:
# Skill Name
## Overview
What this enables. 1-2 sentences.
## [Main sections]
Step-by-step with numbered sequences.
Concrete templates over prose.
Imperative voice throughout.
## Common Mistakes
What goes wrong + how to fix.
Writing rules:
- **One term per concept** — "template", not template/boilerplate/scaffold
- **Progressive disclosure** — SKILL.md is the brain (<500 lines), references hold the details
- **Token budget** — frequently loaded: <200 words, standard: <500 lines
- **Imperative voice** — "Extract the data", not "You should extract"
Eval Loop: test-driven improvement
Step 6: Test Cases
Create evals/evals.json:
- 3–5 eval prompts covering core use cases
- Define expectations (verifiable statements)
- Start without assertions — observe first, then write
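A minimal evals/evals.json might look like the following, sketched here as a Python dict. The key names are assumptions for illustration; check the scaffolded example for the real schema:

```python
import json

# Hypothetical evals.json for the Figma skill from Mode 1; the schema is illustrative.
evals = {
    "evals": [
        {
            "prompt": "analyze this Figma file and give me developer specs",
            "expectations": [
                "Lists exported assets with dimensions",
                "Includes color tokens as hex values",
            ],
        },
        {
            "prompt": "my designer shared handoff.fig, what do devs need from it?",
            "expectations": [],  # observe first, then write assertions
        },
    ]
}

print(json.dumps(evals, indent=2))
```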
Eval execution:
- Spawn executor subagent with skill active
- Spawn baseline run (same prompt, no skill)
- Grade outputs using agents/grader.md
- Launch eval viewer: uv run eval-viewer/generate_review.py <workspace>
The viewer shows:
- Side-by-side comparison (with skill vs without)
- Timing data, token usage
- Pass/fail assertions
- Quality scores
Mode 2: IMPROVE — diagnosing problems
Problem classification:
Problem: Undertriggering → Signal: skill doesn't load → Fix: add keywords to description
Problem: Overtriggering → Signal: loads for unrelated queries → Fix: add negative triggers
Problem: Skips body → Signal: follows description only → Fix: remove process from description
Problem: Inconsistent output → Signal: varies across sessions → Fix: add explicit templates, reduce freedom
Problem: Too slow → Signal: large context → Fix: move detail to references/
Improvement mindset:
- **Generalize from feedback** — don't overfit to test cases
- **Keep prompt lean** — remove unproductive steps
- **Explain the why** — LLMs have good theory of mind
- **Bundle repeated work** — same helper script = move to scripts/
Blind Comparison (for major changes):
- Run both versions on same evals
- Spawn agents/comparator.md — gets outputs A/B without knowing which is which
- Comparator scores + picks winner
- Spawn agents/analyzer.md — analyzes WHY winner won
This prevents bias in the evaluation.
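The blinding step can be sketched as a coin-flip shuffle. This is a toy illustration; in the real flow the shuffled outputs go to agents/comparator.md:

```python
import random

def blind_pair(output_old: str, output_new: str, seed=None):
    """Shuffle two outputs so the comparator cannot tell which version
    produced which; returns the labeled pair plus a key to un-blind the winner."""
    rng = random.Random(seed)
    if rng.random() < 0.5:
        return {"A": output_old, "B": output_new}, {"A": "old", "B": "new"}
    return {"A": output_new, "B": output_old}, {"A": "new", "B": "old"}

# The comparator grades pair["A"] vs pair["B"]; the key maps its pick back.
pair, key = blind_pair("output of current skill", "output of improved skill", seed=42)
```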
Mode 3: VALIDATE — 3-stage quality check
Stage 1: Structural Validation
uv run scripts/eval_skill.py <skill-folder>
Checks: frontmatter, naming, description, body size. Target: 10/10.
Stage 2: Discovery (trigger testing)
Generate 6 prompts:
- 3 SHOULD trigger
- 3 should NOT (similar-sounding but wrong domain)
Run in clean sessions. Target: 6/6 correct.
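The Stage 2 tally reduces to counting correct decisions over both prompt groups. A sketch using the Figma example's prompts; the `triggered` map is assumed to be collected by hand or by a harness:

```python
# Hypothetical discovery prompts for the Figma skill from Mode 1.
SHOULD = [
    "analyze this Figma file",
    "I need design specs for the mobile screens",
    "extract handoff data from design.fig",
]
SHOULD_NOT = [
    "review this Sketch file",           # similar-sounding, wrong domain
    "give feedback on my logo design",
    "export assets from Adobe XD",
]

def discovery_score(triggered: dict) -> tuple:
    """triggered: prompt -> bool (did the skill load in a clean session?)."""
    correct = sum(triggered[p] for p in SHOULD) + sum(not triggered[p] for p in SHOULD_NOT)
    return correct, len(SHOULD) + len(SHOULD_NOT)
```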
Stage 3: 5-Axis Scoring (1-10 each):
Axis: Discovery → What it measures: triggers correctly, no false triggers
Axis: Clarity → What it measures: instructions unambiguous
Axis: Efficiency → What it measures: token budget respected
Axis: Robustness → What it measures: handles edge cases
Axis: Completeness → What it measures: covers stated use cases
Scoring: 45–50 production ready · 35–44 solid · 25–34 needs work · <25 rewrite
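The rubric maps directly onto the summed axis scores; a minimal sketch:

```python
def verdict(scores: dict) -> str:
    """Map the five axis scores (1-10 each, 50 max) to the rubric above."""
    total = sum(scores.values())
    if total >= 45:
        return "production ready"
    if total >= 35:
        return "solid"
    if total >= 25:
        return "needs work"
    return "rewrite"

scores = {"discovery": 9, "clarity": 10, "efficiency": 9, "robustness": 9, "completeness": 9}
```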
Mode 5: OPTIMIZE — automated description tuning
The description competes with other skills for Claude's attention. Optimization finds the wording with the best triggering accuracy.
How it works:
- Create an eval set: 20 queries (10 should-trigger, 10 should-not)
- Train/test split (60%/40%) to prevent overfitting
- Optimization loop:
  - Evaluate the current description
  - Claude proposes an improvement (sees only the train data)
  - Re-evaluate
- Select the best description by test score
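The loop above can be sketched as a greedy search over description candidates, with `evaluate` and `propose` standing in for the Claude-backed steps that run_loop.py actually performs. Both stubs are assumptions for illustration:

```python
import random

def optimize_description(queries, evaluate, propose, iterations=5, holdout=0.4, seed=0):
    """Greedy tuning loop (a sketch, not run_loop.py itself).
    queries: the 20-item eval set; evaluate(desc, subset) -> triggering accuracy;
    propose(desc, subset) -> an improved candidate description."""
    rng = random.Random(seed)
    shuffled = queries[:]
    rng.shuffle(shuffled)
    split = int(len(shuffled) * (1 - holdout))
    train, test = shuffled[:split], shuffled[split:]

    desc = propose(None, train)              # initial description draft
    best_desc, best_test = desc, evaluate(desc, test)
    for _ in range(iterations):
        desc = propose(desc, train)          # proposer sees only the train split
        test_acc = evaluate(desc, test)      # scored on the held-out split
        if test_acc > best_test:
            best_desc, best_test = desc, test_acc
    return best_desc, best_test
```

The held-out test split is what guards against overfitting: a candidate only becomes the new best if it generalizes to queries the proposer never saw.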
Writing good eval queries:
# ❌ Too simple - Claude handles without skills
"Format this data"
# ✅ Realistic, substantive
"my boss sent Q4 sales final FINAL v2.xlsx, add profit margin % column, revenue is col C costs col D"
Should-NOT queries: near-misses with shared keywords that still need a different skill. "Write fibonacci" as a negative for a PDF skill is useless — too easy.
Run optimization:
uv run scripts/run_loop.py \
--eval-set evals/eval_set.json \
--skill-path <skill-dir> \
--model claude-sonnet-4-20250514 \
--max-iterations 5 \
--holdout 0.4
Skill Categories & Patterns
Three skill types:
- Document/Asset Creation — consistent output (docs, designs, code)
- Workflow Automation — multi-step processes with a methodology
- MCP Enhancement — workflow guidance on top of tool access
Progressive disclosure budget:
Level: Frontmatter → When loaded: always (system prompt) → Budget: ~100 words
Level: SKILL.md body → When loaded: on trigger → Budget: <500 lines
Level: Bundled resources → When loaded: on demand → Budget: unlimited
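The two always-loaded levels are the only ones with a hard budget, so a budget check is a two-line affair. An illustrative sketch, not scripts/eval_skill.py itself:

```python
def check_budgets(frontmatter: str, body: str) -> dict:
    """Check the two levels that always cost context; bundled resources
    are loaded on demand, so they have no budget."""
    return {
        "frontmatter_ok": len(frontmatter.split()) <= 100,  # ~100 words, always in system prompt
        "body_ok": len(body.splitlines()) < 500,            # loaded on trigger
    }
```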
File purposes:
Directory: SKILL.md → Loaded?: on trigger → Purpose: brain — instructions
Directory: references/ → Loaded?: on demand → Purpose: detailed docs, schemas
Directory: scripts/ → Loaded?: executed, not loaded → Purpose: deterministic operations
Directory: assets/ → Loaded?: never loaded → Purpose: templates, images
Mode 6: PACKAGE — distribution ready
Quality gate:
- Run REVIEW checklist (11 points)
- Validate: uv run scripts/quick_validate.py <skill-folder>
- Package: uv run scripts/package_skill.py <skill-folder>
Creates: skill-name.skill (zip with .skill extension)
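What package_skill.py produces can be approximated with the standard zipfile module. A sketch of the output format, not the actual script:

```python
import pathlib
import zipfile

def package_skill(skill_dir: str, out_dir: str = ".") -> pathlib.Path:
    """Zip a skill folder into <name>.skill, keeping the folder name
    as the top-level entry inside the archive."""
    src = pathlib.Path(skill_dir)
    out = pathlib.Path(out_dir) / f"{src.name}.skill"
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in src.rglob("*"):
            if p.is_file():
                zf.write(p, p.relative_to(src.parent))
    return out
```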
Checklist items:
- SKILL.md exists, exact case
- Valid YAML frontmatter
- description ≤1024 chars, no process steps
- No README inside skill folder
- SKILL.md <500 lines
- Scripts tested and executable
- No hardcoded secrets
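Several of these items are mechanically checkable; an illustrative sketch (not the actual quick_validate.py logic):

```python
def quick_checks(skill_md: str, files: list) -> dict:
    """A few checklist items as automated checks.
    skill_md: contents of SKILL.md; files: names in the skill folder."""
    lowered = skill_md.lower()
    return {
        "skill_md_present": "SKILL.md" in files,  # exact case matters
        "no_readme": not any(f.lower().startswith("readme") for f in files),
        "has_frontmatter": skill_md.startswith("---\n") and "\n---" in skill_md[4:],
        "under_500_lines": len(skill_md.splitlines()) < 500,
        "no_obvious_secrets": "api_key=" not in lowered and "secret=" not in lowered,
    }
```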
Real Example: Create Mode
User request: "Create a skill for analyzing security audit reports"
Step 1: Intent capture
- Task: Extract vulnerabilities, classify severity, generate executive summary
- Trigger: "analyze this security report", uploads PDF/DOCX
- NOT trigger: code review, compliance reports
Step 2: Baseline
Run without skill → agent tries basic file reading, misses structured extraction
Step 3: Architecture
Sequential workflow: parse → extract → classify → summarize
Step 4: Scaffold
uv run scripts/init_skill.py security-audit-analyzer --resources scripts,references
Step 5: Write SKILL.md
---
name: security-audit-analyzer
description: >
Extract and analyze vulnerabilities from security audit reports.
Use when user uploads security assessment PDFs/DOCX or asks to
"analyze security report". Do NOT use for code reviews or
compliance documents.
---
# Security Audit Analyzer
## Overview
Extracts vulnerabilities, classifies by CVSS severity, generates executive summary.
## Workflow
1. **Parse document** - Extract text, identify sections
2. **Extract findings** - Pull vulnerability details, evidence
3. **Classify severity** - Apply CVSS scoring if not present
4. **Generate summary** - Executive overview with risk metrics
Step 6: Test & iterate
Create evals → run → review in viewer → fix issues → repeat
Meta-Insights
Skill Conductor teaches us the fundamentals of how agent skills work:
Description is everything — without a properly triggering description, a skill is dead code
Progressive disclosure — Claude reads the minimum necessary for task completion
Test-driven development — always verify failure first, then fix
Blind evaluation — bias affects even AI judgment; automation prevents it
Token economics — every loaded character competes for attention
Iteration beats perfection — small improvements with a tight feedback loop beat a big rewrite
Conclusion
Skill Conductor is not just a tool for creating skills. It's a meta-framework that turns Claude Code into a self-improving system.
What makes it unique:
- **Complete lifecycle** — from idea to production package
- **Test-driven approach** — everything starts with a failing test
- **Automated optimization** — machine learning for trigger accuracy
- **Quality gates** — structural + behavioral validation
- **Blind evaluation** — unbiased A/B comparison
GitHub: https://github.com/smixs/skill-conductor
The result: a Claude Code that writes skills for other Claude Codes. Meta-programming for AI agents.
*Skill development: when AI teaches AI to be better.* 🧪