Instruction Budget: The Ceiling Moved

Skills are no longer a compression problem. They’re a verification problem.
~200: Old rule of thumb (2025) for where instructions start dropping
~2,000+: Frontier (2026) count of named constraints tracked in one prompt
The question shifted from “can it follow this?” to “is it worth the tokens, and how do we prove it complied?”
  • Stop over-optimizing for short prompts. Optimize for structure + checklists.
  • Invest your time in verification: compliance checks + audit logs.
  • Length is now a cost/latency tradeoff, not a hard capability cliff.
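One way to make "compliance checks + audit logs" concrete is to give each named constraint an ID and a machine-checkable predicate, then record the pass/fail result of every run. This is an illustrative sketch under assumptions, not a known tool: the Constraint type, the audit and append_log helpers, the three example rules, and the audit.jsonl file name are all hypothetical.

```python
import json
import re
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Constraint:
    """A named, machine-checkable rule (hypothetical shape, not a standard API)."""
    name: str
    check: Callable[[str], bool]


def audit(output: str, constraints: list[Constraint]) -> dict:
    """Check one model output against every named constraint and build an audit record."""
    results = {c.name: bool(c.check(output)) for c in constraints}
    return {
        "ts": time.time(),
        "passed": sum(results.values()),
        "total": len(results),
        "failed": [name for name, ok in results.items() if not ok],
    }


def append_log(record: dict, path: str = "audit.jsonl") -> None:
    """Append the record as one JSON line so every run leaves a reviewable trail."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    # Illustrative rules only; real constraints would come from the skill's checklist.
    constraints = [
        Constraint("no-todo-markers", lambda s: "TODO" not in s),
        Constraint("has-summary-heading",
                   lambda s: re.search(r"^## Summary", s, re.MULTILINE) is not None),
        Constraint("under-300-words", lambda s: len(s.split()) < 300),
    ]
    record = audit("## Summary\nAll steps done.\n", constraints)
    append_log(record)
    print(record["passed"], "/", record["total"], "failed:", record["failed"])
```

Because each record names the constraints that failed, a growing log can show which rules a skill drops most often, which is a reasonable place to add structure or a checklist entry rather than cutting length.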
Quick token budgeting proxy
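# Count tokens as a consistent budget proxy (not exact for every vendor)
python3 -m pip install --user tiktoken
python3 - SKILL.md <<'PY'
import sys, tiktoken
enc = tiktoken.get_encoding('o200k_base')
with open(sys.argv[1], encoding='utf-8') as f:
    # disallowed_special=() counts special-token strings as plain text instead of raising
    print(len(enc.encode(f.read(), disallowed_special=())))
PY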
Use this to compare skills and policies against each other. The goal is consistent budgeting, not exact token parity with every vendor's tokenizer.
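To compare several skills at once, the same proxy can be applied to a set of files, with each count reported as a share of the set's total. A minimal sketch, assuming tiktoken's o200k_base encoding as above; make_counter and relative_report are hypothetical helper names, and the fallback to a whitespace word count when tiktoken is not installed is an added assumption that gives much coarser numbers.

```python
import sys
from pathlib import Path
from typing import Callable


def make_counter() -> Callable[[str], int]:
    """Return a token counter: tiktoken's o200k_base if installed, else a rough word count.

    The word-count fallback is an assumption for illustration; it is usually lower
    than a real token count, so only compare numbers produced by the same counter.
    """
    try:
        import tiktoken
    except ImportError:
        return lambda text: len(text.split())
    enc = tiktoken.get_encoding("o200k_base")
    return lambda text: len(enc.encode(text, disallowed_special=()))


def relative_report(texts: dict[str, str],
                    count: Callable[[str], int]) -> list[tuple[str, int, float]]:
    """Count each text and return (name, count, share of total), largest first."""
    counts = {name: count(text) for name, text in texts.items()}
    total = sum(counts.values()) or 1  # avoid dividing by zero for an empty set
    rows = [(name, n, n / total) for name, n in counts.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (paths are illustrative): python3 compare_skills.py skills/*/SKILL.md
    count = make_counter()
    texts = {p: Path(p).read_text(encoding="utf-8") for p in sys.argv[1:]}
    for name, n, share in relative_report(texts, count):
        print(f"{n:>8}  {share:6.1%}  {name}")
```

Reporting shares rather than raw counts keeps the focus on relative budgeting, so a vendor tokenizer that counts somewhat differently should shift every file in roughly the same direction.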
Source idea: IFScale-style instruction-following benchmarks + recent 2026 replications (Laurie Voss summary)