Your AI Agent Doesn’t Need to Research. It Needs to Know.

Dale Seo
Agent Skills reduced our AI agent’s token consumption from 65,000 to 24,000 tokens and completion time from five minutes to under two — for the exact same task with the exact same agent. Here’s what happened, why it matters, and what it means for how you build with AI agents.
There’s a recurring question in the AI community and engineering circles right now: with frontier models becoming more capable every quarter, do AI agents actually need curated Skills and instructions? Or will raw model intelligence eventually make them unnecessary?
We decided to test it. We ran a controlled comparison and the data speaks for itself.
The experiment
Apollo MCP Server gives AI agents a secure way to access any GraphQL API. Instead of writing custom integration code, agents explore the schema, build valid operations, and fetch data through a single MCP interface. It handles schema discovery efficiently so agents don’t need the entire schema in context, just the parts relevant to the current task.
We used this as our test case: ask Claude Code to set up an Apollo MCP Server connected to a public GraphQL API, then use it to fetch country data. Same model, same starting conditions, same goal. The only variable was whether the agent had access to Apollo Skills.
First, here’s the agent working without Skills:
And here’s the same task with Apollo Skills installed:
The results:
| | Without Skills | With Skills |
|---|---|---|
| Time to completion | 5+ minutes | < 2 minutes |
| Token consumption | ~65,000 | ~24,000 |
Both runs produced the same correct result. The difference was the path to get there.
Why the gap exists
Watch the two recordings side by side and the pattern is obvious. Without Skills, the agent spends most of its time researching: searching the web, fetching documentation, reading through pages of results, extracting relevant pieces, then attempting an approach, often incorrectly on the first try. It backtracks, adjusts, tries again. It gets there, but the route is wasteful.
Think of it as off-roading versus a highway. Without Skills, many paths look plausible, but some lead to dead ends, outdated patterns, or unnecessary detours. Each wrong turn burns tokens, and the agent has no way to know the path was wrong until it gets there.
Skills pave the road. The agent knows what knowledge is available and when to load it. No researching. No guessing. No trial and error. When the right Skill activates, the agent reads it, understands the correct approach, and executes. Same destination, far less fuel.
That 65k-to-24k drop is a 63% reduction for the same outcome. For teams running agents at scale, that translates directly to cost savings and faster iteration cycles.
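To put a rough dollar figure on that, here is a back-of-the-envelope sketch. Only the 65k and 24k token counts come from the experiment; the per-token price and daily task volume are illustrative assumptions, not Apollo figures:

```python
# Back-of-the-envelope savings estimate.
# Measured in the experiment:
tokens_without = 65_000
tokens_with = 24_000

# Illustrative assumptions (adjust for your own pricing and volume):
price_per_million = 3.00   # blended $ per 1M tokens
tasks_per_day = 200        # agent runs across a team

saved_per_task = tokens_without - tokens_with   # 41,000 tokens
reduction = saved_per_task / tokens_without     # ~0.63

daily_savings = saved_per_task * tasks_per_day * price_per_million / 1_000_000

print(f"{reduction:.0%} fewer tokens, ${daily_savings:.2f} saved per day")
# → 63% fewer tokens, $24.60 saved per day
```

The per-task number looks small; the multiplication across tasks, teams, and days is where it stops being small.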
This wasn’t a one-off. We’ve been running comparisons like this ever since we launched Apollo Skills months ago, and a few patterns keep showing up.
Skill quality matters more than quantity
Bad Skills can do more harm than good. A poorly written Skill introduces noise into the agent’s context window (wrong patterns, outdated syntax, misleading instructions), and the agent follows that guidance dutifully. You end up worse off than if you’d given it nothing.
This matters even more when you consider what models already “know.” The Apollo Client knowledge baked into current models is often outdated or incorrect. Models are confidently wrong without correction. A well-maintained Skill is an authoritative, up-to-date source of truth that takes precedence over whatever the model learned during pre-training.

The download numbers on skills.sh tell this story clearly. Our rust-best-practices Skill has 5,000+ downloads, the most-downloaded Rust Skill on the platform, despite being published after several alternatives. The earlier ones were low quality. Engineers tried them, got worse results, and moved on. The same pattern holds across our GraphQL Skills: apollo-client at 1.7K, graphql-schema at 1.1K, graphql-operations at 963, and apollo-mcp-server at 895. Quality wins.
The implication: treat Skills like production code. Review them. Test them. Hold them to a standard.
Skills need a maintenance cycle
A Skill that helps today might hurt six months from now. Models train on increasingly recent data and reason better with each generation. The gap a Skill was designed to bridge can shrink or disappear.
Not all Skills age the same way, though. Some compensate for things a model gets wrong today, like outdated API patterns or incorrect library usage. These have the shortest lifespan because a future model may handle them natively.
Others encode team conventions (coding standards, naming patterns, response formatting) that no model will learn from public training data. These last longer but still need review as your own standards evolve.
The point is to treat Skills as something you maintain, not something you write once and forget:
- Test with and without. When a new model drops, re-run your tasks with the Skill disabled. If the model produces the same quality at comparable cost and speed, the Skill has done its job. Consider retiring it.
- Update when the product changes. If your API ships a new auth flow or deprecates an endpoint, the Skill needs to reflect that. Stale Skills produce stale code.
- Keep evals running. Even after retiring a Skill, keep validating that the model handles the underlying task correctly. This catches regressions before they reach production.
We’re building this into how Apollo Skills work. Each Skill targets the latest stable version of its product, and the team that owns the product owns the accuracy of the Skill.
Those teams don’t manually watch for drift, though. AI-powered sync pipelines detect when product changes affect a Skill’s content. An LLM triages the diff, determines whether the existing guidance is now stale, and if so, generates an update as a pull request. Product teams review and approve rather than writing updates themselves.
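The shape of that triage step, reduced to a runnable sketch. The real pipeline uses an LLM to judge staleness; the symbol-overlap check here is a deliberately crude placeholder, and none of this is Apollo’s actual code:

```python
def mentions_changed_symbol(skill_text: str, changed_symbols: list[str]) -> bool:
    """Placeholder for the LLM triage step: in the real pipeline an LLM
    reads the product diff and decides whether the Skill's guidance is
    stale. Here we approximate with a simple symbol-overlap check."""
    return any(sym in skill_text for sym in changed_symbols)

def triage(skill_text: str, changed_symbols: list[str]) -> str:
    """Route a product change: draft an update PR only when the Skill
    touches something that changed; otherwise leave it alone."""
    if mentions_changed_symbol(skill_text, changed_symbols):
        return "open-update-pr"  # LLM drafts the edit, humans approve
    return "no-action"
```

The design point is the last line of `triage`: the automated path ends at a pull request, so a human always sits between the LLM and the published Skill.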
Skills stay in sync with the products they describe without becoming a maintenance burden. The guidance agents receive reflects the actual state of the tools, not a snapshot from months ago.
Skills shine in constrained environments
Not everyone runs the latest frontier model with unlimited tokens. Many teams use local LLMs, self-hosted models, or lower-tier API plans for cost, privacy, or compliance reasons. CI/CD pipelines add another constraint: automated agents running in CI often operate with smaller models and stricter token budgets to keep build costs predictable. These environments need more guidance, not less.
Skills close the gap. They give a constrained agent access to the same curated knowledge that a larger model might have partially internalized through training. A smaller model with the right Skill can match a larger model without one, producing correct results faster and with fewer tokens.
The CI case is worth calling out specifically. When an agent runs in a pipeline (generating code, reviewing pull requests, updating docs), every minute it spends researching is a minute a developer waits for feedback. If the agent takes five minutes instead of two, that delay compounds across every PR, every build, every developer on the team. Skills cut that wait time. Faster agents mean faster feedback loops.
Whether it’s a smaller model, a token-limited CI job, or self-hosted infrastructure, Skills can be the difference between an agent that blocks your workflow and one that speeds it up.
What this means for your team
Both runs in our experiment produced correct results. The agent is capable without Skills. But are you willing to pay the cost of letting it figure things out from scratch every time?
For a single ad-hoc task, maybe the difference doesn’t matter. For teams running agents across dozens of workflows daily, fewer tokens and less time per task add up. The exact savings vary by Skill and model, but the direction is consistent: Skills cut tokens and time. Multiply that across an engineering org and the return becomes hard to ignore.
There’s also the repetition problem. Without Skills, developers paste the same instructions into every new agent session: “use v4 of this API, not v3,” “follow this naming convention,” “don’t use the deprecated auth flow.” Every session starts from zero, and the developer becomes responsible for catching the agent’s mistakes.
Skills encode that guidance once. The agent picks it up automatically, session after session, without the developer repeating themselves.
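As a sketch of what encoding that guidance once looks like, here is a hypothetical Skill capturing the instructions quoted above. It uses the SKILL.md layout from the Agent Skills format (YAML frontmatter with a name and a description telling the agent when to load it); the Skill name and API details are invented for illustration:

```markdown
---
name: payments-api-conventions
description: Team conventions for calling the payments API. Use when
  generating or reviewing code that touches payments endpoints.
---

# Payments API conventions

- Use v4 of the API, not v3.
- Never use the deprecated auth flow; request tokens through the
  current OAuth flow instead.
- Follow the team naming convention for request handlers.
```

Once this file is installed, every session that touches payments code picks up the same rules, with no one re-pasting them into the prompt.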
The less capable the model or the tighter the token budget, the more Skills matter. They put the right knowledge in context so the agent doesn’t have to go find it.

Apollo Skills are now marked as Official Skills on skills.sh, meaning they’re published and maintained by the team that builds the technology. To get started:
```shell
npx skills add apollographql/skills
```

More on how Apollo Skills work and what Apollo contributes: Apollo Skills: Teaching AI agents how to use Apollo and GraphQL.