Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows
Summary
This paper addresses the 'Tools Tax': the overhead of injecting every tool schema into the LLM prompt on every turn in Model Context Protocol (MCP) systems. The authors propose Tool Attention, which dynamically selects relevant tools using semantic similarity scores, stateful gating functions, and lazy schema loading. Their approach reduces tool-related tokens per turn by about 95% (from 47k to 2.4k tokens) without sacrificing task success. The mechanism keeps compact tool summaries in context at all times but loads full schemas only for the top-k most relevant tools on each turn. A hallucination gate handles false negatives, that is, relevant tools the top-k selection missed.
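The two-phase mechanism can be sketched as follows. This is a minimal illustration, not the authors' implementation. A bag-of-words cosine similarity stands in for the paper's semantic scoring (the Intent-Schema Overlap score is not reproduced here). The helper names `embed`, `select_tools`, and `build_tool_context`, and the toy tool catalog, are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a real embedding model: a bag-of-words term-count vector.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_tools(query: str, summaries: dict, k: int) -> list:
    """Score every compact summary against the user turn and keep the k best tool names."""
    q = embed(query)
    ranked = sorted(summaries, key=lambda name: cosine(q, embed(summaries[name])), reverse=True)
    return ranked[:k]

def build_tool_context(query: str, summaries: dict, schemas: dict, k: int = 2):
    """Phase 1: every summary always goes into the prompt.
    Phase 2: full schemas are loaded only for the top-k tools."""
    selected = select_tools(query, summaries, k)
    summary_block = "\n".join(f"- {name}: {text}" for name, text in summaries.items())
    loaded_schemas = {name: schemas[name] for name in selected}
    return summary_block, loaded_schemas

# Toy catalog (hypothetical tool names and schemas).
SUMMARIES = {
    "git_commit": "commit staged changes to a git repository",
    "sql_query": "run a read only sql query against a database",
    "create_ticket": "open a support ticket in the ticketing system",
}
SCHEMAS = {
    name: {"name": name, "parameters": {"type": "object", "properties": {}}}
    for name in SUMMARIES
}

summary_block, loaded = build_tool_context(
    "please commit my changes to the git repository", SUMMARIES, SCHEMAS, k=1
)
print(list(loaded))  # ['git_commit']
```

All three summaries remain visible to the model, so every tool stays discoverable. Only the `git_commit` schema is paid for in full on this turn.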
Key findings
- The Tools Tax scales linearly with catalog size and dominates the effective context window beyond roughly 50 tools, degrading the model's reasoning
- Tool Attention achieves 95% reduction in tool tokens per turn while improving projected task success by 22 percentage points
- Two-phase lazy loading (summaries always present, full schemas on-demand) preserves tool discoverability while minimizing context overhead
- Semantic gating based on Intent-Schema Overlap scores provides both efficiency gains and defensive benefits against tool poisoning attacks
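The scaling argument behind these findings reduces to simple per-turn token arithmetic. In the sketch below, the per-schema and per-summary token sizes are assumed illustrative values, not figures from the paper. Only the 47k and 2.4k totals come from the summary above. The function names are hypothetical.

```python
def tools_tax(n_tools: int, schema_tokens: int) -> int:
    """Eager injection: every full schema is resent each turn, so cost is linear in n_tools."""
    return n_tools * schema_tokens

def tool_attention_tokens(n_tools: int, summary_tokens: int, k: int, schema_tokens: int) -> int:
    """Two-phase lazy loading: n compact summaries plus at most k full schemas per turn."""
    return n_tools * summary_tokens + min(k, n_tools) * schema_tokens

def reduction(before: int, after: int) -> float:
    """Fractional drop in per-turn tool tokens."""
    return 1 - after / before

# Illustrative, assumed sizes (not figures from the paper).
print(tools_tax(50, 400), tools_tax(100, 400))   # 20000 40000
print(tool_attention_tokens(100, 15, 5, 400))    # 3500
# The summary's own figures: 47k -> 2.4k tokens per turn.
print(round(reduction(47_000, 2_400), 3))        # 0.949
```

The summary term still grows with catalog size. Its slope, however, is the size of a compact summary rather than a full schema, and the full-schema cost is capped at k. The exact reduction from 47k to 2.4k is about 94.9%, which rounds to the reported 95%.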
How to implement
- Implement Tool Attention middleware in enterprise chatbots that integrate 50+ internal APIs to reduce per-session costs from $55 to $8 while improving response accuracy
- Deploy in code assistants like GitHub Copilot that access multiple development tools (Git, databases, CI/CD) to prevent context window exhaustion during long coding sessions
- Integrate into customer service agents with access to CRM, ticketing, knowledge base, and communication tools to maintain reasoning quality across extended support conversations
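Each of these deployments places Tool Attention as middleware between the agent and its tool catalog. The sketch below shows one possible per-turn gate. It assumes a particular design for the hallucination gate: a call to a real tool whose schema was not loaded triggers a schema load and a retry, and a call to an unknown tool is rejected. The paper's actual gate may behave differently. `ToolAttentionMiddleware`, `ToolCall`, `GateResult`, and the action strings are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ToolCall:
    name: str
    arguments: dict

@dataclass
class GateResult:
    action: str                    # "execute", "retry_with_schema", or "reject"
    schema: Optional[dict] = None  # schema to inject when the model must retry

@dataclass
class ToolAttentionMiddleware:
    catalog: dict                             # full schemas for every tool, kept out of the prompt
    loaded: set = field(default_factory=set)  # tools whose full schema is in this turn's prompt

    def begin_turn(self, selected: list) -> None:
        """Record which schemas the top-k selector lazily loaded for this turn."""
        self.loaded = set(selected)

    def gate(self, call: ToolCall) -> GateResult:
        """Hypothetical hallucination gate applied to every tool call the model emits."""
        if call.name in self.loaded:
            return GateResult("execute")
        if call.name in self.catalog:
            # False negative: a real tool that missed the top-k cut. Load its schema
            # now and have the model re-issue the call against the full definition.
            self.loaded.add(call.name)
            return GateResult("retry_with_schema", self.catalog[call.name])
        # Unknown name: the model invented a tool, so nothing is executed.
        return GateResult("reject")

catalog = {"git_commit": {"name": "git_commit"}, "sql_query": {"name": "sql_query"}}
mw = ToolAttentionMiddleware(catalog)
mw.begin_turn(["git_commit"])
print(mw.gate(ToolCall("git_commit", {})).action)   # execute
print(mw.gate(ToolCall("sql_query", {})).action)    # retry_with_schema
print(mw.gate(ToolCall("sql_query", {})).action)    # execute
print(mw.gate(ToolCall("deploy_prod", {})).action)  # reject
```

Keeping the gate's state (the `loaded` set) per turn means a recovered false negative costs one extra round trip. It does not permanently widen the prompt.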