Protocol Revision: draft
User Interaction Model
Sampling in MCP allows servers to implement agentic behaviors by enabling LLM calls to occur nested inside other MCP server features. Implementations are free to expose sampling through any interface pattern that suits their needs; the protocol itself does not mandate any specific user interaction model.
Tools in Sampling
Servers can request that the client’s LLM use tools during sampling by providing a tools array and optional toolChoice configuration in their sampling requests. This enables servers to implement agentic behaviors where the LLM can call tools, receive results, and continue the conversation - all within a single sampling request flow.
Clients MUST declare support for tool use via the sampling.tools capability to receive tool-enabled sampling requests. Servers MUST NOT send tool-enabled sampling requests to clients that have not declared this capability.
Capabilities
Clients that support sampling MUST declare the sampling capability during initialization:
Basic sampling:
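```json
{
  "capabilities": {
    "sampling": {}
  }
}
```

With tool support, a sketch assuming the sampling.tools sub-capability is declared as an empty nested object per the draft schema:

```json
{
  "capabilities": {
    "sampling": {
      "tools": {}
    }
  }
}
```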
The includeContext parameter values "thisServer" and "allServers" are soft-deprecated. Servers SHOULD avoid using these values (e.g. by simply omitting includeContext, since it defaults to "none"), and SHOULD NOT use them unless the client declares the sampling.context capability. These values may be removed in future spec releases.
Protocol Messages
Creating Messages
To request a language model generation, servers send a sampling/createMessage request:
Request:
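An illustrative request; the model hint, prompt, and token limit values are examples only:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "What is the capital of France?"
        }
      }
    ],
    "modelPreferences": {
      "hints": [
        { "name": "claude-3-sonnet" }
      ],
      "intelligencePriority": 0.8,
      "speedPriority": 0.5
    },
    "systemPrompt": "You are a helpful assistant.",
    "maxTokens": 100
  }
}
```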
Sampling with Tools
The following diagram illustrates the complete flow of sampling with tools, including the multi-turn tool loop.
To request LLM generation with tool use capabilities, servers include tools and optionally toolChoice in the request:
Request (Server -> Client):
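A sketch, assuming tool definitions use the same name/description/inputSchema shape as MCP tool listings; the get_weather tool is illustrative:

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "What's the weather like in Paris?"
        }
      }
    ],
    "tools": [
      {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "inputSchema": {
          "type": "object",
          "properties": {
            "city": { "type": "string" }
          },
          "required": ["city"]
        }
      }
    ],
    "toolChoice": { "mode": "auto" },
    "maxTokens": 200
  }
}
```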
Multi-turn Tool Loop
After receiving tool use requests from the LLM, the server typically:
- Executes the requested tool uses
- Sends a new sampling request with the tool results appended (sketched below)
- Receives the LLM’s response (which might contain new tool uses)
- Repeats as many times as needed (the server might cap the maximum number of iterations and, for example, pass toolChoice: {mode: "none"} on the last iteration to force a final result)
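A sketch of one such follow-up request, assuming ToolUseContent and ToolResultContent serialize as "tool_use" and "tool_result" content blocks (see Message Content Constraints below), with the tools array from the initial request re-sent (omitted here for brevity):

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": { "type": "text", "text": "What's the weather like in Paris?" }
      },
      {
        "role": "assistant",
        "content": [
          {
            "type": "tool_use",
            "id": "call_abc123",
            "name": "get_weather",
            "input": { "city": "Paris" }
          }
        ]
      },
      {
        "role": "user",
        "content": [
          {
            "type": "tool_result",
            "toolUseId": "call_abc123",
            "content": [{ "type": "text", "text": "18°C, partly cloudy" }]
          }
        ]
      }
    ],
    "maxTokens": 200
  }
}
```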
Message Content Constraints
Tool Result Messages
When a user message contains tool results (type: “tool_result”), it MUST contain ONLY tool results. Mixing tool results with other content types (text, image, audio) in the same message is not allowed. This constraint ensures compatibility with provider APIs that use dedicated roles for tool results (e.g., OpenAI’s “tool” role, Gemini’s “function” role).
Valid - single tool result:
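```json
{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "toolUseId": "call_abc123",
      "content": [{ "type": "text", "text": "18°C, partly cloudy" }]
    }
  ]
}
```

The nested content array for tool results is a sketch based on the draft schema; adding, say, a text block alongside the tool_result block in the same message would be invalid.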
Tool Use and Result Balance
When using tools in sampling, every assistant message containing ToolUseContent blocks MUST be followed by a user message that consists entirely of ToolResultContent blocks, with each tool use (e.g. with id: $id) matched by a corresponding tool result (with toolUseId: $id), before any other message.
This requirement ensures:
- Tool uses are always resolved before the conversation continues
- Provider APIs can concurrently process multiple tool uses and fetch their results in parallel
- The conversation maintains a consistent request-response pattern
Valid - all tool uses resolved:
- User message: “What’s the weather like in Paris and London?”
- Assistant message: ToolUseContent(id: "call_abc123", name: "get_weather", input: {city: "Paris"}) + ToolUseContent(id: "call_def456", name: "get_weather", input: {city: "London"})
- User message: ToolResultContent(toolUseId: "call_abc123", content: "18°C, partly cloudy") + ToolResultContent(toolUseId: "call_def456", content: "15°C, rainy")
- Assistant message: Text response comparing the weather in both cities

Invalid - unresolved tool use:
- User message: “What’s the weather like in Paris and London?”
- Assistant message: ToolUseContent(id: "call_abc123", name: "get_weather", input: {city: "Paris"}) + ToolUseContent(id: "call_def456", name: "get_weather", input: {city: "London"})
- User message: ToolResultContent(toolUseId: "call_abc123", content: "18°C, partly cloudy") ← Missing result for call_def456
- Assistant message: Text response (invalid - not all tool uses were resolved)
Cross-API Compatibility
The sampling specification is designed to work across multiple LLM provider APIs (Claude, OpenAI, Gemini, etc.). Key design decisions for compatibility:
Message Roles
MCP uses two roles: “user” and “assistant”. Tool use requests are sent in CreateMessageResult with the “assistant” role. Tool results are sent back in messages with the “user” role. Messages with tool results cannot contain other kinds of content.
Tool Choice Modes
CreateMessageRequest.params.toolChoice controls the model’s ability to use tools:
- {mode: "auto"}: Model decides whether to use tools (default)
- {mode: "required"}: Model MUST use at least one tool before completing
- {mode: "none"}: Model MUST NOT use any tools
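For example, a server capping its tool loop might include the following in the request params on the final iteration to force a text answer:

```json
{ "toolChoice": { "mode": "none" } }
```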
Parallel Tool Use
MCP allows models to make multiple tool use requests in parallel (returning an array of ToolUseContent). All major provider APIs support this:
- Claude: Supports parallel tool use natively
- OpenAI: Supports parallel tool calls (can be disabled with parallel_tool_calls: false)
- Gemini: Supports parallel function calls natively
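A sketch of a CreateMessageResult carrying parallel tool uses; the content-array form follows the description above, and the model name is illustrative:

```json
{
  "role": "assistant",
  "content": [
    { "type": "tool_use", "id": "call_abc123", "name": "get_weather", "input": { "city": "Paris" } },
    { "type": "tool_use", "id": "call_def456", "name": "get_weather", "input": { "city": "London" } }
  ],
  "model": "example-model",
  "stopReason": "toolUse"
}
```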
Message Flow
Data Types
Messages
Sampling messages can contain:
Text Content
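For example:

```json
{
  "type": "text",
  "text": "The message content"
}
```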
Image Content
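Example image content, with the image data base64-encoded:

```json
{
  "type": "image",
  "data": "base64-encoded-image-data",
  "mimeType": "image/jpeg"
}
```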
Audio Content
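Example audio content, with the audio data base64-encoded:

```json
{
  "type": "audio",
  "data": "base64-encoded-audio-data",
  "mimeType": "audio/wav"
}
```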
Model Preferences
Model selection in MCP requires careful abstraction since servers and clients may use different AI providers with distinct model offerings. A server cannot simply request a specific model by name since the client may not have access to that exact model or may prefer to use a different provider’s equivalent model. To solve this, MCP implements a preference system that combines abstract capability priorities with optional model hints:
Capability Priorities
Servers express their needs through three normalized priority values (0-1):
- costPriority: How important is minimizing costs? Higher values prefer cheaper models.
- speedPriority: How important is low latency? Higher values prefer faster models.
- intelligencePriority: How important are advanced capabilities? Higher values prefer more capable models.
Model Hints
While priorities help select models based on characteristics, hints allow servers to suggest specific models or model families:
- Hints are treated as substrings that can match model names flexibly
- Multiple hints are evaluated in order of preference
- Clients MAY map hints to equivalent models from different providers
- Hints are advisory—clients make final model selection
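For example:

```json
{
  "hints": [
    { "name": "claude-3-sonnet" },
    { "name": "claude" }
  ],
  "costPriority": 0.3,
  "speedPriority": 0.8,
  "intelligencePriority": 0.5
}
```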
The client processes these preferences to select an appropriate model from its available options. For instance, a client without access to Claude models might map the "claude-3-sonnet" hint to gemini-1.5-pro based on similar capabilities.
Error Handling
Clients SHOULD return errors for common failure cases:
- User rejected sampling request: -1
- Tool result missing in request: -32602 (Invalid params)
- Tool results mixed with other content: -32602 (Invalid params)
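Example error (as a JSON-RPC response):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "error": {
    "code": -1,
    "message": "User rejected sampling request"
  }
}
```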
Security Considerations
- Clients SHOULD implement user approval controls
- Both parties SHOULD validate message content
- Clients SHOULD respect model preference hints
- Clients SHOULD implement rate limiting
- Both parties MUST handle sensitive data appropriately
- Servers MUST ensure that when replying to a stopReason: "toolUse", each ToolUseContent item is responded to with a ToolResultContent item with a matching toolUseId, and that the user message contains only tool results (no other content types)
- Both parties SHOULD implement iteration limits for tool loops