Let your servers request completions from LLMs
Sampling is a powerful MCP feature that allows servers to request LLM completions through the client, enabling sophisticated agentic behaviors while maintaining security and privacy.
This feature of MCP is not yet supported in the Claude Desktop client.
The sampling flow follows these steps:
1. The server sends a `sampling/createMessage` request to the client
2. The client reviews the request and can modify it
3. The client samples from an LLM
4. The client reviews the completion
5. The client returns the result to the server

This human-in-the-loop design ensures users maintain control over what the LLM sees and generates.
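As an illustration of that flow, here is a minimal sketch of the client side using the MCP TypeScript SDK. The `reviewWithUser` and `runModel` helpers are hypothetical stand-ins for a real client's approval UI and LLM call:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import {
  CreateMessageRequest,
  CreateMessageRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";

// Hypothetical helpers: a real client would show an approval UI here
// and call its own LLM provider.
declare function reviewWithUser(
  params: CreateMessageRequest["params"]
): Promise<CreateMessageRequest["params"]>;
declare function runModel(
  params: CreateMessageRequest["params"]
): Promise<{ model: string; text: string }>;

const client = new Client(
  { name: "example-client", version: "1.0.0" },
  { capabilities: { sampling: {} } } // advertise sampling support
);

// Steps 2-5: review the request, sample from an LLM, then return the result
client.setRequestHandler(CreateMessageRequestSchema, async (request) => {
  const approved = await reviewWithUser(request.params);
  const completion = await runModel(approved);
  return {
    model: completion.model,
    role: "assistant" as const,
    content: { type: "text" as const, text: completion.text },
  };
});
```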
Sampling requests use a standardized message format:
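As a sketch, the full request shape can be written as a TypeScript type assembled from the parameter descriptions that follow (the `SamplingRequest` name is illustrative, not part of the protocol):

```typescript
interface SamplingRequest {
  messages: Array<{
    role: "user" | "assistant";
    content: {
      type: "text" | "image";
      text?: string; // for text content
      data?: string; // base64-encoded, for image content
      mimeType?: string; // for image content
    };
  }>;
  modelPreferences?: {
    hints?: Array<{ name?: string }>; // suggested model names
    costPriority?: number; // 0-1
    speedPriority?: number; // 0-1
    intelligencePriority?: number; // 0-1
  };
  systemPrompt?: string;
  includeContext?: "none" | "thisServer" | "allServers";
  temperature?: number;
  maxTokens: number;
  stopSequences?: string[];
  metadata?: Record<string, unknown>;
}
```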
The `messages` array contains the conversation history to send to the LLM. Each message has:

- `role`: Either "user" or "assistant"
- `content`: The message content, which can be:
  - Text content with a `text` field
  - Image content with `data` (base64) and `mimeType` fields

The `modelPreferences` object allows servers to specify their model selection preferences:
- `hints`: Array of model name suggestions that clients can use to select an appropriate model:
  - `name`: String that can match full or partial model names (e.g. "claude-3", "sonnet")
- Priority values (0-1 normalized):
  - `costPriority`: Importance of minimizing costs
  - `speedPriority`: Importance of low latency responses
  - `intelligencePriority`: Importance of advanced model capabilities

Clients make the final model selection based on these preferences and their available models.
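For instance, a server that prefers a cheap, reasonably fast Claude-family model might send preferences like the following (the weights are illustrative, not prescribed values):

```typescript
const modelPreferences = {
  hints: [{ name: "claude-3" }, { name: "sonnet" }], // partial-name matches
  costPriority: 0.8, // minimizing cost matters most here
  speedPriority: 0.5, // moderate need for low latency
  intelligencePriority: 0.2, // advanced capabilities matter least
};
```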
An optional `systemPrompt` field allows servers to request a specific system prompt. The client may modify or ignore this.
The `includeContext` parameter specifies what MCP context to include:

- `"none"`: No additional context
- `"thisServer"`: Include context from the requesting server
- `"allServers"`: Include context from all connected MCP servers

The client controls what context is actually included.
Fine-tune the LLM sampling with:

- `temperature`: Controls randomness (0.0 to 1.0)
- `maxTokens`: Maximum tokens to generate
- `stopSequences`: Array of sequences that stop generation
- `metadata`: Additional provider-specific parameters

The client returns a completion result:
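Sketched as a TypeScript type from the fields above and the message format (the `SamplingResult` name is illustrative):

```typescript
interface SamplingResult {
  model: string; // name of the model actually used
  stopReason?: "endTurn" | "stopSequence" | "maxTokens" | string;
  role: "user" | "assistant";
  content: {
    type: "text" | "image";
    text?: string; // for text content
    data?: string; // base64-encoded, for image content
    mimeType?: string; // for image content
  };
}
```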
Here’s an example of requesting sampling from a client:
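A sketch using the TypeScript SDK's generic `request` method, assuming an already-connected `Server` instance named `server`:

```typescript
import { CreateMessageResultSchema } from "@modelcontextprotocol/sdk/types.js";

const result = await server.request(
  {
    method: "sampling/createMessage",
    params: {
      messages: [
        {
          role: "user",
          content: {
            type: "text",
            text: "What files are in the current directory?",
          },
        },
      ],
      systemPrompt: "You are a helpful file system assistant.",
      includeContext: "thisServer",
      maxTokens: 100,
    },
  },
  CreateMessageResultSchema
);
```

The schema argument tells the SDK how to validate the client's response before handing it back to your code.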
When implementing sampling:

1. Always provide clear, well-structured prompts
2. Handle both text and image content appropriately
3. Set reasonable token limits
4. Include relevant context through `includeContext`
5. Validate responses before using them
6. Handle errors gracefully
7. Consider rate limiting sampling requests
8. Document expected sampling behavior
9. Test with various model parameters
10. Monitor sampling costs
Sampling is designed with human oversight in mind:

For prompts:

- Clients should show users the proposed prompt
- Users should be able to modify or reject prompts
- System prompts can be filtered or modified
- Context inclusion is controlled by the client

For completions:

- Clients should show users the completion
- Users should be able to modify or reject completions
- Clients can filter or modify completions
- Users control which model is used
For security and privacy, when implementing sampling:

- Validate all message content
- Sanitize sensitive information
- Implement appropriate rate limits
- Monitor sampling usage
- Encrypt data in transit
- Handle user data privacy
- Audit sampling requests
- Control cost exposure
- Implement timeouts
- Handle model errors gracefully
Sampling enables agentic patterns like:

- Reading and analyzing resources
- Making decisions based on context
- Generating structured data
- Handling multi-step tasks
- Providing interactive assistance
Best practices for context:

- Request the minimal necessary context
- Structure context clearly
- Handle context size limits
- Update context as needed
- Clean up stale context
Robust error handling should:

- Catch sampling failures
- Handle timeout errors
- Manage rate limits
- Validate responses
- Provide fallback behaviors
- Log errors appropriately

A minimal sketch of that pattern follows this list.
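The sketch reuses `server` and `CreateMessageResultSchema` from the earlier example; `fallbackSummary` is a hypothetical fallback:

```typescript
// Hypothetical fallback used when sampling is unavailable or fails.
declare function fallbackSummary(): string;

async function summarizeWithSampling(): Promise<string> {
  try {
    const result = await server.request(
      {
        method: "sampling/createMessage",
        params: {
          messages: [
            {
              role: "user",
              content: { type: "text", text: "Summarize the recent log output." },
            },
          ],
          maxTokens: 200,
        },
      },
      CreateMessageResultSchema,
      { timeout: 30_000 } // fail fast instead of hanging on a slow client
    );
    // Validate the response shape before trusting it
    return result.content.type === "text"
      ? result.content.text
      : fallbackSummary();
  } catch (error) {
    console.error("sampling/createMessage failed:", error);
    return fallbackSummary(); // degrade gracefully
  }
}
```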
Be aware of these limitations:

- Sampling depends on client capabilities
- Users control sampling behavior
- Context size has limits
- Rate limits may apply
- Costs should be considered
- Model availability varies
- Response times vary
- Not all content types may be supported