| Field | Value |
|---|---|
| SEP | 2322 |
| Title | Multi Round-Trip Requests |
| Status | Approved |
| Type | Standards Track |
| Created | 2026-02-03 |
| Author(s) | Mark D. Roth (@markdroth), Caitie McCaffrey (@CaitieM20), |
| Sponsor | Caitie McCaffrey (@CaitieM20) |
| PR | #2322 |
Abstract
This proposal specifies a simple way to handle server-initiated requests in the context of a client-initiated request (e.g., an elicitation request in the context of a tool call) without requiring a storage layer shared across server instances or statefulness in load balancing, which will significantly reduce the cost of operating MCP servers at scale in the common case. It also reduces the HTTP transport’s dependence on SSE streams, which cause problems in many environments that cannot support long-lived connections.
This proposed way of handling server-initiated requests will replace the current approach of sending server-initiated requests. This is a breaking change.
This SEP also specifies the subset of client requests on which a server can send a server-initiated request. This is a reduced scope compared to the current spec and is also a breaking change.
Making a breaking change here is necessary since adoption of server-initiated request features like Elicitation, Sampling and ListRoots is very low or blocked for many Remote MCP servers or Server Hosted Clients due to the operational complexity of supporting the SSE streams and server-side state.
Motivation
Note: This SEP is intended to provide a generic mechanism for handling any server-initiated request in the context of any client-initiated request. For clarity, throughout this document, we will specifically discuss tool calls as a proxy for any client-initiated request, but it should be read as applying equally to (e.g.) resource or prompt requests; similarly, we will discuss elicitation requests as a proxy for any server-initiated request, but it should be read as applying equally to (e.g.) sampling requests.
We start with the observation that there are two types of MCP tools:
- Ephemeral: No state is accumulated on the server side.
  - If server needs more info to process the tool call, it can start from scratch when it gets that additional info.
  - Examples: weather app, accessing email
- Persistent: State is accumulated on the server side.
  - Server may generate a large amount of state before requesting more info from the client, and it may need to pick up that state to continue processing after it receives the info from the client.
  - Server may need to continue processing in the background while waiting for more info from the client, in which case server-side state is needed to track that ongoing processing.
  - Examples: accessing an agent, spinning up a VM and needing user interaction to manipulate the VM
1. Client sends tool call request. For this example, let’s assume that the load balancers happen to send this request to server instance A.
2. Server A opens an SSE stream and sends the elicitation request on that stream.
3. Client sends the elicitation response as a separate request, for which the load balancers will choose a server instance completely independently of the one they chose in step 1. In this example, let’s assume that the load balancers happen to send this request to server instance B.
4. Server A must somehow discover the elicitation response delivered to server B.
5. Server A then sends the tool call result on the SSE stream opened in step 2.
- Persistent Storage Layer Shared Across Server Instances: Servers can deploy and manage a persistent storage layer (e.g., PostgreSQL, Redis, DynamoDB), which allows multiple server instances to match up the elicitation response on one server instance with the original ongoing tool call on a different server instance. This approach has a number of drawbacks:
  - The persistent storage layer is extremely expensive, especially for ephemeral tools that may not already have such a layer (e.g., a weather tool).
  - The persistent storage layer imposes significant reliability concerns: it becomes a critical dependency and therefore a potential single point of failure. To avoid that, it must provide high availability, replication, and backup mechanisms.
  - The persistent storage layer becomes a bottleneck, limiting horizontal scalability. Geographic distribution requires either expensive global replication or sticky routing.
  - The persistent storage layer also imposes significant operational complexity. In horizontally scaled deployments, it requires distributed locking or consensus protocols. It also requires special garbage collection logic to determine when shared state can be cleaned up, which requires careful trade-offs: cleaning up state too aggressively can reduce storage costs but limit how long users have to respond, whereas cleaning up less aggressively accommodates slow users but increases storage costs.
  - This approach requires special behavior in the tool implementation to integrate with the persistent storage layer. The MCP SDKs today do not have any special hooks for this sort of storage layer integration, which means that it’s very hard to write in-line code via the SDKs.
- Statefulness in Load Balancing: With the use of cookies, it is possible for the load balancing layer to ensure that the elicitation response in step 3 is delivered to the same server instance that the original request was delivered to in step 1. This approach, while often cheaper than a persistent storage layer, has the following drawbacks:
  - It requires special configuration and behavior in the load balancers, which is often difficult to manage.
  - It breaks normal load balancing models, resulting in uneven load distribution, thus increasing the cost of running the service.
  - It requires special behavior in clients to propagate the cookies used for statefulness.
  - It requires the tool implementation to match up the elicitation response with the ongoing tool call. (The MCP SDKs have some code to handle this, but it’s still a very strange pattern in the HTTP world.)
  - It is not fault tolerant. If the server instance goes down, all state is lost, and the tool call would need to start over from scratch. (This doesn’t necessarily matter for ephemeral tools, but it is an issue for persistent tools.)
Specification
This SEP proposes a new mechanism for handling server requests in the context of a client request. This new mechanism will have a slightly different workflow for ephemeral tools and persistent tools, the latter of which will leverage Tasks. However, both workflows will use the same data structures.
Schema Changes
First, we introduce the notion of InputRequests, which represents a set of one or more server-initiated requests to be sent to the client, and InputResponses, which represents the client’s responses to those requests. Both requests and responses are stored in a map with string keys. For InputRequests, the map values are server-initiated requests (e.g., elicitation or sampling requests), whereas for InputResponses, the map values are the responses to those requests. Here’s how that would look in the TypeScript MCP schema:
We also add a field to Result that indicates the ResultType. The client should parse this field to determine the type of the Result contained in the message. If this field is not provided, the client should assume a ResultType of “complete” for backwards compatibility. The schema change will look like this:
tasks as well.
These types will be used in two different workflows, one for ephemeral
tools and another for persistent tools.
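As a rough illustration of the types described in this section, the two maps and the incomplete result could be modeled along the following lines. This is a sketch, not the normative schema: the placeholder ServerInitiatedRequest and ClientResponse shapes, the resultType field name, and the "input_required" value are assumptions made for illustration. Only the names InputRequests, InputResponses, InputRequiredResult, inputRequests, requestState, and the "complete" default come from this document.

```typescript
// Placeholder shapes (assumptions); the real schema would reuse the
// existing MCP request and result types here.
interface ServerInitiatedRequest {
  method: string; // e.g. "elicitation/create" or "sampling/createMessage"
  params?: Record<string, unknown>;
}
type ClientResponse = Record<string, unknown>;

// Both are maps with string keys, as described above.
type InputRequests = Record<string, ServerInitiatedRequest>;
type InputResponses = Record<string, ClientResponse>;

// Assumed discriminator name and values; only "complete" is named in the text.
type ResultType = "complete" | "input_required";

interface Result {
  resultType?: ResultType;
}

interface InputRequiredResult extends Result {
  resultType: "input_required";
  inputRequests?: InputRequests;
  requestState?: string;
}

// A missing field is read as "complete" for backwards compatibility.
function resultTypeOf(result: Result): ResultType {
  return result.resultType ?? "complete";
}

const legacy: Result = {};
const incomplete: InputRequiredResult = {
  resultType: "input_required",
  inputRequests: {
    github_login: { method: "elicitation/create", params: { message: "GitHub login?" } },
  },
};
```

Because the map key identifies each request, the client can return its answers under the same keys without any extra correlation field.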
Server-Initiated Request Support for Client Requests
Many ClientRequest types don’t have clear use cases where a Server would need to request more information from the Client. This SEP builds upon SEP-2260 and further restricts when a Server can send a Server-Initiated Request to the Client.
Servers MAY send InputRequiredResult responses on the following Client Requests:
| ClientRequest | ServerResult | InputRequiredResult Supported |
|---|---|---|
| GetPromptRequest | GetPromptResult | Yes |
| ReadResourceRequest | ReadResourceResult | Yes |
| CallToolRequest | CallToolResult | Yes |
| GetTaskPayloadRequest | GetTaskPayloadResult | Yes |
Servers MUST NOT send InputRequiredResult responses on any other Client Requests. The table below lists the ClientRequests that this excludes at the time of writing of this SEP.
| ClientRequest | InputRequiredResult Supported |
|---|---|
| PingRequest | No |
| InitializeRequest | No |
| CompleteRequest | No |
| SetLevelRequest | No |
| ListPromptsRequest | No |
| ListResourcesRequest | No |
| ListResourceTemplatesRequest | No |
| SubscribeRequest | No |
| UnsubscribeRequest | No |
| ListToolsRequest | No |
| GetTaskRequest | No |
| ListTasksRequest | No |
| CancelTaskRequest | No |
| TaskInputResponseRequest | No |
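An SDK could encode the two tables above as a simple allow-list check. The following is a minimal sketch; the helper name mayReturnInputRequired is hypothetical, and the request type names are copied from the "Yes" rows of the first table.

```typescript
// Request types on which a server may send an InputRequiredResult,
// taken from the "Yes" rows of the table above.
const INPUT_REQUIRED_SUPPORTED: ReadonlySet<string> = new Set([
  "GetPromptRequest",
  "ReadResourceRequest",
  "CallToolRequest",
  "GetTaskPayloadRequest",
]);

// Hypothetical helper: true if a server may answer this request type
// with an InputRequiredResult.
function mayReturnInputRequired(requestType: string): boolean {
  return INPUT_REQUIRED_SUPPORTED.has(requestType);
}
```

Using an allow-list rather than a deny-list means that request types added in the future are excluded by default until they are explicitly listed.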
Ephemeral Tool Workflow
For the ephemeral use case, in addition to input requests, we introduce the concept of request state. In cases where the server needs more information, the request state is sent to the client, which echoes the state back to the server, allowing the server to remain stateless.
We will adopt the following workflow for ephemeral tools:
1. Client sends tool call request.
2. Server sends back a single response indicating that the request is incomplete. The response may include input requests that the client must complete. It may also include some request state that the client must return to the server. This response terminates the original request. It will normally be sent as a single response, not on an SSE stream, although for now (this may change in a future SEP) it is also legal to send this response on an SSE stream following (e.g.) progress notifications. If this incomplete response is sent on an SSE stream, it must be the last message on the SSE stream, just as if it were a normal response.
3. Client sends a new tool call request, completely independent of the original one. This new tool call includes responses to the input requests from step 2. It also includes the request state specified by the server in step 2.
4. Server sends back a CallToolResponse.
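The four steps above can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not SDK code: the resultType discriminator, the shape of an elicitation answer ({ value: ... }), the weather tool itself, and the synchronous send function standing in for a transport are all hypothetical. The request state is plain JSON here, so the handler re-validates it before use.

```typescript
type InputRequests = Record<string, { method: string; params?: Record<string, unknown> }>;
type InputResponses = Record<string, Record<string, unknown>>;

interface ToolCallParams {
  name: string;
  arguments: Record<string, unknown>;
  inputResponses?: InputResponses; // present only on a retry
  requestState?: string; // echoed back verbatim by the client
}

type ToolCallOutcome =
  | { resultType: "input_required"; inputRequests?: InputRequests; requestState?: string }
  | { resultType: "complete"; content: string };

// Server side: a stateless handler. Everything it needs arrives in the request.
function handleWeatherCall(params: ToolCallParams): ToolCallOutcome {
  const units = params.inputResponses?.["units"]?.["value"];
  if (typeof units !== "string") {
    // Step 2: terminate this request and ask the client for more input.
    return {
      resultType: "input_required",
      inputRequests: {
        units: { method: "elicitation/create", params: { message: "Celsius or Fahrenheit?" } },
      },
      requestState: JSON.stringify({ city: params.arguments["city"] }),
    };
  }
  // Step 4: the retry carries both the answer and the echoed state.
  const state = JSON.parse(params.requestState ?? "{}") as { city?: unknown };
  const city = typeof state.city === "string" ? state.city : String(params.arguments["city"]);
  return { resultType: "complete", content: `Forecast for ${city} in ${units}` };
}

// Client side: retry until complete, echoing requestState untouched.
function callUntilComplete(
  send: (p: ToolCallParams) => ToolCallOutcome,
  answer: (requests: InputRequests) => InputResponses,
  params: ToolCallParams,
  maxRounds = 5,
): string {
  let current = params;
  for (let round = 0; round < maxRounds; round++) {
    const outcome = send(current);
    if (outcome.resultType === "complete") return outcome.content;
    // Step 3: a brand-new request carrying the answers and the opaque state.
    current = {
      name: params.name,
      arguments: params.arguments,
      ...(outcome.inputRequests ? { inputResponses: answer(outcome.inputRequests) } : {}),
      ...(outcome.requestState !== undefined ? { requestState: outcome.requestState } : {}),
    };
  }
  throw new Error("too many rounds");
}
```

Note that the client never looks inside requestState; it only copies the string from the incomplete response into the retry, and omits the field when the server did not send one.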
Real-World Example for Ephemeral Workflow
This example demonstrates how requestState enables a multi-round-trip elicitation flow driven by Azure DevOps custom rules.
The scenario involves an update_work_item tool that transitions a Bug work item to “Resolved.” ADO custom rules require specific fields when certain state transitions occur, and the server uses iterative elicitation to gather them, accumulating context in requestState across rounds so that the final update can be executed without any server-side storage.
Use Cases for Request State
The “requestState” mechanism provides a way to do multiple round trips on the same logical request. There are two main use cases for this.
Use Case 1: Rolling Upgrades
Let’s say that you are doing a rolling upgrade of your horizontally scaled server instances to deploy a new version of a tool implementation. The old version had two input requests with keys “github_login” and “google_login”. The new version of the tool implementation still uses the “github_login” input request, but it replaces the “google_login” input request with a new “microsoft_login” input request.
If the first request goes to an old version of the server but the second attempt (which includes the input responses) goes to a new version of the server, then the server will see the result for “github_login”, which it needs, but it won’t see the result for “microsoft_login”. (It will also see the result for “google_login”, but it no longer needs that, so it doesn’t matter.) At this point, the server needs to send a new input request for “microsoft_login”, but it also doesn’t want to lose the answer that it’s already gotten for “github_login”, so it would use the kind of state proposed in 1685 to retain that information without having to store the state on the server side.
The workflow here would look like this:
1. Client sends tool call request that hits a server instance running the old version.
2. Server sends back an incomplete response indicating the input requests for “github_login” and “google_login”.
3. Client sends a new tool call request that includes the responses to the input requests for “github_login” and “google_login”. This time it hits a server instance running the new version.
4. Server sends back another incomplete response indicating the input request for “microsoft_login”, which the client has not already provided. However, the response also includes request state containing the already-provided “github_login” response, so that the client does not need to prompt the user for the same information a second time.
5. Client sends a third tool call request that includes the response to the “microsoft_login” input request as well as echoing back the request state provided by the server in step 4.
6. Server now sees the “github_login” info in the request state and the “microsoft_login” info in the input responses, so the request now contains everything the server needs to perform the tool call and send back a complete response.
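Steps 4 through 6 above can be sketched from the point of view of the new server version. This is a simplified illustration with several assumptions: answers are modeled as plain strings rather than full elicitation results, the handleNewVersion name and the resultType discriminator are hypothetical, and the request state is unsigned JSON (a real server would protect it as described under the protocol requirements).

```typescript
type Answers = Record<string, string>;

interface Retry {
  inputResponses?: Answers;
  requestState?: string;
}

type Outcome =
  | { resultType: "input_required"; inputRequests: Record<string, { message: string }>; requestState: string }
  | { resultType: "complete"; logins: Answers };

// Keys required by the new version of the tool.
const REQUIRED_KEYS = ["github_login", "microsoft_login"];

function handleNewVersion(req: Retry): Outcome {
  // Merge answers carried in request state with freshly supplied ones.
  const carried = req.requestState ? (JSON.parse(req.requestState) as Answers) : {};
  const known: Answers = {};
  for (const key of REQUIRED_KEYS) {
    const value = req.inputResponses?.[key] ?? carried[key];
    // Keys this version no longer needs (e.g. google_login) are never copied.
    if (typeof value === "string") known[key] = value;
  }
  const missing = REQUIRED_KEYS.filter((key) => !(key in known));
  if (missing.length === 0) return { resultType: "complete", logins: known };

  const inputRequests: Record<string, { message: string }> = {};
  for (const key of missing) inputRequests[key] = { message: `Please provide ${key}` };
  // Keep what is already known so the user is not asked twice.
  return { resultType: "input_required", inputRequests, requestState: JSON.stringify(known) };
}
```

The old "google_login" answer is silently dropped, while the "github_login" answer survives the extra round trip inside the request state.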
Use Case 2: Load Shedding
Let’s say that you have an MCP server instance that is processing a bunch of tool calls and notices that it’s too heavily loaded, so it wants to move one of the ongoing tool calls to a different server instance. However, it has already done a significant amount of processing on that tool call, so it does not want to simply fail the call and have the client start over from scratch on another server instance; instead, it wants to preserve the state it has already accumulated, so that whichever server instance resumes processing can pick up from where the original server instance left off. This can be accomplished by sending an incomplete response that contains request state but does not contain any input requests.
The workflow here would look like this:
1. Client sends the original request, which the load balancers route to server instance A.
2. Server instance A does a bunch of computation before deciding that it needs to shed load. It sends an incomplete response with its accumulated state in the requestState field but without the inputRequests field.
3. Client retries the request with the requestState field attached. The load balancers route this request to server instance B.
4. Server instance B starts from the state it sees in the requestState field, thus picking up the computation from where server instance A left off, and eventually returns a complete response.
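The load-shedding workflow above amounts to checkpointing a computation into the request state. The sketch below uses a toy computation (summing an array) and a per-call work budget to simulate an overloaded instance; the function name, the Checkpoint shape, and the resultType discriminator are assumptions made for illustration.

```typescript
interface Checkpoint {
  nextIndex: number;
  partialSum: number;
}

type Outcome =
  | { resultType: "input_required"; requestState: string } // note: no inputRequests
  | { resultType: "complete"; sum: number };

// Sums the values, but stops after `budget` items to simulate an instance
// that decides to shed load part-way through.
function sumWithShedding(values: number[], budget: number, requestState?: string): Outcome {
  const start: Checkpoint = requestState
    ? (JSON.parse(requestState) as Checkpoint)
    : { nextIndex: 0, partialSum: 0 };
  let { nextIndex, partialSum } = start;
  let processed = 0;
  while (nextIndex < values.length) {
    if (processed === budget) {
      // Hand the accumulated work back to the client instead of failing the call.
      return { resultType: "input_required", requestState: JSON.stringify({ nextIndex, partialSum }) };
    }
    partialSum += values[nextIndex] ?? 0;
    nextIndex += 1;
    processed += 1;
  }
  return { resultType: "complete", sum: partialSum };
}
```

Since the response carries no input requests, the client may retry immediately, and whichever instance receives the retry resumes from the checkpoint.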
Protocol Requirements for Ephemeral Workflow
- Server Behavior:
  - Servers MAY respond to any client-initiated request with an InputRequiredResult. This message MAY be sent either as a standalone response or as the final message on an SSE stream, although implementations are encouraged to prefer the former. If using an SSE stream, servers MUST NOT send any message on the stream after the incomplete response message.
  - The InputRequiredResult MAY include an inputRequests field.
  - The InputRequiredResult MAY include a requestState field. If specified, this field is an opaque string that is meaningful only to the server. Servers are free to encode the state in any format (e.g., plain JSON, base64-encoded JSON, encrypted JWT, serialized binary, etc.).
  - If a request contains a requestState field, servers MUST always validate that state, as the client is an untrusted intermediary. If tampering is a concern, servers SHOULD encrypt the requestState field using an encryption algorithm of their choice (e.g., they can use AES-GCM or a signed JWT) to ensure both confidentiality and integrity. Note that there is also a risk of replaying/hijacking attacks, where an authenticated attacker resends state that was originally sent to a different user. Therefore, if the request state contains any data that is specific to the original user, the server MUST use some mechanism to cryptographically bind the data to the original user and MUST verify that the requestState data sent by the client is associated with the currently authenticated user. Servers using plaintext state MUST treat the decoded values as untrusted input and validate them the same way they would validate any client-supplied data.
- Client Behavior:
  - If a client receives an InputRequiredResult message that contains the inputRequests field, then the client MUST construct the requested input before retrying the original request. In contrast, if the message does not contain the inputRequests field, then the client MAY retry the original request immediately.
  - If a client receives an InputRequiredResult message that contains the requestState field, it MUST echo back the exact value of that field when retrying the original request. Clients MUST NOT inspect, parse, modify, or make any assumptions about the requestState contents. If the InputRequiredResult does not contain a requestState field, the client MUST NOT include one in the retry.
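One way a server might meet the validation and user-binding requirements above is to attach an HMAC to the state and include the user identity in the signed payload. The following is a minimal sketch using the Node.js crypto module; the sealState and openState names, the payload layout, and the hard-coded secret are assumptions for illustration. It provides integrity and user binding only, not confidentiality, so a server whose state contains sensitive data would additionally need to encrypt it.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical server-side secret; in practice this would come from a secret store.
const SECRET = "replace-with-a-real-secret";

function sign(payload: string): Buffer {
  return createHmac("sha256", SECRET).update(payload).digest();
}

// Produce an opaque requestState string bound to the given user.
function sealState(userId: string, data: unknown): string {
  const payload = Buffer.from(JSON.stringify({ userId, data })).toString("base64url");
  return `${payload}.${sign(payload).toString("base64url")}`;
}

// Verify integrity and user binding; returns null on any mismatch.
function openState(userId: string, requestState: string): { data: unknown } | null {
  const [payload, mac] = requestState.split(".");
  if (!payload || !mac) return null;
  const expected = sign(payload);
  const actual = Buffer.from(mac, "base64url");
  // Compare in constant time; lengths must match before timingSafeEqual is called.
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return null;
  const decoded = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as {
    userId: string;
    data: unknown;
  };
  // Reject state that was issued to a different authenticated user.
  return decoded.userId === userId ? { data: decoded.data } : null;
}
```

The payload is parsed only after the MAC check succeeds, so malformed or tampered state is rejected before any of its contents are interpreted.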
Persistent Tool Workflow
The persistent tool workflow will leverage Tasks. Tasks already provide a mechanism to indicate that more information is needed to complete the request: the input_required Task Status allows the server to indicate that additional information is needed to complete processing the task.
The workflow for Tasks is as follows:
1. Server sets Task Status to input_required. The server can pause processing the request at this point.
2. Client retrieves the Task Status by calling tasks/get and sees that more information is needed.
3. Client calls tasks/result.
4. Server returns the InputRequests object.
5. Client sends a tasks/input_response request that includes an InputResponses object along with the Task metadata field.
6. Server resumes processing and sets the Task Status back to working.
Because Tasks are likely longer running, have state associated with them, and are likely more costly to compute, the request for more information does not end the originally requested operation (e.g., the tool call). Instead, the server can resume processing once the necessary information is provided.
To align with MRTR semantics, the server will respond to the tasks/result request with an InputRequests object. The request and its response will have the same JSON-RPC id. When the client responds with an InputResponses object, this is a new client request with a new JSON-RPC id and therefore needs a new method name. We propose tasks/input_response.
The above workflow and below example do not leverage any of the optional Task Status Notifications although this SEP does not preclude their use.
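From the client's side, the task workflow above reduces to a polling step that reacts to the input_required status. The sketch below is illustrative only: the TaskApi interface is a hypothetical, synchronous stand-in for the three JSON-RPC methods named in this section, and the "completed" status value is assumed from the existing Tasks feature rather than defined here.

```typescript
type TaskStatus = "working" | "input_required" | "completed";
type InputRequests = Record<string, { method: string; params?: Record<string, unknown> }>;
type InputResponses = Record<string, Record<string, unknown>>;

// Hypothetical transport surface; each member stands in for one JSON-RPC method.
interface TaskApi {
  get(taskId: string): TaskStatus; // tasks/get
  result(taskId: string): { inputRequests?: InputRequests }; // tasks/result
  inputResponse(taskId: string, responses: InputResponses): void; // tasks/input_response
}

// Runs one polling step; returns true once the task has completed.
function pollOnce(
  api: TaskApi,
  taskId: string,
  answer: (requests: InputRequests) => InputResponses,
): boolean {
  const status = api.get(taskId);
  if (status === "completed") return true;
  if (status === "input_required") {
    // Fetch the pending input requests, then answer them in a new request.
    const { inputRequests } = api.result(taskId);
    if (inputRequests) api.inputResponse(taskId, answer(inputRequests));
  }
  return false; // still working, or input was just supplied
}
```

A client would call pollOnce repeatedly (with whatever delay suits it) until it returns true, and could instead cancel the task if it chooses not to fulfill the input requests.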
Protocol Requirements for Persistent Workflow
- Server Behavior:
  - Servers MAY respond to tasks/get by indicating that the task is in state input_required.
  - Servers MUST include an inputRequests field in the tasks/result response when the task is in state input_required.
- Client Behavior:
  - When tasks/get shows state input_required, clients MUST call tasks/result to get the input requests. Clients SHOULD construct the results of those requests, and then call tasks/input_response with the input responses to provide the required input for the task.
  - Clients MAY choose not to fulfill the input requests, in which case they can cancel the task.
Interactions Between Ephemeral and Persistent Workflows
If a tool implementation needs the client to respond to a set of input requests before it can even start processing but then later needs to do persistent processing, it can start using the ephemeral workflow and then switch to the persistent workflow by creating a task at that point. This avoids the need for the server to store state until it actually has the information needed to start processing the request.
This workflow would look like this:
1. Client sends tool call request with task metadata.
2. Server sends back an inputRequests response indicating that more information is needed to process the request. This terminates the original request.
3. Client sends a new tool call request, completely independent of the original one, which includes the inputResponses object along with the task metadata.
4. Server sends back a task ID, indicating that it will be processing the request in the background. All subsequent interaction will be done via the Tasks API.
Guidance for Error Handling
This section provides implementation guidance for error handling in scenarios where the client provides unexpected or malformed data in the inputResponses object.
As with any received request, the server SHOULD validate that the data provided by the client is a valid inputResponses object and that the information inside can be correctly parsed. Protocol errors, such as malformed JSON, an invalid schema, or internal server errors that prevent the processing of the request, should return a JSONRPCErrorResponse with an appropriate error code and message.
If additional parameters are provided in the inputResponses object, the server SHOULD treat these as optional parameters. Therefore, it SHOULD ignore any unexpected information in the inputResponses object that it does not recognize or need.
The client may also fail to send all the information requested in previous inputRequests. If the missing information requested is necessary for the server to process the request, then it SHOULD respond with a new InputRequiredResult.
We discussed having a specific application level error code returned, however the client may not have enough information to recover in all scenarios. Therefore, we decided to rely on the existing mechanics of requesting more input via InputRequiredResult to ensure a client can always recover by having the server request the necessary information again.
Malicious clients could intentionally send incorrect information in the inputResponses object, and generate load on the server by causing it to repeatedly request the same information. However, this is not a new concern introduced by this workflow, since malicious clients could already generate load by sending malformed requests. Server implementors can use standard techniques like rate limiting and throttling to protect themselves from such attacks.
In the ephemeral workflow, this would look like the following:
- The client retries the original tool call, this time including the inputResponses object, but the response is missing required information that the server needs to process the request.
- The server responds with an incomplete response, indicating that the client needs to respond to an elicitation request in order for the tool call to complete, and including request state to be passed back:
- The server responds with an incomplete response, indicating that the client needs to provide missing information for the request to succeed.
JSONRPCResultResponse. However, since the response is missing required information, the server does not proceed with processing the task and leaves the Task status as input_required. The next time the client calls tasks/result, the server responds with a new inputRequest requesting the necessary information again.
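The guidance in this section distinguishes three outcomes: a protocol error for malformed data, silent tolerance of unexpected keys, and a fresh request for anything that is missing. A server-side check along those lines might look like the following sketch. The checkInputResponses name and the Outcome shape are hypothetical; -32602 is the standard JSON-RPC "Invalid params" code, used here as one example of an appropriate error code.

```typescript
type Outcome =
  | { kind: "error"; code: number; message: string }
  | { kind: "input_required"; missing: string[] }
  | { kind: "ok"; responses: Record<string, unknown> };

const INVALID_PARAMS = -32602; // standard JSON-RPC "Invalid params" code

function checkInputResponses(raw: unknown, requiredKeys: string[]): Outcome {
  // Protocol-level problem: the value is not a JSON object at all.
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return { kind: "error", code: INVALID_PARAMS, message: "inputResponses must be an object" };
  }
  const supplied = raw as Record<string, unknown>;
  const responses: Record<string, unknown> = {};
  const missing: string[] = [];
  for (const key of requiredKeys) {
    if (Object.prototype.hasOwnProperty.call(supplied, key)) {
      responses[key] = supplied[key]; // only needed keys are copied; extras are ignored
    } else {
      missing.push(key);
    }
  }
  // Missing answers are requested again rather than reported as an error.
  return missing.length > 0 ? { kind: "input_required", missing } : { kind: "ok", responses };
}
```

Re-requesting missing keys keeps the recovery path identical to the normal flow, so a client needs no special error handling to get back on track.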
Rationale
We considered a bidirectional stream approach to replace SSE streams. However, that approach would have made the wire protocol more complicated (e.g., it would have required HTTP/2 or HTTP/3). Also, it would not have eliminated problems for environments that cannot support long-lived connections, nor would it have addressed fault tolerance issues.
There was discussion about whether the input requests should be a map or just a single object, possibly leveraging some field inside of the requests (e.g., the elicitation ID) to differentiate between them. We decided that the map makes sense, since it structurally guarantees the uniqueness of keys, which will avoid the need for explicit checks in SDKs and applications to avoid conflicts.
In the persistent workflow, we considered including the input requests directly in the tasks/get response, rather than requiring the client to see the input_required status and then call tasks/result to get the input requests. We decided to keep those two things separate in deference to implementations that use separate infrastructure for task state and for the actual tool implementation; the idea is that the tasks/get call should have a consistent latency profile, regardless of what the task state actually is. We recognize that this requires an extra round-trip to the server, but we can optimize this in the future if it becomes a problem.
Backward Compatibility
Today many SDKs support elicitation in an in-line but async fashion, which waits for the elicitation response before sending the tool call response on the original SSE stream. This works for MCP servers that are single-process or that can ensure sticky routing of requests.
Security Implications
Because requestState passes through the client, malicious or compromised clients could attempt to modify it to alter server behavior, bypass authorization checks, or corrupt server logic. To mitigate this, we require servers to validate this state as described in the protocol requirements above.