RFD D26: Streaming Policy Evaluation for Tool Call Arguments
- Status: Draft
- Category: Design
- Authors: Jean Mertz git@jeanmertz.com
- Date: 2026-07-24
- Extends: RFD D25
- Requires: RFD 043
Summary
This RFD extends RFD D25's argument-conditional tool policies with streaming evaluation. Instead of waiting for the complete argument object before evaluating policy rules, JP evaluates rules incrementally as individual parameters finish streaming. This enables early permission prompts — seconds before large arguments like file content finish arriving — and early cancellation when the user declines a tool call mid-stream.
Motivation
RFD D25 introduces conditional policy rules that match on argument values. Its evaluation model waits for all arguments to arrive before evaluating:
ToolCallPart::Start { id, name }         ← tool name known
ArgumentChunk("{"path":"src/...")        ← arguments streaming
ArgumentChunk("...content":"fn...")      ← still streaming (possibly seconds)
ArgumentChunk("...main() {}")            ← still streaming
Flush                                    ← arguments parsed, policy evaluated NOW
For fs_create_file with a rule { arg = "/path", prefix = "src/", mode = "unattended" }, the decision depends only on path. The path value finishes streaming in the first chunk, but JP waits for content (potentially thousands of lines) before acting on it.
The cost of waiting:
- Wasted streaming time. If the policy resolves to ask and the user declines, all time spent streaming content was wasted. For large files this is seconds of LLM output tokens billed but never used.
- Delayed approval. If the policy resolves to unattended, the tool call could be marked as approved as soon as path is known. Instead the decision waits for the full argument object.
- Delayed prompt. If the policy resolves to ask, the user could be reviewing and deciding while content streams in the background. Instead they wait, then decide, then wait again for execution.
Design
Prerequisites
This RFD depends on two prior RFDs:
- RFD D25 defines the conditional policy types (RunPolicy, RunRule, ParamCondition, TypedMatcher), the policy config namespace, and the first-match-wins evaluation model. This RFD extends RFD D25's post-flush evaluation with a streaming equivalent.
- RFD 043 introduces incremental argument parsing via ToolCallArgumentProgress events. Each event carries a StreamFragment; ObjectEntry { key, value: Done } signals that a parameter's value is complete at any nesting level. This RFD consumes those signals to drive policy evaluation.
Evaluation algorithm
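The core evaluator is a pure function. It takes the policy's rule list and a map of parameter values received so far, and returns Option<RunMode>: Some once the mode is decided (a rule matched, or the implicit fallback applied), None if the first non-eliminated rule needs a parameter that hasn't arrived yet. Other sections of this RFD refer to the Some outcome as Decided and the None outcome as Waiting.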
fn evaluate(rules, known_params) -> Option<RunMode>:
    for rule in rules:
        if rule has no condition:
            return Some(rule.mode)        // catch-all
        if rule.arg is in known_params:
            if matcher matches value:
                return Some(rule.mode)    // match found
            else:
                continue                  // rule eliminated, try next
        else:
            return None                   // can't skip: higher priority
    // all conditional rules eliminated, no catch-all
    return Some(RunMode::Ask)             // implicit safety fallback
The critical property: a rule whose parameter hasn't arrived yet blocks evaluation. First-match-wins means that rule has higher priority than everything below it. Skipping it to evaluate a lower-priority rule could produce the wrong mode.
Policies without conditional rules (string aliases like run = "ask", or a single catch-all rule) resolve on the first call with an empty known_params map — the catch-all matches immediately. These go through the existing permission fast path in ToolCoordinator::decide_permission; no StreamingPolicyState is created.
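The pseudocode above translates almost directly into Rust. The following is a minimal sketch rather than the proposed implementation. It assumes string parameter values instead of serde_json::Value, a single prefix matcher standing in for RFD D25's TypedMatcher, and top-level parameter names instead of JSON pointers. The Condition struct and its fields are hypothetical.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[allow(dead_code)]
enum RunMode {
    Ask,
    Unattended,
    Edit,
    Skip,
}

/// Hypothetical stand-in for RFD D25's condition types: prefix match only.
#[derive(Debug, Clone)]
struct Condition {
    /// Top-level parameter name (the RFD uses JSON pointers such as `/path`).
    arg: String,
    prefix: String,
}

#[derive(Debug, Clone)]
struct RunRule {
    /// `None` makes the rule a catch-all.
    condition: Option<Condition>,
    mode: RunMode,
}

/// First-match-wins evaluation over the parameters received so far.
fn evaluate(rules: &[RunRule], known_params: &HashMap<String, String>) -> Option<RunMode> {
    for rule in rules {
        let Some(cond) = &rule.condition else {
            return Some(rule.mode); // catch-all
        };
        match known_params.get(&cond.arg) {
            // Parameter arrived and matches: this rule wins.
            Some(value) if value.starts_with(cond.prefix.as_str()) => return Some(rule.mode),
            // Parameter arrived but does not match: rule eliminated, try the next one.
            Some(_) => continue,
            // Parameter not seen yet: a higher-priority rule is undecided, so block.
            None => return None,
        }
    }
    // All conditional rules eliminated and no catch-all: implicit safety fallback.
    Some(RunMode::Ask)
}
```

Because the function only reads its inputs, calling it again after each new parameter is safe, and an empty rule list falls through to the implicit Ask fallback.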
Where the evaluator lives
The evaluation function lives in jp_cli::cmd::query::tool::policy. It imports the rule types from jp_config::conversation::tool::policy (defined by RFD D25) and operates on serde_json::Value for parameter values.
jp_config defines the types; jp_cli evaluates them at runtime.
Integration with ToolCallArgumentProgress
RFD 043 makes EventBuilder::handle_part return Vec<ToolCallArgumentProgress> for ArgumentChunk events. Each progress event carries a StreamFragment. The turn loop forwards these to the ToolRenderer for incremental display.
This RFD adds a second consumer: the policy evaluator. A StreamingPolicyState struct, held per in-flight tool call, tracks which parameters have completed and feeds them to the evaluator:
struct StreamingPolicyState {
    /// The policy rules for this tool (from config).
    rules: Vec<RunRule>,
    /// Parameter values received so far, keyed by parameter name.
    known_params: HashMap<String, Value>,
    /// Resolved mode, if any. `None` while waiting for parameters.
    resolved: Option<RunMode>,
}
When the turn loop receives a ToolCallArgumentProgress whose fragment signals a parameter completion (see Nested argument paths for how completion is detected at any depth), it:
- Uses the FragmentAggregator (from RFD 043) to obtain the complete serde_json::Value for that parameter.
- Inserts it into known_params.
- Re-evaluates: calls the evaluator with the updated known_params.
- If the result is Decided, triggers the permission flow immediately.
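The insert-then-re-evaluate steps can be sketched as a small wrapper around the evaluator. This is an illustration under simplifying assumptions: values are plain strings rather than serde_json::Value, a rule's condition is a hypothetical (parameter name, prefix) pair, and feed_parameter returns the current resolution on every call. The RFD does not say whether the method should report only the first resolution, so that choice is an assumption.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[allow(dead_code)]
enum RunMode {
    Ask,
    Unattended,
    Edit,
    Skip,
}

/// Simplified rule: `condition` is a hypothetical (parameter name, prefix) pair.
struct RunRule {
    condition: Option<(String, String)>,
    mode: RunMode,
}

struct StreamingPolicyState {
    rules: Vec<RunRule>,
    known_params: HashMap<String, String>,
    resolved: Option<RunMode>,
}

impl StreamingPolicyState {
    fn new(rules: Vec<RunRule>) -> Self {
        Self { rules, known_params: HashMap::new(), resolved: None }
    }

    /// Records a completed parameter and re-evaluates only while unresolved.
    /// Once a mode is resolved it is sticky: later parameters cannot change it.
    fn feed_parameter(&mut self, key: String, value: String) -> Option<RunMode> {
        self.known_params.insert(key, value);
        if self.resolved.is_none() {
            self.resolved = self.evaluate();
        }
        self.resolved
    }

    /// Same first-match-wins walk as the evaluator pseudocode.
    fn evaluate(&self) -> Option<RunMode> {
        for rule in &self.rules {
            let Some((arg, prefix)) = &rule.condition else {
                return Some(rule.mode); // catch-all
            };
            match self.known_params.get(arg) {
                Some(v) if v.starts_with(prefix.as_str()) => return Some(rule.mode),
                Some(_) => continue, // eliminated
                None => return None, // blocked on a missing parameter
            }
        }
        Some(RunMode::Ask) // implicit fallback
    }
}
```

A caller that wants to trigger the permission flow exactly once would compare the result before and after the call, or track that it has already acted on the resolution.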
Integration point in the turn loop
Today, permission is decided at the Flush boundary in turn_loop.rs (after EventBuilder emits the final ToolCallRequest). The streaming evaluator runs earlier — during the StreamingLoopEvent::Llm match arm, alongside the existing ToolCallArgumentProgress forwarding to the renderer.
The flow per tool call becomes:
ToolCallPart::Start { id, name }
  → Look up RunPolicy from tool config
  → If no conditional rules: use existing fast path (no StreamingPolicyState)
  → Otherwise: create StreamingPolicyState, resolved = None
ArgumentChunk → ToolCallArgumentProgress events
  → Forward to ToolRenderer (existing)
  → On parameter completion: feed to StreamingPolicyState
    → If Some: trigger early permission
    → If None: continue accumulating
Flush → ToolCallRequest with complete arguments
  → If already resolved: skip evaluation
  → If still None: all parameters are now known, evaluate once more
The Flush fallback handles providers that don't stream arguments incrementally (e.g., Ollama emitting the entire argument object in one chunk). In that case, all ToolCallArgumentProgress events arrive in the same batch as Flush, and evaluation completes immediately. There is no separate "post-flush evaluation path"; the same evaluator is called, just with all parameters available at once.
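To make the "same evaluator, different moment" point concrete, here is a hedged sketch of a driver that feeds one tool call's events to an evaluator and reports when the policy resolved. The Event and Resolution enums and the drive function are hypothetical names for illustration; the real turn loop works with ToolCallArgumentProgress and ToolCallRequest, and the evaluator is passed in as a closure so the sketch stays independent of the rule types.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[allow(dead_code)]
enum RunMode {
    Ask,
    Unattended,
    Edit,
    Skip,
}

/// Hypothetical, flattened view of what the turn loop sees for one tool call.
enum Event {
    /// A parameter finished streaming.
    ParamDone { key: String, value: String },
    /// The complete argument object is available.
    Flush { args: HashMap<String, String> },
}

#[derive(Debug, PartialEq, Eq)]
enum Resolution {
    /// Resolved while arguments were still streaming.
    Early(RunMode),
    /// Resolved only once the full argument object was known.
    AtFlush(RunMode),
}

fn drive<F>(evaluate: F, events: Vec<Event>) -> Option<Resolution>
where
    F: Fn(&HashMap<String, String>) -> Option<RunMode>,
{
    let mut known: HashMap<String, String> = HashMap::new();
    for event in events {
        match event {
            Event::ParamDone { key, value } => {
                known.insert(key, value);
                if let Some(mode) = evaluate(&known) {
                    return Some(Resolution::Early(mode));
                }
            }
            Event::Flush { args } => {
                // Same evaluator, now with every parameter that was sent.
                // In this sketch a parameter missing from `args` still yields None.
                return evaluate(&args).map(Resolution::AtFlush);
            }
        }
    }
    None
}
```

With a streaming provider the first arm usually resolves the policy; with a provider that emits everything in one chunk, only the Flush arm runs and the outcome is the same mode, just later.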
Early permission prompt with partial arguments
When the evaluator resolves to ask before all arguments have arrived, the permission prompt is shown with partial argument information. The turn loop renders completed parameters and shows a placeholder for in-progress ones.
For the function_call display style:
Calling tool fs_create_file(path: "src/main.rs", streaming "content"...)
Allow? [y/n/e]
For the json display style:
{
  "path": "src/main.rs",
  "#": "streaming argument \"content\"..."
}
The user sees the parameter that triggered the policy rule (e.g., path) and can make an informed decision. If the user approves, streaming continues and execution begins when all arguments are complete. If the user declines, the tool call is cancelled immediately.
Partial argument rendering is new functionality that this RFD requires from the ToolRenderer. RFD 043 defines the progress event protocol but explicitly defers renderer changes. This RFD's Phase 4 covers the renderer work needed to display partial arguments during the permission prompt.
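As a rough illustration of the function_call placeholder format shown earlier, the following sketch builds the header line from completed parameters plus the names of parameters still streaming. The function name and its signature are assumptions made for this example and are not part of the ToolRenderer API; values are passed in already rendered (including quotes).

```rust
/// Renders a function_call-style header with placeholders for streaming parameters.
///
/// `completed` holds (name, already-rendered value) pairs in arrival order.
/// `streaming` holds the names of parameters whose values are still arriving.
fn render_function_call(tool: &str, completed: &[(&str, &str)], streaming: &[&str]) -> String {
    let mut parts: Vec<String> = completed
        .iter()
        .map(|(name, value)| format!("{name}: {value}"))
        .collect();
    // Streaming parameters get a placeholder instead of a value.
    parts.extend(streaming.iter().map(|name| format!("streaming \"{name}\"...")));
    format!("Calling tool {tool}({})", parts.join(", "))
}
```

Calling it with fs_create_file, a completed path of "src/main.rs", and content still streaming reproduces the example prompt line above. The json display style would need its own treatment and is not covered by this sketch.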
Cancellation on rejection
When the user declines a tool call during streaming, JP:
- Marks the tool call as Completed with a "skipped by user" response.
- Discards remaining ArgumentChunk events for that tool call index.
- If this is the only pending tool call, cancels the LLM response stream. LLMs emit tool calls as the terminal content of a response, so no message text follows. Cancelling the stream avoids generating (and billing for) argument tokens that will be discarded. For a thousand-line fs_create_file, this can save significant token cost.
- If other tool calls are still in flight, keeps the stream alive so those tool calls can complete. The rejected tool call's argument chunks are silently discarded.
Stream cancellation uses the existing CancellationToken infrastructure. The turn loop already handles mid-stream cancellation via Ctrl+C signals; the mechanism is identical.
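The choice between cancelling the stream and discarding chunks reduces to one check: is any other tool call still in flight? A minimal sketch of that decision follows. RejectionAction and on_rejection are hypothetical names, and pending is assumed to list the indices of tool calls that are still in flight, possibly including the rejected one.

```rust
#[derive(Debug, PartialEq, Eq)]
enum RejectionAction {
    /// No other tool call is in flight: stop the LLM response stream.
    CancelStream,
    /// Other calls are still streaming: keep the stream, drop this call's chunks.
    DiscardChunks { tool_call_index: usize },
}

/// Decides what to do with the stream after the user declines a tool call.
fn on_rejection(rejected_index: usize, pending: &[usize]) -> RejectionAction {
    let others_in_flight = pending.iter().any(|&index| index != rejected_index);
    if others_in_flight {
        RejectionAction::DiscardChunks { tool_call_index: rejected_index }
    } else {
        RejectionAction::CancelStream
    }
}
```

In the CancelStream case the caller would trigger the existing CancellationToken; this sketch deliberately leaves that wiring out.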
Nested argument paths
RFD D25 supports arg pointers at any depth — /path (top-level), /patterns/paths (nested array), /patterns/old (nested string). Streaming evaluation works at all depths by consuming RFD 043's nested Done signals.
Top-level parameters
For a top-level parameter like /path, completion is signaled by a top-level ObjectEntry { key: "path", value: Done } in the fragment stream. The aggregator yields the complete serde_json::Value for path. The evaluator matches against it.
Nested parameters with array traversal
For a nested parameter like /patterns/paths, RFD D25 defines existential semantics: the condition is met if any resolved value satisfies the matcher.
Existential matching is monotonic: the decision can go from "no match yet" to "match" as array elements arrive, but never from "match" to "no match." This makes incremental evaluation safe.
RFD 043's fragment protocol emits Done at each nesting level. For /patterns/paths:
ObjectEntry { key: "patterns", value: ArrayItem { index: 0, value:
  ObjectEntry { key: "paths", value: ArrayItem { index: 0, value:
    String("src/a.rs") } } } }
ObjectEntry { key: "patterns", value: ArrayItem { index: 0, value:
  ObjectEntry { key: "paths", value: ArrayItem { index: 0, value: Done } } } }
...
ObjectEntry { key: "patterns", value: ArrayItem { index: 0, value:
  ObjectEntry { key: "paths", value: Done } } }    ← patterns[0].paths array complete
ObjectEntry { key: "patterns", value: ArrayItem { index: 0, value: Done } }
ObjectEntry { key: "patterns", value: ArrayItem { index: 1, value: ... } }
...
ObjectEntry { key: "patterns", value: Done }       ← patterns array complete
The evaluator processes this as follows:
- As each leaf value completes (e.g., "src/a.rs" in patterns[0].paths[0]), check the matcher. If it matches, the rule is satisfied: return Decided immediately. No need to wait for the rest of the array.
- If a leaf value doesn't match, continue. More elements may arrive.
- When the top-level parameter completes (patterns Done), all nested values have been seen. If no element matched, the rule is eliminated.
The StreamingPolicyState tracks per-rule match state for array traversals: a boolean "matched" flag. Once set, the rule is decided and further elements for that rule are ignored. If the top-level parameter completes without a match, the flag remains false and the rule is eliminated.
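The per-rule "matched" flag and its monotonic behaviour can be sketched as follows. This assumes a prefix matcher on string leaves; ArrayRuleState, Outcome, and the two method names are hypothetical and only illustrate the bookkeeping described above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    /// No element has matched yet and more may arrive.
    Undecided,
    /// Some element matched: the rule is satisfied.
    Matched,
    /// The top-level parameter finished without any match.
    Eliminated,
}

/// Per-rule state for an existential match over streamed array elements.
struct ArrayRuleState {
    prefix: String,
    matched: bool,
}

impl ArrayRuleState {
    fn new(prefix: &str) -> Self {
        Self { prefix: prefix.to_string(), matched: false }
    }

    /// Called when a leaf value under the rule's pointer completes.
    /// The flag only ever moves from false to true (monotonic).
    fn on_leaf(&mut self, value: &str) -> Outcome {
        if !self.matched && value.starts_with(self.prefix.as_str()) {
            self.matched = true;
        }
        if self.matched { Outcome::Matched } else { Outcome::Undecided }
    }

    /// Called when the top-level parameter's Done signal arrives.
    fn on_top_level_done(&self) -> Outcome {
        if self.matched { Outcome::Matched } else { Outcome::Eliminated }
    }
}
```

Because a later non-matching element cannot clear the flag, acting on the first Matched result is safe even though the rest of the array has not arrived.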
When evaluation blocks
A rule's arg pointer references a specific top-level parameter. The evaluator returns Waiting only when that top-level parameter hasn't started arriving yet. Once any fragment for the parameter appears, the evaluator is actively checking. Once the parameter's Done arrives, the rule is either matched (Decided) or eliminated (try next rule).
This means a rule on /patterns/paths doesn't block evaluation until Flush — it resolves as soon as either (a) any nested paths element matches, or (b) patterns finishes streaming without a match.
RunMode::Edit waits for complete arguments
When the evaluator resolves to edit, the editor needs the complete argument object — the user can't edit partial arguments. The streaming evaluator treats edit as decided (the mode is known) but defers the actual editor prompt until all arguments have arrived.
ask and unattended are the modes that benefit from early action. skip also benefits trivially (the tool is rejected immediately).
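How each resolved mode translates into an immediate or deferred action can be summarised in a single match. The Action enum and action_for function below are hypothetical names for this sketch; only the RunMode variants come from the text.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RunMode {
    Ask,
    Unattended,
    Edit,
    Skip,
}

#[derive(Debug, PartialEq, Eq)]
enum Action {
    /// Show the permission prompt now, possibly with partial arguments.
    PromptUser,
    /// Mark as approved; execution still waits for the complete arguments.
    PreApprove,
    /// Reject immediately with a skip response.
    RespondSkipped,
    /// Mode is known, but the editor needs the complete argument object.
    DeferEditor,
    /// All arguments have arrived: the editor can open.
    OpenEditor,
}

/// Maps a resolved mode to what the turn loop can do at this point in the stream.
fn action_for(mode: RunMode, args_complete: bool) -> Action {
    match (mode, args_complete) {
        (RunMode::Ask, _) => Action::PromptUser,
        (RunMode::Unattended, _) => Action::PreApprove,
        (RunMode::Skip, _) => Action::RespondSkipped,
        (RunMode::Edit, false) => Action::DeferEditor,
        (RunMode::Edit, true) => Action::OpenEditor,
    }
}
```

Only edit depends on whether the arguments are complete; the other three modes can act as soon as the mode is known.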
Parameter ordering convention
LLMs typically stream arguments in the order properties appear in the JSON schema. JP controls this order through IndexMap iteration in ToolParameterConfig, which reflects the declaration order in the tool's TOML definition.
Most tool definitions already declare short, policy-relevant parameters first (path, source, util) and large content parameters last (content, patterns). This RFD documents this as a convention:
Parameters referenced by policy.run or policy.result rules should be declared before large content parameters in the tool's parameter list.
This is a soft convention. JP does not enforce ordering or reorder parameters automatically. If a provider streams parameters out of schema order, the evaluator returns Waiting until the needed parameter arrives — no incorrect decisions, just lost optimization. The worst case is identical to RFD D25's current behavior (evaluate after all arguments arrive).
Drawbacks
Monotonic array evaluation adds per-rule state. Each rule with an array traversal needs a "matched" flag tracked across multiple fragment events. This is simple bookkeeping, but it's state that doesn't exist in RFD D25's post-complete evaluation where the full argument object is available.
Partial argument rendering requires new renderer work. The permission prompt needs to display known parameters alongside streaming placeholders. This is new functionality in the ToolRenderer that doesn't exist today and isn't provided by RFD 043.
Alternatives
Cancel the LLM stream unconditionally on rejection
Always cancel the LLM stream when the user rejects a tool call, even if other tool calls are in flight. The other tool calls would be re-requested in the next turn.
Rejected. Cancelling discards partially-streamed tool calls that may be nearly complete. The re-request costs a full additional LLM round-trip. Selective cancellation (only when the rejected call is the last pending) is strictly better.
Speculative evaluation past Waiting rules
When a rule returns Waiting, speculatively evaluate later rules to see if any can be decided. If a later rule matches, use its mode provisionally and re-evaluate when the pending parameter arrives.
Rejected. This violates first-match-wins semantics. A provisional decision from rule 3 might be wrong if rule 1's parameter arrives and matches. The user would see a prompt based on rule 3's mode, then potentially a different behavior when rule 1 resolves.
Top-level parameters only
Restrict streaming evaluation to rules whose arg pointer has a single segment (e.g., /path). Multi-segment pointers (/patterns/paths) fall back to evaluation after all arguments arrive.
Rejected. fs_modify_file's patterns array with nested paths is a core use case for argument-conditional policies. Deferring nested evaluation removes the streaming benefit for one of the most important tools. The monotonic property of existential array matching makes incremental nested evaluation sound without excessive complexity.
Non-Goals
LLM stream cancellation with multiple pending tool calls. When other tool calls are still streaming, the rejected call's chunks are discarded but the stream stays alive.
Argument reordering. Manipulating the JSON schema property order to ensure policy-relevant parameters stream first. The schema order is a convention, not a guarantee.
Partial argument execution. Starting tool execution before all arguments arrive (e.g., beginning a file write as soon as path is known, before content finishes). Tool execution always uses the complete argument object.
Risks and Open Questions
RFD 043 completion signal granularity. This RFD assumes RFD 043's ToolCallArgumentProgress emits Done at each nesting level, including within arrays. If 043's fragment protocol changes during implementation, the nested evaluation integration may need adjustment.
Prompt display during streaming. The permission prompt shows partial arguments. For the json display style, the placeholder syntax ("#": "streaming...") is not ideal because # is a valid JSON key and could collide with a real parameter name. A dedicated partial-arguments rendering approach may be needed.
Multiple tool calls in one response. When the LLM emits multiple tool calls, each has independent streaming policy state. Concurrent prompts need sequencing; the existing prompt_active / pending_prompts queue in ToolCoordinator handles this.
Provider streaming behavior. The optimization assumes providers stream argument JSON incrementally in small chunks. If a provider buffers arguments and emits them as a single large ArgumentChunk, streaming evaluation adds no benefit (but no cost beyond creating the StreamingPolicyState). This should be validated empirically with Anthropic, OpenRouter, and OpenAI.
Implementation Plan
Phase 1: Policy evaluator function
- Implement evaluate(rules: &[RunRule], known_params: &HashMap<String, Value>) -> PolicyDecision in jp_cli::cmd::query::tool::policy.
- Handle both return cases: Decided, Waiting.
- For array-traversal rules, accept a pre-resolved Value in known_params (the caller handles fragment aggregation; the evaluator walks the value with RFD D25's existential semantics).
- Unit tests covering: catch-all policies, single-rule match, rule elimination, waiting on missing param, catch-all after eliminations, implicit fallback, nested array existential matching.
No dependencies beyond RFD D25's types. Can merge independently.
Phase 2: StreamingPolicyState
- Implement StreamingPolicyState struct that wraps the evaluator with per-tool-call state (known_params map, current decision, per-rule array match flags).
- Method feed_parameter(key: String, value: Value) -> Option<RunMode> that inserts the value and re-evaluates if currently unresolved.
- Method feed_nested_element(top_level_key: &str, path: &[PathSegment], value: Value) -> Option<RunMode> for incremental array element evaluation.
- Unit tests covering: parameter arrival order, re-evaluation on each new param, early array match, array exhaustion without match.
Depends on Phase 1.
Phase 3: Turn loop integration
- On ToolCallPart::Start: look up the tool's RunPolicy. If no conditional rules, use the existing fast path. Otherwise, create StreamingPolicyState.
- On ToolCallArgumentProgress with parameter completion: call StreamingPolicyState::feed_parameter or feed_nested_element. If resolved:
  - For ask: trigger the permission prompt with partial arguments.
  - For unattended: mark as pre-approved, render the tool call header with known arguments.
  - For skip: mark as completed with a skip response.
  - For edit: record the decision but defer the editor prompt until all arguments arrive.
- On Flush: if still Waiting (non-streaming provider fallback), all parameters are now available, so evaluate once more.
- Wire up cancellation: if the user declines and this is the only pending tool call, cancel the LLM stream. Otherwise discard remaining argument chunks for that tool call index.
Depends on Phase 2 and RFD 043 Phase 3 (event plumbing).
Phase 4: Partial argument rendering
- Extend ToolRenderer to accept partial argument maps with a set of "streaming" parameter names for placeholder display.
- Implement placeholder rendering for function_call and json display styles.
- Update the permission prompt to render partial arguments when triggered during streaming.
Depends on Phase 3. Can be refined iteratively after the core integration lands.
References
- RFD D25: Argument-conditional tool policy. Defines the rule types, matchers, first-match-wins evaluation, and post-flush evaluation model that this RFD extends.
- RFD 043: Incremental tool call argument streaming. Provides the ToolCallArgumentProgress events and FragmentAggregator that this RFD consumes for per-parameter completion signals.
- RFD 075: Tool sandbox and access policy.
- RFD 076: Tool access grants.