RFD 064: Non-Destructive Conversation Compaction
- Status: Discussion
- Category: Design
- Authors: Jean Mertz <git@jeanmertz.com>
- Date: 2025-07-17
- Supersedes: RFD 036
Summary
This RFD introduces conversation compaction as a non-destructive, additive operation. Instead of mutating or deleting events in the stored conversation, compaction appends overlay events that instruct the projection layer to present a reduced view when building the LLM request. The original events are always preserved. Compaction policies are defined per content type (summary, reasoning, tool calls), composed across multiple compaction events, and configured at the workspace and conversation level.
Motivation
Long-running conversations degrade LLM performance. Research confirms that when models take a wrong turn early in a conversation, they don't recover (see: Issue #57). Even when the conversation stays on track, growing context windows cause:
- Higher cost. Every cached and uncached input token is billed. Tool call responses — file contents, grep results, test output — dominate the token count in coding conversations.
- Slower responses. More input tokens mean higher time-to-first-token.
- Lower quality. Models lose focus in long contexts. Obsolete tool results and abandoned tangents actively mislead the model.
- Context window overflow. Eventually the conversation exceeds the model's window and fails outright.
Today, users work around this by forking the last turn (jp conversation fork --last 1) and losing all prior context. This is effective but blunt — it discards useful context along with the noise.
JP needs a way to selectively reduce conversation size while preserving the context that matters. Multiple existing RFDs defer to this one:
- RFD 011 (System Notification Queue): "If JP ever implements conversation compaction..."
- RFD 034 (Inquiry Config): "smarter compaction (summarization, middle-out trimming) is orthogonal"
RFD 036 proposed compaction as a destructive transformation of the ConversationStream — strategies that mutate or replace events, with fork-by-default as a safety net. This RFD supersedes that design with a non-destructive approach: compaction events are appended to the stream and define a projection of the original events, preserving the full history while presenting a reduced view to the LLM.
Design
Core Concept: Compaction as Overlay
A compaction event is an InternalEvent variant — like ConfigDelta — that modifies how preceding events are interpreted when building the LLM request. It does not modify or delete any existing events.
InternalEvent::ConfigDelta → "from here on, use this config"
InternalEvent::Compaction → "when building the LLM view, apply these
policies to events in this range"

The original events remain in events.json. The projection layer in Thread::into_parts() reads all compaction events, builds a projection plan, and yields the appropriate view to the provider.
User-Facing Behavior
The compact Command
jp conversation compact [ID] [OPTIONS]

Compacts the active conversation (or the specified one) by appending a compaction event. The original events are untouched.
# Compact with workspace defaults
jp conversation compact
# Compact using a named profile
jp conversation compact --profile heavy
# Compact a specific range
jp conversation compact --from 5h --to 1h
# Compact everything except the last 3 turns
jp conversation compact --keep-last 3
# Preview what would change
jp conversation compact --dry-run

Flags:
| Flag | Default | Description |
|---|---|---|
| --profile <name> | default | Named compaction profile from config. |
| --from <bound> | start of conversation | Start of the compacted range (inclusive). |
| --to <bound> | end of conversation | End of the compacted range (inclusive). |
| --keep-last <N> | from config | Shorthand for --to N turns ago. |
| --dry-run | false | Preview mechanical effects without applying. |
Range bounds accept several formats:
| Value | Example | Meaning |
|---|---|---|
| Positive integer | --from 5 | Absolute turn index (0-based). |
| Negative integer | --to -3 | 3 turns before the last turn. |
| Duration string | --from 5h | Time ago (resolved to a turn index). |
| last | --from last | Turn of the most recent compaction event, or start if none. |
--from without a value defaults to last. All bounds are resolved to absolute turn indices at creation time and stored as integers.
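The bound resolution described above can be sketched as follows. This is a minimal illustration with hypothetical names (not the actual jp implementation); duration strings are omitted because resolving them requires event timestamps.

```rust
/// Resolve a CLI range bound to an absolute 0-based turn index.
/// Returns None for unparseable input or out-of-range negatives.
fn resolve_bound(
    value: &str,
    total_turns: usize,
    last_compaction_turn: Option<usize>,
) -> Option<usize> {
    match value {
        // `last`: turn of the most recent compaction event, or start if none.
        "last" => Some(last_compaction_turn.unwrap_or(0)),
        v => match v.parse::<isize>().ok()? {
            // Non-negative: already an absolute turn index.
            n if n >= 0 => Some(n as usize),
            // Negative: N turns before the last turn.
            n => total_turns.checked_sub(1)?.checked_sub(n.unsigned_abs()),
        },
    }
}
```

Because bounds are resolved at creation time, the stored compaction event only ever contains absolute integers, regardless of which input form the user typed.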
The --compact Flag on query
# Compact with default profile, then query
jp query --compact -- "Continue working on the feature"
# Compact with a named profile, then query
jp query --compact=heavy "Continue working on the feature"

Equivalent to jp conversation compact followed by jp query. --compact alone uses the conversation's default profile; --compact=<name> uses the named profile.
The --compact Flag on fork
# Fork and compact with default profile
jp conversation fork --compact
# Fork and compact with a named profile
jp conversation fork --compact=heavy

Forks the conversation and appends a compaction event to the fork. Uses the forked conversation's resolved compaction config.
Viewing Compacted Conversations
# Print the full history (default)
jp conversation print
# Print the compacted view (what the LLM sees)
jp conversation print --compacted

Compaction Event Model
The Compaction Type
A compaction event defines an explicit range and optional per-content-type policies:
/// A compaction overlay stored in the event stream.
///
/// Defines how events within [from_turn, to_turn] should be projected
/// when building the LLM request. The original events are unmodified.
pub struct Compaction {
pub timestamp: DateTime<Utc>,
/// First turn in the compacted range (inclusive, 0-based).
pub from_turn: usize,
/// Last turn in the compacted range (inclusive, 0-based).
pub to_turn: usize,
/// When set, replaces ALL provider-visible events in the range
/// with a pre-computed summary. Takes precedence over `reasoning`
/// and `tool_calls`.
pub summary: Option<SummaryPolicy>,
/// Policy for ChatResponse::Reasoning events.
/// Ignored when `summary` is set.
pub reasoning: Option<ReasoningPolicy>,
/// Policy for ToolCallRequest and ToolCallResponse pairs.
/// Ignored when `summary` is set.
pub tool_calls: Option<ToolCallPolicy>,
}

None means "this compaction has no opinion on this content type" — the original events pass through, or an earlier compaction's policy applies.
Per-Content-Type Policies
Each content type has its own policy enum, carrying only what makes sense for that type:
pub enum ReasoningPolicy {
/// Omit all reasoning events from the projected view.
Strip,
}
/// Replaces ALL provider-visible events in the range with a
/// pre-computed summary. Messages, reasoning, and tool calls are
/// all replaced by a single synthetic ChatRequest/ChatResponse pair.
pub struct SummaryPolicy {
/// The summary text, generated at compaction-creation time.
pub summary: String,
}
pub enum ToolCallPolicy {
/// Replace request arguments and/or response content with compact
/// summaries. Preserves tool name, call ID, and success/error status.
///
/// Parses from strings for config ergonomics:
/// - `"strip"` → Strip { request: true, response: true }
/// - `"strip-responses"` → Strip { request: false, response: true }
/// - `"strip-requests"` → Strip { request: true, response: false }
///
/// Or inline table: `{ policy = "strip", request = true, response = true }`
Strip {
/// Replace arguments with a compact summary.
request: bool,
/// Replace response content with a status line.
response: bool,
},
/// Remove all tool call pairs entirely.
Omit,
}Eagerness Principle
Transformations fall into two categories:
- Eager (store the result). Expensive or non-deterministic operations — LLM-generated summaries. The output is stored in the compaction event (SummaryPolicy { summary }) because regenerating it would be costly and potentially different each time.
- Lazy (store the policy). Cheap, deterministic operations — stripping reasoning, replacing tool responses with a status line, omitting events. The policy is stored (e.g. ToolCallPolicy::Strip { response: true, .. }), and the projection layer applies it at read time.
Integration with InternalEvent
The compaction event is a new variant of InternalEvent, alongside ConfigDelta and Event:
pub enum InternalEvent {
ConfigDelta(ConfigDelta),
Event(Box<ConversationEvent>),
Compaction(Compaction),
}

Like ConfigDelta, compaction events are stream metadata — they are not visible to providers, not counted by ConversationStream::len(), and are preserved by retain().
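As a sketch of that accounting, with stand-in payload types (the real ConversationStream and its payloads live in jp_conversation):

```rust
// Stand-in payloads; the real variants carry ConfigDelta,
// Box<ConversationEvent>, and Compaction respectively.
enum InternalEvent {
    ConfigDelta(String),
    Event(String),
    Compaction(String),
}

/// Only conversation events count toward the stream length; config
/// deltas and compaction overlays are metadata and are skipped.
fn conversation_len(events: &[InternalEvent]) -> usize {
    events
        .iter()
        .filter(|e| matches!(e, InternalEvent::Event(_)))
        .count()
}
```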
Projection Layer
The projection layer transforms the raw event stream into the view sent to the LLM. It is applied in Thread::into_parts(), which already filters events via is_provider_visible().
Algorithm
- Collect all Compaction events from the stream.
- For each conversation event at turn T with content type C:
  a. Find all compaction events where from_turn <= T <= to_turn and the policy for C is Some.
  b. Of those, the one with the latest timestamp wins.
  c. Apply the winning policy: keep, omit, strip, or substitute.
- When summary is set: omit ALL provider-visible events in the range (messages, reasoning, tool calls), and inject a synthetic ChatRequest/ChatResponse::Message pair at the from_turn position containing the pre-computed summary. The reasoning and tool_calls policies are ignored for events in the range.
This logic lives in a new ConversationStream::projected_iter() method (or similar), called by Thread::into_parts() instead of the raw iterator.
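The per-event resolution can be sketched as follows. Types and names are illustrative stand-ins: policies are reduced to booleans, timestamps to integers, and the summary-precedence rule from the stacking semantics is applied first.

```rust
#[derive(Clone, Copy)]
enum ContentType { Message, Reasoning, ToolCall }

/// Simplified overlay: the real Compaction carries Option<...Policy>
/// fields; booleans stand in for Some(_).
struct Overlay {
    timestamp: u64,
    from_turn: usize,
    to_turn: usize,
    summary: bool,          // Some(SummaryPolicy) in the real type
    reasoning_strip: bool,  // Some(ReasoningPolicy::Strip)
    tool_calls_strip: bool, // Some(ToolCallPolicy::Strip { .. })
}

#[derive(Debug, PartialEq)]
enum Verdict { Keep, Strip, Summarized }

/// Resolve the fate of one event at `turn` with content type `ct`.
fn resolve(overlays: &[Overlay], turn: usize, ct: ContentType) -> Verdict {
    let covers = |o: &&Overlay| o.from_turn <= turn && turn <= o.to_turn;
    // A summary overlay covering the turn subsumes per-type policies.
    if overlays.iter().filter(covers).any(|o| o.summary) {
        return Verdict::Summarized;
    }
    // Otherwise the latest covering overlay with an opinion on `ct`
    // wins. With real policy payloads, the winner's policy would be
    // applied; here any covering opinion collapses to Strip.
    let winner = overlays
        .iter()
        .filter(covers)
        .filter(|o| match ct {
            ContentType::Reasoning => o.reasoning_strip,
            ContentType::ToolCall => o.tool_calls_strip,
            ContentType::Message => false, // no per-type message policy
        })
        .max_by_key(|o| o.timestamp);
    if winner.is_some() { Verdict::Strip } else { Verdict::Keep }
}
```

Applied to the stacking example later in this RFD (summary compaction A over turns 0-20, tool-call strip B over turns 0-30), this reproduces the winner table: turn 5 is summarized, turn 25 tool calls are stripped, turn 25 reasoning is kept.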
Projection Example
A concrete example showing how the projection applies across event types:
Raw stream (turns 0-2, then turns 3+ uncompacted):
Turn 0: TurnStart
Turn 0: ChatRequest("set up the project")
Turn 0: ChatResponse::Message("I'll create the project structure.")
Turn 0: ToolCallRequest(id="1", fs_create_file, {path: "src/main.rs"})
Turn 0: ToolCallResponse(id="1", ok, "<200 lines of code>")
Turn 0: ChatResponse::Message("Created src/main.rs with a basic setup.")
Turn 1: TurnStart
Turn 1: ChatRequest("add error handling")
Turn 1: ChatResponse::Reasoning("<500 tokens of thinking>")
Turn 1: ToolCallRequest(id="2", fs_read_file, {path: "src/main.rs"})
Turn 1: ToolCallResponse(id="2", ok, "<200 lines of code>")
Turn 1: ToolCallRequest(id="3", fs_modify_file, {path: "src/main.rs"})
Turn 1: ToolCallResponse(id="3", ok, "<300 lines of diff>")
Turn 1: ChatResponse::Message("Added error handling to main.")
Turn 2: TurnStart
Turn 2: ChatRequest("now add logging")
Turn 2: ChatResponse::Reasoning("<400 tokens of thinking>")
Turn 2: ToolCallRequest(id="4", fs_modify_file, {path: "src/main.rs"})
Turn 2: ToolCallResponse(id="4", ok, "<250 lines of diff>")
Turn 2: ChatResponse::Message("Added tracing-based logging.")

With the default profile (reasoning: Strip, tool_calls: Strip):
Compaction event (after turn 2):
from_turn: 0, to_turn: 2
summary: None
reasoning: Strip
tool_calls: Strip { request: true, response: true }
Projected view:
ChatRequest("set up the project")
ChatResponse::Message("I'll create the project structure.")
ToolCallRequest(id="1", fs_create_file, {[compacted]})
ToolCallResponse(id="1", ok, "[compacted] fs_create_file: success")
ChatResponse::Message("Created src/main.rs with a basic setup.")
ChatRequest("add error handling")
ToolCallRequest(id="2", fs_read_file, {path: "src/main.rs"})
ToolCallResponse(id="2", ok, "[compacted] fs_read_file: success")
ToolCallRequest(id="3", fs_modify_file, {[compacted]})
ToolCallResponse(id="3", ok, "[compacted] fs_modify_file: success")
ChatResponse::Message("Added error handling to main.")
ChatRequest("now add logging")
ToolCallRequest(id="4", fs_modify_file, {[compacted]})
ToolCallResponse(id="4", ok, "[compacted] fs_modify_file: success")
ChatResponse::Message("Added tracing-based logging.")
...turns 3+ uncompacted...

Reasoning is stripped, tool arguments and responses are compacted. Note that fs_read_file at id="2" keeps its arguments (per-tool hint request = "keep") while fs_create_file and fs_modify_file have their arguments stripped (per-tool hint request = "strip" because they carry large file content). Messages and conversation structure are preserved.
With the heavy profile (summary: Summarize):
Compaction event (after turn 2):
from_turn: 0, to_turn: 2
summary: SummaryPolicy { summary: "Set up a Rust project at src/main.rs
with error handling and tracing-based logging." }
reasoning: None
tool_calls: None
Projected view:
ChatRequest("[Summary of previous conversation]")
ChatResponse::Message("Set up a Rust project at src/main.rs
with error handling and tracing-based logging.")
...turns 3+ uncompacted...

The two profiles show the distinction:
- default (mechanical): Conversation structure is preserved. Reasoning is stripped, tool responses are replaced with status lines. Messages and tool call requests remain — the model sees the full flow of what happened, minus the bulk.
- heavy (summarization): Everything in the range is replaced by a single summary. The summarizer reads ALL raw events (messages, reasoning, tool calls) to produce the summary, so tool usage and decisions are captured in the text. No orphaned events remain.
When summary is set, reasoning and tool_calls are ignored — the summary replaces everything. They only apply when compacting without summarization.
Stacking Semantics
Multiple compaction events compose independently per content type. For each event, per content type, the latest compaction whose range covers that event wins.
Example:
Compaction A (turn 20): from=0, to=20, summary=SummaryPolicy("...")
Compaction B (turn 30): from=0, to=30, tool_calls=Strip { request: false, response: true }

| Turn | Event type | A | B | Winner |
|---|---|---|---|---|
| 5 | Any | Summarize | — | A: Summarize |
| 5 | Tool calls | Summarize | Strip | A: Summarize* |
| 25 | Tool calls | out of range | Strip | B: Strip |
| 25 | Reasoning | out of range | — | Keep |
* summary takes precedence over per-type policies when both cover an event.
Summary Overlap Resolution
Summaries are holistic representations of a range — they cannot be split or sliced. Partial overlaps between summary ranges would produce irreconcilable conflicts (two summaries covering partially the same turns, potentially contradicting each other).
Rule: when creating a new compaction with summary: Some(SummaryPolicy), if any existing summary compaction partially overlaps with the new range, the new range is auto-extended to fully subsume the existing one.
Formally: given new range [X, Y] and existing summary range [A, B], if the ranges intersect but neither fully contains the other, extend to [min(X, A), max(Y, B)]. Repeat until no partial overlaps remain. The summarizer then reads raw events for the extended range.
This constraint applies only when summary is set. All other policies operate per-event and compose naturally with partial overlaps.
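The auto-extension rule can be sketched as a pure function over turn ranges (helper names are hypothetical; ranges are inclusive `(from, to)` pairs):

```rust
/// True when the ranges intersect but neither fully contains the other.
fn partially_overlaps((x, y): (usize, usize), (a, b): (usize, usize)) -> bool {
    let intersects = x <= b && a <= y;
    let new_contains_old = x <= a && b <= y;
    let old_contains_new = a <= x && y <= b;
    intersects && !new_contains_old && !old_contains_new
}

/// Extend [X, Y] until it partially overlaps no existing summary range.
/// Each merge strictly grows the range, so the loop terminates.
fn extend_range(
    mut new: (usize, usize),
    existing: &[(usize, usize)],
) -> (usize, usize) {
    loop {
        match existing.iter().find(|&&e| partially_overlaps(new, e)) {
            Some(&(a, b)) => new = (new.0.min(a), new.1.max(b)),
            None => return new,
        }
    }
}
```

For example, a new summary range (15, 25) against an existing summary over (0, 20) extends to (0, 25); one merge can trigger another, which is why the function loops until a fixed point.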
Raw-Stream Invariant
Summarization always reads the raw (non-compacted) event stream. The summarizer sees the original messages, not prior summaries. This prevents compound information loss — summarizing a summary degrades quality at each step.
When compaction B's range overlaps with compaction A's range, B's summarizer reads the original events for its full range, ignoring A's summary entirely. At projection time, B's summary wins for the overlapping region (it has a later timestamp), and it is a faithful summary of the originals.
This is already guaranteed by the additive design — the raw events are always in events.json — but it is worth stating as an invariant: no code path should feed a projected view to a summarizer.
Strategies
A strategy is a function that analyzes a ConversationStream and produces a Compaction event. Strategies do not mutate the stream.
Mechanical Strategies
These are pure transformations that don't require LLM calls.
strip-reasoning
Produces a compaction with reasoning: Some(ReasoningPolicy::Strip) for the specified range.
Impact: Moderate token reduction for models that emit extended thinking.
strip-tools
Produces a compaction with tool_calls: Some(ToolCallPolicy::Strip { .. }) for the specified range. At projection time, tool response content is replaced with a status line ([compacted] {tool_name}: {success|error}) and/or request arguments are replaced with a compact summary. Which fields are stripped depends on the profile and per-tool hints.
Impact: High. Tool responses and arguments (especially for file-writing tools) dominate token count in coding conversations.
LLM-Assisted Strategies
summarize
Sends the raw events in the specified range to an LLM with instructions to produce a concise summary. Produces a compaction with summary: Some(SummaryPolicy { summary }). When set, this replaces all provider-visible events in the range.
The summarization prompt instructs the model to preserve key decisions, file paths, error resolutions, and the current state of the task. The model and prompt are configurable per-profile (see Configuration).
Impact: High. Replaces an arbitrary number of turns with a short summary.
Configuration
Compaction is configured at the workspace and conversation level, following the same defaults-plus-named-profiles pattern used by tool configuration.
[conversation.compaction]
# The profile to use when --profile is not specified.
default_profile = "default"
# Number of recent turns to preserve (used by profiles that don't
# override it). Shorthand for setting `to` to N turns ago.
keep_last = 3
# Default compaction profile. Applied by `--compact` with no arguments.
[conversation.compaction.profiles.default]
reasoning = "strip"
tool_calls = "strip"
# A heavier profile that includes summarization.
# When summary is set, it replaces all events in the range —
# reasoning and tool_calls policies are not needed.
[conversation.compaction.profiles.heavy.summary]
policy = "summarize"
model = "anthropic/claude-haiku"
# instructions = """
# Summarize this conversation for continuity. Preserve:
# - File paths and code structures discussed
# - Key decisions and their rationale
# - Current task state and next steps
# """
# A minimal profile for quick cleanup.
[conversation.compaction.profiles.light]
reasoning = "strip"

Profiles define which per-type policies to apply. The range (from, to, keep_last) comes from the CLI flags or the top-level keep_last default. A profile does not encode a range — ranges are an invocation-time concern.
Conversation-level overrides (via --cfg) can change any of these for a specific conversation.
Per-Tool Compaction Hints
Tools can declare how their calls should be compacted. This is a new optional field in the tool configuration:
[conversation.tools.fs_read_file.compaction]
request = "keep" # "keep" | "strip"
response = "strip" # "keep" | "strip"

Per-tool hints override the profile's Strip policy for individual tools. A tool with response = "keep" is exempted from response stripping even under a policy that sets response: true.
Example defaults for the JP project:
| Tool | request | response |
|---|---|---|
| fs_read_file | keep | strip |
| fs_grep_files | keep | strip |
| cargo_check | keep | strip |
| cargo_test | keep | strip |
| fs_create_file | strip | keep |
| fs_modify_file | strip | strip |
| git_commit | strip | keep |
These are workspace-level tool configurations, not built-in defaults. Each workspace defines its own tools and compaction hints.
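The field-by-field override could be resolved as in this sketch (types and names are illustrative, not the actual ToolConfig; `true` means "strip"):

```rust
/// The profile's effective Strip policy for a tool call pair.
#[derive(Clone, Copy, Debug, PartialEq)]
struct StripPolicy { request: bool, response: bool }

/// A per-tool hint; `None` means "no opinion, defer to the profile".
#[derive(Clone, Copy, Default)]
struct ToolHint { request: Option<bool>, response: Option<bool> }

/// The hint overrides the profile field-by-field when present.
fn effective(profile: StripPolicy, hint: Option<ToolHint>) -> StripPolicy {
    let hint = hint.unwrap_or_default();
    StripPolicy {
        request: hint.request.unwrap_or(profile.request),
        response: hint.response.unwrap_or(profile.response),
    }
}
```

Under the default profile (strip both fields), the fs_read_file hint above (request = "keep", response = "strip") yields a policy that strips only the response, matching the projection example earlier in this RFD.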
Drawbacks
- Summaries are lossy. Even though the original events are preserved, the LLM only sees the compacted view. A poor summary can mislead the model worse than a long conversation. Mitigation: summaries are generated from raw events (never from prior summaries), and the summarization model and prompt are configurable.
- Storage growth. Compaction events add to the stream rather than reducing stored size. Summary text in SummaryPolicy can be non-trivial. In practice this is small compared to the tool responses they overlay, but it is additive rather than reductive.
- Projection complexity. The projection layer adds a code path between the raw stream and the LLM. Bugs in projection logic could cause the LLM to see inconsistent state. Mitigation: the projection is a pure function of the stream, fully testable without LLM calls.
- Prompt cache invalidation. Adding a compaction event changes the projected prefix, invalidating any cached conversation history. System prompt caching is unaffected (it is a separate prefix). This is acceptable for manual compaction but would be problematic for automatic compaction.
- --dry-run cannot preview summaries. For mechanical strategies, dry-run accurately shows the projected view. For summarization, dry-run can only report "will generate a summary for turns X-Y using model Z" — it cannot show the actual summary without spending tokens on an LLM call, and re-running without --dry-run would produce a different summary anyway.
Alternatives
Destructive compaction (RFD 036)
The original design: strategies mutate the ConversationStream directly, with fork-by-default as a safety net. Rejected because:
- Information loss. Once events are deleted, they're gone. Fork mitigates but doesn't solve — you end up with two conversations, one intact and one damaged.
- No undo. Reverting a compaction requires restoring from the fork.
- Fork proliferation. Each compaction creates a new conversation, cluttering the conversation list.
- Conflated concerns. Destructive compaction mixes "what to send to the LLM" (a view concern) with "what to store on disk" (a persistence concern).
Automatic compaction on every turn
Compact transparently when approaching the context window limit. Rejected for this RFD: compaction is lossy and should be an explicit user decision. Automatic compaction has additional design constraints (caching interaction, interval control, trigger conditions) that warrant a separate proposal.
Single monolithic compact operation
One "compact" that does everything. Rejected: different conversations need different compaction. A coding conversation benefits from tool response stripping; a discussion benefits from summarization. Named profiles with per-type policies let users tailor the operation.
Non-Goals
- Automatic compaction. This RFD covers explicit, user-initiated compaction. Automatic compaction (triggered by context window proximity or turn count thresholds) has different design constraints — caching interaction, trigger intervals, rolling window semantics — and is deferred to a follow-up RFD. The config namespace (conversation.compaction.auto) is reserved.
- Provider-delegated compaction. Some providers offer server-side compaction (Anthropic returns readable summaries, OpenAI returns opaque encrypted blobs). In practice, readable provider summaries offer no advantage over JP's own SummaryPolicy using the same model, and opaque formats cannot be integrated into JP's event model. Provider delegation may become interesting if providers offer compaction capabilities that can't be replicated client-side, but that's not the case today.
- Custom external strategies. An extension point where an external binary receives the raw events and range, and returns replacement events that JP stores in the compaction event. This is analogous to how tools work today (external process, structured I/O) and would enable domain-specific compaction logic. The compaction event model supports this (replacement events are just the policy payloads), but the protocol and CLI integration are deferred.
- Tool subsumption protocol. RFD 036 proposed an Action::Subsumes tool protocol extension where tools could declare that one call subsumes another (e.g., read_file(1,10) subsumes read_file(2,5)). This is a refinement that can be added later without changing the compaction event model.
- Interactive tangent classification. RFD 036 proposed a classify-tangents strategy that uses an LLM to identify off-topic turns. Interesting but orthogonal to the core compaction model.
- Tool call deduplication. Identifying and removing duplicate tool calls (same name, same arguments) across turns. While potentially useful, it adds complexity to the compaction model (per-call selective policies) for marginal benefit. Can be added as a ToolCallPolicy::Selective variant later if needed.
- Conversation merging. Combining two conversations into one.
- Interactive stream editing. A $EDITOR-based workflow for manually removing or reordering events in the raw stream (similar to git rebase -i). This is a separate, destructive operation on the stored events — distinct from compaction's non-destructive overlay model — and warrants its own RFD.
Risks and Open Questions
- Summarization prompt design. The summary needs to preserve the right context — key decisions, file paths, error resolutions, task state. What should the prompt look like? This needs experimentation during implementation. We should take inspiration from Anthropic's default summarization prompt.
- Turn boundary correctness. Range resolution must handle edge cases: conversations with only 1 turn, turns with no tool calls, interrupted turns. The existing fork --last implementation is a reference.
- Config delta preservation. ConversationStream interleaves ConfigDelta events with conversation events. The projection layer must preserve config deltas correctly — compacting a range should not affect config state for events outside that range.
- Summary injection and provider expectations. The synthetic ChatRequest/ChatResponse pair injected for summaries must maintain the user/assistant alternation that providers expect. Needs testing across Anthropic, OpenAI, Google, and local providers.
Implementation Plan
Phase 1: Compaction Event Model
- Add InternalEvent::Compaction(Compaction) to jp_conversation.
- Define the Compaction struct, per-type policy enums, and serialization.
- Update ConversationStream to handle the new variant: is_empty(), len(), retain(), sanitize() should treat compaction events like config deltas (preserved, not counted).
- Add unit tests for serialization roundtrip and stream invariants.
Can be merged independently. No behavioral changes.
Phase 2: Projection Layer
- Add ConversationStream::projected_iter() that applies compaction overlays to yield the projected view.
- Implement the stacking semantics (latest-wins per content type).
- Implement summary injection (synthetic ChatRequest/ChatResponse pair).
- Wire Thread::into_parts() to use projected_iter().
- Add unit tests for each policy type, stacking, and summary overlap auto-extension.
Depends on Phase 1. After this phase, compaction events in the stream will affect what the LLM sees.
Phase 3: Mechanical Strategies and CLI
- Implement strategy functions: strip_reasoning, strip_tools. Each produces a Compaction event.
- Implement range bound resolution (negative integers, duration strings, last → absolute turn index).
- Add the jp conversation compact CLI command with --profile, --from, --to, --keep-last, --dry-run.
- Add --compact[=profile] to jp conversation fork.
- Add --compacted to jp conversation print.
- Add integration tests.
Depends on Phase 2.
Phase 4: Configuration
- Add conversation.compaction config section with default_profile, keep_last.
- Add conversation.compaction.profiles support (named policy sets).
- Add per-tool compaction hints to ToolConfig.
- Wire profiles into the CLI (--profile flag, --compact defaults).
- Add config tests.
Depends on Phase 3. Can be partially parallelized with Phase 3 (config types can be defined before the CLI is wired up).
Phase 5: LLM-Assisted Summarization
- Implement the summarize strategy: read raw events, call the configured model, produce SummaryPolicy { summary }.
- Implement the summary overlap auto-extension logic.
- Add --compact[=profile] to jp query.
- Add integration tests (with mock LLM).
Depends on Phase 2. Can proceed in parallel with Phases 3 and 4.
References
- RFD 011 — System Notification Queue (compaction interaction)
- RFD 034 — Inquiry-Specific Assistant Configuration (defers compaction)
- RFD 036 — Conversation Compaction (superseded by this RFD)
- Issue #57 — Make conversation management more powerful
- Multi-turn degradation paper — cited in Issue #57