RFD 053: Auto-Refresh Conversation Titles
- Status: Discussion
- Category: Design
- Authors: Jean Mertz git@jeanmertz.com
- Date: 2026-03-18
- Requires: RFD 020, RFD 069, RFD 073
Summary
Conversation titles are generated once after the first turn and never automatically updated. This RFD adds periodic refresh: when conversation.title.generate.auto_refresh.turn_interval is set to a positive integer N (default 5), the least recently activated conversations that have accumulated N new turns since their titles were last generated are re-titled as background tasks on the next jp query run.
Motivation
A title is generated on the first turn of a new conversation and then frozen. This works well for short, focused conversations, but longer ones take unexpected turns and end up with a title that describes only the opening exchange. The user is left with a list of conversations whose titles no longer reflect what they contain.
The user can already run conversation edit --title to manually regenerate a title, but this requires noticing the problem and taking action. Periodic automatic refresh should be transparent.
The fix needs to be careful about cost. Triggering a new LLM request on every jp query invocation for every stale conversation in the workspace would spike API usage and potentially delay the CLI on exit. This design processes a bounded number of conversations per run in the background — the same pattern already used for initial title generation.
Design
Configuration
A new auto_refresh sub-table is added to conversation.title.generate:
[conversation.title.generate]
auto = true
model = ...
[conversation.title.generate.auto_refresh]
turn_interval = 5 # refresh every N turns; 0 = disabled (default = 5)
batch_size = 1 # max conversations to refresh per run; or "all" (default = 1)
turn_context = 10 # max turns sent to LLM for re-titling; or false for unlimited (default = 10)
turn_interval = 0 disables the feature entirely.
batch_size controls how many stale conversations are refreshed per jp query invocation. The default of 1 spreads the work across runs. Setting it to "all" refreshes every stale conversation in a single run — useful for catching up after enabling the feature on a workspace with many long-running conversations, at the cost of more LLM requests.
turn_context limits how many recent turns are sent to the LLM when re-generating a title. For long conversations, earlier turns are often irrelevant to what the conversation is currently about. The default of 10 keeps costs predictable and focuses the title on recent activity. Setting it to false disables the limit and sends the full conversation.
These fields map to a new nested AutoRefreshConfig on GenerateConfig:
pub struct AutoRefreshConfig {
pub turn_interval: usize, // default 5
pub batch_size: BatchSize, // default Count(1)
pub turn_context: Option<usize>, // default Some(10), None = unlimited
}
pub enum BatchSize { // public because it appears in a public field of AutoRefreshConfig
Count(usize),
All,
}
Using 0 as a sentinel for "unlimited" would clash with turn_interval's 0-as-disabled meaning (same struct, same sentinel, opposite direction). Option<usize> keeps the natural reading: Some(n) means "send up to n turns," None means "send all of them."
The TOML/JSON surface accepts the boolean false for the unlimited case. KvAssignment::try_some_u32 rejects boolean input, so a small try_some_u32_or_false helper is added to jp_config::assignment, mirroring the existing try_some_bool_or_from_str pattern: a non-negative integer maps to Some(n), the boolean false maps to None, and anything else (true, a string, a negative integer) is rejected with a clear error.
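The mapping described above can be sketched as follows. This is only an illustration of the three cases, not the actual jp_config helper: `ConfigValue` and `turn_context_from_value` are hypothetical names, and the real input type handled by KvAssignment may look different.

```rust
/// Hypothetical stand-in for a parsed TOML/JSON value; the real
/// `jp_config` input type may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum ConfigValue {
    Integer(i64),
    Bool(bool),
    Text(String),
}

/// Maps a raw value to `turn_context`:
/// non-negative integer -> `Some(n)`, `false` -> `None`, anything else -> error.
pub fn turn_context_from_value(value: &ConfigValue) -> Result<Option<usize>, String> {
    match value {
        ConfigValue::Integer(n) if *n >= 0 => usize::try_from(*n)
            .map(Some)
            .map_err(|_| format!("turn_context: {n} does not fit in usize")),
        ConfigValue::Bool(false) => Ok(None),
        ConfigValue::Integer(n) => Err(format!(
            "turn_context: expected a non-negative integer or false, got {n}"
        )),
        ConfigValue::Bool(true) => {
            Err("turn_context: true is not valid; use an integer or false".to_string())
        }
        ConfigValue::Text(s) => Err(format!(
            "turn_context: expected an integer or false, got string {s:?}"
        )),
    }
}
```

Rejecting `true` explicitly, rather than treating it as "use the default", keeps the surface unambiguous: only `false` has a meaning, and it always means "no limit".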
When conversation.title.generate.auto = false, auto-refresh is also disabled regardless of turn_interval.
Persisted state: title_generated_at_turn
To determine staleness, the system needs to know how many turns existed when the title was last generated. A new optional field is added to Conversation in metadata.json:
/// Turn count when the title was last auto-generated.
///
/// - `None` — legacy conversation; treated as baseline 0 (auto-enrolls in
/// refresh once it has accumulated `turn_interval` turns).
/// - `Some(n)` — title was auto-generated when the conversation had `n` turns.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub title_generated_at_turn: Option<usize>,
This field is a pure watermark, not configuration. It lives in metadata.json alongside title because they are tightly coupled: one is the output, the other is the checkpoint that governs when it is regenerated.
| Watermark | Title | Meaning | Eligible? |
|---|---|---|---|
| None | Some(_) | Legacy conversation, has a title | No (treated as manual) |
| None | None | Legacy conversation, no title | Yes, baseline 0 |
| Some(n) | (any) | Auto-generated/evaluated at turn n | Yes, baseline n |
Legacy conversations
Existing conversations created before this RFD have title_generated_at_turn = None. There is no provenance field that distinguishes a hand-titled conversation from one whose title was generated automatically. The migration policy is conservative:
- title_generated_at_turn = None and title = Some(_): treated as manual and skipped by auto-refresh. The user (or a previous automatic generation) produced this title; without provenance the safe default is to leave it alone. A conversation enrolls into auto-refresh only after an explicit conversation edit --title (no argument) records a watermark.
- title_generated_at_turn = None and title = None: eligible. There is no title to overwrite, so auto-refresh proceeds with baseline 0.
This trades a one-time "stale-but-not-refreshed" state for safety — hand-titled conversations are preserved without guessing.
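As a rough illustration of the eligibility rules, the sketch below combines the watermark/title table with the staleness threshold (turn_count >= baseline + turn_interval). The function names `refresh_baseline` and `is_stale` are hypothetical; the real check would also consult per-conversation config overrides.

```rust
/// Baseline turn count used for the staleness check, or `None` when the
/// conversation is skipped (legacy conversation with an existing title).
pub fn refresh_baseline(watermark: Option<usize>, has_title: bool) -> Option<usize> {
    match (watermark, has_title) {
        // Auto-generated or evaluated at turn `n`: baseline `n`.
        (Some(n), _) => Some(n),
        // Legacy, untitled: nothing to overwrite, baseline 0.
        (None, false) => Some(0),
        // Legacy, titled: treated as manual and never refreshed.
        (None, true) => None,
    }
}

/// A conversation is stale once it has accumulated `turn_interval` turns
/// past its baseline. A `turn_interval` of 0 disables refresh.
pub fn is_stale(
    watermark: Option<usize>,
    has_title: bool,
    turn_count: usize,
    turn_interval: usize,
) -> bool {
    if turn_interval == 0 {
        return false;
    }
    match refresh_baseline(watermark, has_title) {
        Some(baseline) => turn_count >= baseline.saturating_add(turn_interval),
        None => false,
    }
}
```

With the default interval of 5, a conversation titled at turn 1 becomes stale at turn 6, while a legacy titled conversation stays untouched regardless of its length.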
Interaction with manual title surfaces
Several CLI paths set or clear metadata.title directly without going through the LLM. The user's intent in those cases is "this is the title I want" — the system should not later overwrite it via auto-refresh.
The rule: every path that explicitly sets or clears metadata.title (without LLM generation) also writes a ConfigDelta event with conversation.title.generate.auto_refresh.turn_interval = 0, disabling auto-refresh for that conversation. To re-enable later, the user can run config set conversation.title.generate.auto_refresh.turn_interval 5 (or any positive value) to write a new ConfigDelta that re-enrolls the conversation.
The affected surfaces:
| Surface | Behavior |
|---|---|
| conversation edit --title (no arg) | Regenerates via LLM. Sets title_generated_at_turn = Some(turn). No disable-delta: the user opted into LLM-driven titling. |
| conversation edit --title "T" | Sets title = Some("T"), writes disable-delta. |
| conversation edit --no-title | Clears title, writes disable-delta. |
| query --title "T" | Sets title = Some("T"), writes disable-delta. |
| query --no-title | Clears title, writes disable-delta. |
| conversation fork --title "T" | Sets title = Some("T") on the fork, writes disable-delta on the fork. |
The disable-delta is a single small helper invoked from each call site rather than scattered logic.
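To make the disable and re-enable flow concrete, here is a minimal sketch. It assumes that later ConfigDelta events override earlier ones when the stream's config is merged; that merge rule, the reduced `ConfigDelta` struct, and the names `disable_delta` and `effective_turn_interval` are all assumptions for illustration, not the actual jp_conversation types.

```rust
/// Hypothetical, heavily reduced `ConfigDelta`: only the one field that
/// matters for auto-refresh. `None` means "this delta does not touch it".
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct ConfigDelta {
    pub turn_interval: Option<usize>,
}

/// The delta written by every manual title surface.
pub fn disable_delta() -> ConfigDelta {
    ConfigDelta { turn_interval: Some(0) }
}

/// Effective interval for one conversation, assuming the last delta that
/// sets the field wins and the workspace config is the fallback.
pub fn effective_turn_interval(base: usize, deltas: &[ConfigDelta]) -> usize {
    deltas
        .iter()
        .filter_map(|delta| delta.turn_interval)
        .last()
        .unwrap_or(base)
}
```

Under this model, a manual title followed by config set conversation.title.generate.auto_refresh.turn_interval 5 yields an effective interval of 5 again, which matches the re-enrollment behavior described above.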
Watermark invariants under stream changes
title_generated_at_turn is a position into the event stream. Any operation that changes the position of turn_start events relative to the watermark must update the watermark — otherwise it can drift past turn_count, leaving the conversation permanently ineligible for refresh.
Concretely, this affects forks that retain a tail of the stream:
- conversation fork --last N calls events.retain_last_turns(N) on the fork's events. The fork inherits the source's metadata via clone, so a source with title_generated_at_turn = Some(k) and k > N produces a fork whose watermark exceeds its own turn count.
- conversation fork --from/--until similarly drops events.
The rule: any operation that drops turn_start events from a stream clamps the resulting watermark to min(watermark, new_turn_count). For the common fork --last 1 (a one-turn snapshot), this lands the watermark at 1, matching the semantics of the first-turn auto-generation that runs on a fresh conversation.
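The clamp rule is small enough to sketch directly. `ForkState`, `clamp_watermark`, and `fork_last` are hypothetical names that model only the two numbers involved, not the real fork code path.

```rust
/// Hypothetical reduction of a conversation to the two values the
/// watermark invariant cares about.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ForkState {
    pub turn_count: usize,
    pub watermark: Option<usize>,
}

/// Clamp the watermark so it never exceeds the new turn count.
/// A missing watermark stays missing.
pub fn clamp_watermark(watermark: Option<usize>, new_turn_count: usize) -> Option<usize> {
    watermark.map(|w| w.min(new_turn_count))
}

/// Model of `fork --last N`: keep at most `n` turns and clamp the
/// inherited watermark to what remains.
pub fn fork_last(source: ForkState, n: usize) -> ForkState {
    let turn_count = source.turn_count.min(n);
    ForkState {
        turn_count,
        watermark: clamp_watermark(source.watermark, turn_count),
    }
}
```

For a source with 12 turns and a watermark of 10, `fork_last(source, 1)` produces a fork with one turn and a watermark of 1, so the fork becomes eligible again after turn_interval further turns.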
A conversation fork --title "..." invocation writes a disable-delta on the fork (per Interaction with manual title surfaces), so the watermark on the fork is irrelevant to refresh decisions.
ConversationStream::retain_last_turns is the only stream-shortening operation in the codebase today. RFD 064 (non-destructive compaction) does not affect the watermark — compaction events are appended overlays and do not change the underlying turn_start count.
Background task
A new TitleRefreshTask runs the full refresh pipeline in the background: candidate scanning, turn counting, stream loading, and LLM calls. The main thread's only responsibility is spawning the task — all heavy I/O happens off the critical path.
Spawn (main thread)
At the start of a jp query invocation, after the workspace is loaded, a single TitleRefreshTask is spawned when turn_interval > 0 && auto && ctx.term.args.persist. The persist gate matches the existing first-turn title spawn at crates/jp_cli/src/cmd/query.rs — under --no-persist, writes are no-ops through NullPersistBackend but the LLM call would still cost money, so auto-refresh is unconditionally suppressed in that mode.
The task receives:
- An Arc<dyn LoadBackend> cloned from the workspace, used for read-only scanning. After RFD 073, LoadBackend is the public trait for reading conversation IDs, metadata, and event streams; the underlying Storage struct is a private implementation detail of jp_storage and is not exposed to callers. The task uses LoadBackend::load_conversation_ids, LoadBackend::load_conversation_metadata, and LoadBackend::load_conversation_stream.
- An Arc<dyn LockBackend> cloned from the workspace, used for the preflight lock check before each LLM call.
- The active conversation ID (to exclude it from candidacy).
- The AutoRefreshConfig from configuration.
- Provider and model configuration for the LLM call.
No conversation scanning or event file reading happens on the main thread.
This aligns with RFD 074's direction for a fallible escape-hatch API for background tasks, but does not depend on it — the trait methods used here exist today.
Run (background)
The task performs the following steps inside run():
1. List all conversation IDs via LoadBackend::load_conversation_ids.
2. For each conversation, load metadata via LoadBackend::load_conversation_metadata to get title, title_generated_at_turn, last_activated_at, and turn_count.
   load_conversation_metadata already calls the lightweight load_count_and_timestamp_events to populate events_count and last_event_at from events.json. This function is extended to also deserialize the type field and count turn_start events, populating a new turn_count field on Conversation. The extension adds one field to the internal RawEvent struct:
   #[derive(serde::Deserialize)]
   struct RawEvent {
       timestamp: Box<serde_json::value::RawValue>,
       #[serde(rename = "type")]
       event_type: Box<serde_json::value::RawValue>,
   }
   A turn_start event is counted when event_type.get() equals "\"turn_start\"". This avoids a second pass over events.json.
3. Skip the active conversation (it is being actively worked on).
4. Compute eligibility:
   - title_generated_at_turn = Some(n): stale when turn_count >= n + turn_interval.
   - title_generated_at_turn = None and title = None: eligible with baseline 0; stale when turn_count >= turn_interval.
   - title_generated_at_turn = None and title = Some(_): skipped (legacy manual title; see Legacy conversations).
5. Sort stale conversations by last_activated_at ascending (least recently active first).
6. Take up to batch_size candidates.
7. For each candidate, in order:
   - Check CancellationToken. If cancelled, return immediately with the results accumulated so far (see Cancellation below).
   - Preflight the conversation lock via LockBackend::lock_info(id). If another session holds the lock, the conversation is being actively written; skip the candidate to avoid spending an LLM call on work that will be discarded at sync time.
   - Load the full ConversationStream via LoadBackend::load_conversation_stream. Inspect the conversation's merged config via stream.config(); if turn_interval has been overridden to 0 via a ConfigDelta (e.g., by conversation edit --title "..." or any other manual title surface), skip this candidate.
   - Scope the stream to the last turn_context turns (if Some(n)) and run the LLM title generation call wrapped in select! against token.cancelled(). On cancellation, abort the in-flight LLM call; on success, store the result on self.
If a file read fails (e.g., partially written by a concurrent session), the task logs a warning and moves to the next candidate.
The preflight lock check is an optimization, not a correctness mechanism — a concurrent session can still acquire the lock between the preflight and the LLM completion. Correctness is provided by the sync-phase try_lock (see Sync (main thread)).
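The selection part of the run phase (skip the active conversation, keep stale ones, sort by last activation, take a batch) can be sketched as below. `Candidate` and `select_candidates` are hypothetical, the staleness flag is assumed to be precomputed, and `last_activated_at` is modeled as a plain integer timestamp rather than whatever timestamp type the codebase uses.

```rust
/// Mirrors the `BatchSize` enum from the configuration section.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BatchSize {
    Count(usize),
    All,
}

/// Hypothetical per-conversation summary produced by the metadata scan.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub id: String,
    /// Assumption: seconds since the Unix epoch.
    pub last_activated_at: u64,
    /// Result of the eligibility check for this conversation.
    pub stale: bool,
}

/// Drop the active conversation and non-stale entries, order the rest
/// least recently activated first, and keep at most one batch.
pub fn select_candidates(
    mut all: Vec<Candidate>,
    active_id: &str,
    batch: BatchSize,
) -> Vec<Candidate> {
    all.retain(|c| c.stale && c.id != active_id);
    all.sort_by_key(|c| c.last_activated_at);
    let limit = match batch {
        BatchSize::Count(n) => n,
        BatchSize::All => all.len(),
    };
    all.truncate(limit);
    all
}
```

Sorting ascending means the default batch size of 1 works through the backlog oldest-first, one conversation per jp query run.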
Cancellation
The Task trait contract is that run() returns Box<dyn Task> (the same task), and TaskHandler::sync is then called on the returned task. If the task does not return promptly after cancellation, TaskHandler forces shutdown via JoinSet::shutdown() and sync() is never called — dropping any accumulated results.
To preserve completed candidates across cancellation:
- Each LLM call is wrapped in a select! against token.cancelled().
- Per-candidate results are stored on self immediately after each LLM call completes.
- On cancellation, the loop returns Ok(self) with whatever has been collected; the in-flight candidate is discarded.
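Together, the per-iteration check and the select! wrapper keep cancellation latency independent of the LLM round-trip: the task returns as soon as the token fires, whether it is between candidates or in the middle of a call. TaskHandler's 2-second forced-shutdown window is therefore comfortable even with batch_size = "all".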
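The accumulate-then-return shape of the loop can be illustrated with a synchronous, std-only sketch. It models only the per-iteration check and the results stored on self; the select! around the in-flight LLM call is not modeled, an AtomicBool stands in for the CancellationToken, and `RefreshSketch` is a hypothetical type.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical task state: results survive because they live on `self`,
/// which is returned from `run` even when the loop exits early.
#[derive(Debug, Default)]
pub struct RefreshSketch {
    /// (conversation id, generated title) pairs completed so far.
    pub results: Vec<(String, String)>,
}

impl RefreshSketch {
    /// `generate` stands in for the LLM call. Returning `None` models a
    /// failed or aborted call whose candidate is discarded.
    pub fn run<F>(mut self, candidates: &[&str], cancelled: &AtomicBool, mut generate: F) -> Self
    where
        F: FnMut(&str) -> Option<String>,
    {
        for &id in candidates {
            // Per-iteration check: stop before starting new work.
            if cancelled.load(Ordering::SeqCst) {
                return self;
            }
            if let Some(title) = generate(id) {
                // Store immediately so a later cancellation cannot lose it.
                self.results.push((id.to_string(), title));
            }
        }
        self
    }
}
```

If cancellation arrives after the first candidate completes, the returned task still carries that one result, which is what allows sync to persist it.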
Title retention schema
To avoid unnecessary title churn, the title generation schema is extended with a retain_current field. The LLM receives the current title in its prompt and can indicate that it is still adequate:
{
"retain_current": false,
"titles": [
"New title suggestion"
]
}
The prompt includes:
The conversation currently has the title: "{current_title}". If this title still accurately describes the conversation, set retain_current to true. Only generate new titles if the conversation has meaningfully changed direction.
When retain_current is true, the task advances the title_generated_at_turn checkpoint (recording that the title was evaluated) but leaves title unchanged. This prevents the same conversation from being re-evaluated on every run while keeping its perfectly good title.
The title_schema and title_instructions helpers in jp_llm::title are shared by initial generation (TitleGeneratorTask), interactive regeneration (conversation edit --title), and the new refresh path. Adding retain_current unconditionally would let the LLM respond with "keep current" to the interactive regeneration path — which is the opposite of what the user asked for.
Both helpers are therefore parameterized by a mode (TitleMode::{Initial, Regenerate, Refresh}). Only Refresh includes the current title in the instructions and the retain_current field in the schema. Initial and Regenerate keep their current behavior unchanged.
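A minimal sketch of the mode split follows. Only the `TitleMode` variants come from this RFD; `schema_fields`, `instructions`, and the base instruction sentence are placeholders, and the real helpers in jp_llm::title presumably build a full JSON schema rather than a list of field names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TitleMode {
    Initial,
    Regenerate,
    Refresh,
}

/// Top-level fields of the response schema for each mode.
/// Only `Refresh` may answer "keep the current title".
pub fn schema_fields(mode: TitleMode) -> Vec<&'static str> {
    match mode {
        TitleMode::Refresh => vec!["retain_current", "titles"],
        TitleMode::Initial | TitleMode::Regenerate => vec!["titles"],
    }
}

/// Instruction text for each mode. The first sentence is a placeholder
/// for the existing instructions; the refresh addendum is the prompt
/// text quoted in this section.
pub fn instructions(mode: TitleMode, current_title: Option<&str>) -> String {
    let mut text = String::from("Generate a concise title for this conversation.");
    if let (TitleMode::Refresh, Some(title)) = (mode, current_title) {
        text.push_str(&format!(
            " The conversation currently has the title: \"{title}\". \
             If this title still accurately describes the conversation, \
             set retain_current to true. Only generate new titles if the \
             conversation has meaningfully changed direction."
        ));
    }
    text
}
```

Because `Regenerate` never sees the retain_current field, an interactive conversation edit --title cannot come back with "keep current".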
Context window safety
The turn_context setting (default 10) provides the first line of defense against oversized requests: only the most recent N turns are sent to the LLM. This scoping happens before any token-level checks.
The title generation model may still have a smaller context window than those N turns require. The inquiry system already solves this problem: it estimates char-based token counts and drops older events to fit the model's context window.
The core truncation logic — estimate chars, compare to budget, drop oldest events, re-sanitize — is extracted from jp_cli::cmd::query::tool::inquiry into a shared utility (in jp_llm or jp_conversation) that both the inquiry backend and the title generator can use. Each caller computes its own overhead (the inquiry system accounts for tools, attachments, and cache-preserving granularity; the title generator only needs system prompt and title instructions).
The pipeline for each candidate is: scope to last turn_context turns > estimate chars > truncate if over budget > send to LLM.
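The first three stages of that pipeline can be sketched as follows. This is a simplification under stated assumptions: each turn is a plain string, whole turns are dropped instead of individual events, there is no re-sanitize step, and `scope_and_truncate` and `char_budget` are hypothetical names (the real budget would be derived from the model's context window minus overhead).

```rust
/// Scope to the last `turn_context` turns, then drop the oldest remaining
/// turns until the estimated size fits `char_budget`.
pub fn scope_and_truncate(
    turns: &[String],
    turn_context: Option<usize>,
    char_budget: usize,
) -> Vec<String> {
    // Stage 1: keep only the most recent `n` turns (all turns for `None`).
    let start = match turn_context {
        Some(n) => turns.len().saturating_sub(n),
        None => 0,
    };
    let mut kept: Vec<String> = turns[start..].to_vec();

    // Stage 2: estimate the payload size in characters.
    let mut total: usize = kept.iter().map(|t| t.chars().count()).sum();

    // Stage 3: drop the oldest turns while over budget.
    while total > char_budget && !kept.is_empty() {
        let dropped = kept.remove(0);
        total -= dropped.chars().count();
    }
    kept
}
```

Scoping first keeps the common case cheap: with the default turn_context of 10, the budget check usually sees a payload that already fits.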
Sync (main thread)
sync runs after the background phase completes (or after cancellation returns the task with accumulated results) and is given &mut Workspace. For each successfully evaluated candidate:
- Workspace::acquire_conversation(id) to obtain a handle.
- Workspace::lock_conversation(handle, None): a non-blocking try_lock that returns LockResult::AlreadyLocked on contention. If the lock is held, log and skip; another session is currently writing.
- Update conversation.title (unless retain_current was true).
- Set conversation.title_generated_at_turn = Some(turn_count_at_evaluation), where turn_count_at_evaluation is the turn count observed by the background task when it read the conversation's events.
Using the count at evaluation time rather than at sync time means the checkpoint advances by what was true when the decision was made, not by any turns added during the current session.
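The per-candidate sync logic can be sketched like this. The types are simplified stand-ins: the real LockResult::Acquired presumably carries a lock guard, and writes go through ConversationMut rather than a bare struct; `Metadata`, `Outcome`, `Evaluation`, and `apply_evaluation` are hypothetical names.

```rust
/// Hypothetical reduction of the persisted conversation metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Metadata {
    pub title: Option<String>,
    pub title_generated_at_turn: Option<usize>,
}

/// Simplified mirror of the lock result; no guard is modeled here.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LockResult {
    Acquired,
    AlreadyLocked,
}

/// What the LLM decided for one candidate.
#[derive(Debug, Clone, PartialEq)]
pub enum Outcome {
    RetainCurrent,
    Replace(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Evaluation {
    pub outcome: Outcome,
    /// Turn count observed by the background task, not the count at sync time.
    pub turn_count_at_evaluation: usize,
}

/// Applies one evaluation. Returns `false` when the candidate is skipped
/// because another session holds the lock.
pub fn apply_evaluation(meta: &mut Metadata, lock: LockResult, eval: &Evaluation) -> bool {
    if lock == LockResult::AlreadyLocked {
        return false;
    }
    if let Outcome::Replace(title) = &eval.outcome {
        meta.title = Some(title.clone());
    }
    // The watermark advances in both cases, so a retained title is not
    // re-evaluated on every run.
    meta.title_generated_at_turn = Some(eval.turn_count_at_evaluation);
    true
}
```

A skipped candidate leaves both fields untouched, so it remains stale and is picked up again on a later run.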
ConversationMut::Drop (per RFD 069) flushes the metadata change while the flock is still held, so the data reaches disk inside the lock window.
If a candidate fails (LLM error, parse failure), the task logs a warning and skips it. Successful candidates are still synced.
Interaction with conversation locks (RFD 020)
RFD 020 is implemented; conversation writes are protected by exclusive file locks. The title refresh task interacts with locks at three points:
Preflight (run): Before each LLM call, the task calls LockBackend::lock_info(id). A held lock means another session is mid-write — the candidate is skipped without paying for an LLM call. This is an optimization; correctness lives in the sync phase below.
Read (run): Metadata and event reads go through LoadBackend without acquiring a lock. If a concurrent session is mid-write and the file is partially serialized, the JSON parse fails and the task moves on.
Write (sync): Workspace::lock_conversation performs a non-blocking try_lock. On LockResult::AlreadyLocked, the title update is discarded; the conversation will be retried on the next eligible run. On LockResult::Acquired, the metadata change is written through ConversationMut, which auto-persists on drop while the lock is still held.
This approach avoids blocking CLI exit on lock contention and naturally handles the common case: stale conversations are by definition idle, so lock contention on them is rare.
Spawn location
The TitleRefreshTask is spawned in query.rs, alongside the existing first-turn title spawn. This restricts title refresh to jp query — the only command with a meaningful conversation lifetime and where an LLM call is already expected. Short-lived commands (conversation ls, conversation edit, etc.) do not trigger it.
The spawn condition is turn_interval > 0 && auto && ctx.term.args.persist. The persist gate matches the existing first-turn spawn at crates/jp_cli/src/cmd/query.rs. Under --no-persist, writes are no-ops via NullPersistBackend but an LLM call would still incur cost — auto-refresh is unconditionally suppressed in that mode.
The existing first-turn title spawn in query.rs is updated to set title_generated_at_turn = Some(1) once the title write completes, so all new conversations have a baseline and become eligible for future auto-refresh.
Drawbacks
Each jp query run spawns a background task that loads metadata for every conversation via LoadBackend. Since load_conversation_metadata already reads events.json (for events_count and last_event_at), the turn counting extension adds no extra file reads — it piggybacks on the existing lightweight parse. For workspaces with hundreds of conversations this is nonzero I/O, though it happens entirely in the background and does not delay the user's query.
The retain_current schema adds a small amount of complexity to the title generation prompt and response handling. Models may occasionally set retain_current = false and produce a title that is semantically identical to the original, causing cosmetic churn. This is a minor nuisance, not a correctness issue.
Alternatives
Timestamp-based staleness. Track when the title was generated and refresh if enough time has elapsed. Rejected: time is a weaker signal than turns. A conversation that receives one turn per day and one that receives twenty turns per hour have the same time-based staleness but very different content drift.
Use events_count as a proxy for turns. Already computed and readily available. Rejected: it's imprecise. A single turn with heavy tool use generates many events; the threshold would behave inconsistently across different usage patterns. Turn count is the right unit.
Cache turn_count in metadata.json. Avoids reading events.json during candidacy checks. Rejected: this introduces derived state from events.json into metadata.json, breaking the convention that all conversation-level behavioral state flows through the event stream's ConfigDelta. The background task architecture makes this optimization unnecessary — the I/O happens off the critical path.
Scan on the main thread, load streams in the background. Perform candidate selection synchronously and only push the LLM call to the background. Rejected: candidate scanning requires reading metadata.json for every conversation and events.json for stale candidates. This forces eager loading of all conversation metadata on the main thread, changing jp query startup from O(1) disk reads (active conversation only) to O(N). Moving the entire pipeline to the background keeps startup cost at O(1).
Non-Goals
This RFD does not change when or how the initial title is generated. The first-turn behavior is unchanged except for setting title_generated_at_turn.
It does not add any user-visible indication that a title was refreshed in the background.
Risks and Open Questions
Concurrent CLI runs. Two simultaneous jp query invocations could both spawn a TitleRefreshTask that selects the same stale conversation. The result is two LLM requests producing the same (or a slightly different) title: no data corruption, just a wasted request. The sync-phase locking (RFD 020) prevents concurrent metadata writes. If the two sync phases overlap, the second task's try_lock fails and its update is discarded; if they run one after the other, the later write simply replaces the earlier one.
Token cost of re-titling long conversations. The turn_context default of 10 bounds the typical cost, but users who set turn_context = false (unlimited) or have very long individual turns may still send large payloads. The context window truncation utility provides a hard safety net, but the cost scales with the retained context size. Worth monitoring once the feature ships.
Title quality on truncated context. Both turn_context scoping and context window truncation mean the LLM sees only a suffix of the conversation. The generated title will reflect recent activity rather than the full arc. This is an acceptable trade-off — recent activity is usually more relevant to what the user is currently working on — but users should be aware that titles may shift focus as the conversation evolves.
Implementation Plan
Phase 0: Shared truncation utility (independent)
- Extract the core truncation logic (estimate chars, compare to budget, drop oldest events, re-sanitize) from jp_cli::cmd::query::tool::inquiry into a shared utility in jp_llm or jp_conversation.
- Update the inquiry backend to use the shared utility.
- Update TitleGeneratorTask::update_title to truncate the event stream when the title model's context window is smaller than the conversation.
Phase 1: Configuration (independent)
- Add a try_some_u32_or_false helper to jp_config::assignment, mirroring the existing try_some_bool_or_from_str pattern: integer → Some(n), boolean false → None, anything else → error.
- Add AutoRefreshConfig (with turn_interval: usize, batch_size: BatchSize, turn_context: Option<usize>) as a nested config on GenerateConfig in jp_config.
- Wire through AssignKeyValue (using the new helper for turn_context), PartialConfigDelta, and ToPartial impls.
Phase 2: State (independent)
- Add title_generated_at_turn: Option<usize> to Conversation in jp_conversation.
- Add turn_count: usize (computed, #[serde(skip)]) to Conversation.
- Extend load_count_and_timestamp_events in jp_storage to count turn_start events and populate turn_count.
- Make ConversationStream::retain_last_turns (and any other stream-shortening operations) clamp the conversation's title_generated_at_turn to min(current, new_turn_count).
Phase 3: Manual-title disable-deltas (depends on Phase 1)
- Add a small helper that writes ConfigDelta(auto_refresh.turn_interval = 0) to a conversation's event stream.
- Apply the helper from every manual title surface:
  - conversation edit --title "..." (user-provided)
  - conversation edit --no-title
  - query --title "..."
  - query --no-title
  - conversation fork --title "..."
- Update conversation edit --title (no argument) to set title_generated_at_turn = Some(current_turn_count) after LLM generation.
Phase 4: Title retention schema (independent)
- Add TitleMode::{Initial, Regenerate, Refresh} to jp_llm::title and parameterize title_schema and title_instructions on the mode.
- Only Refresh adds the retain_current field to the schema and the current-title context to the prompt.
- Add a companion function (or extend extract_titles) that returns the retain_current flag alongside the title list.
Phase 5: Task and spawn (depends on Phases 0, 1, 2, 3, and 4)
- Implement TitleRefreshTask with the full background pipeline: scan conversation IDs via LoadBackend, read metadata and turn counts, sort candidates by last_activated_at, preflight with LockBackend::lock_info, run the LLM call inside a select! against token.cancelled(), and accumulate results into self.
- Implement sync using Workspace::acquire_conversation + lock_conversation, writing the metadata change through ConversationMut. Skip on LockResult::AlreadyLocked.
- Update the existing first-turn title spawn in query.rs to set title_generated_at_turn = Some(1) after the title write completes.
- Spawn TitleRefreshTask in query.rs when turn_interval > 0 && auto && ctx.term.args.persist.
Phase 6: Tests
Coverage for the high-risk paths, paired with the phases that introduce them:
- Legacy custom title (title_generated_at_turn = None, title = Some(_)) is not auto-refreshed.
- Legacy untitled conversation (title_generated_at_turn = None, title = None) is auto-refreshed with baseline 0.
- query --title, query --no-title, conversation edit --title "...", conversation edit --no-title, and conversation fork --title "..." each write the disable-delta and prevent future refresh.
- conversation fork --last N clamps title_generated_at_turn to the new turn count.
- --no-persist does not spawn TitleRefreshTask.
- Cancellation after one completed candidate still syncs that result; an in-flight candidate is discarded.
- A locked candidate is skipped at preflight without an LLM call.
- retain_current = true advances the watermark without changing title.
Phases 0, 1, 2, and 4 can be reviewed and merged independently. Phase 3 depends on Phase 1. Phase 5 depends on all earlier phases. Phase 6 (tests) is paired with each phase as it lands.