RFD D27: Granular Recovery for Stored Conversation Configs
- Status: Draft
- Category: Design
- Authors: Jean Mertz git@jeanmertz.com
- Date: 2026-05-15
Summary
Make the stored-conversation config compat layer recover from schema incompatibilities at field granularity rather than wiping the entire config. A single invalid field no longer locks the user out of the conversation. Holes left by recovery are filled from the workspace's currently-resolved config, guaranteeing that finalize succeeds.
Motivation
When the AppConfig schema evolves between JP versions (a field's type changes, an enum variant is renamed, a validator is tightened), conversations persisted by the old version may store values that the new version can no longer deserialize. The current compat layer in jp_conversation::compat::deserialize_partial_config has two recovery paths:
- Unknown fields (field removed or renamed) — granular. The schema-walk in
strip_unknown_fieldsremoves individual keys; surrounding config survives. - Type mismatches (field's type changed) — all-or-nothing. The whole
PartialAppConfigis replaced withempty().
The second path is responsible for the lock-out class of bug. A concrete instance shipped recently: changing DelayDuration from a Duration-backed newtype (serialized as {secs, nanos}) to a humantime string broke any conversation whose base_config.json or ConfigDelta referenced style.typewriter.*. The wipe cascades into a StreamError::Config from AppConfig::from_partial_with_defaults, because AppConfig has no Default impl and requires assistant.model.id to be present. The user sees Not found / Conversation events from every command that loads the stream (jp c show, jp c print, jp query).
We don't want to anticipate which field will break next. We want stored conversation configs to tolerate any future schema change, surfacing a warning rather than locking the user out.
Design
Two independent pieces stack to provide full recovery: a granular per-field drop pass on the stored partial, and a workspace-config fill pass that guarantees finalize succeeds.
Scope
The tolerant path applies only to data deserialized via jp_conversation::compat::deserialize_partial_config:
base_config.json(viaConversationStream::from_parts)- The
deltasubtree of eachConfigDeltaevent inevents.json(viadeserialize_config_delta)
User-authored configuration sources — config.toml, .jp.toml, --cfg flags, environment variables — are loaded through jp_config::util::load_partial_at_path and continue to error strictly on any invalid field. A typo in the user's config is still a hard failure.
Recovery, end-user view
$ jp c show
WARN Dropped incompatible config field path=style.typewriter.text_delay error=invalid type: map, expected a string
WARN Filled recovered fields from workspace config path=style.typewriter.text_delay
target Conversation
id jp-c17788309593
title Investigate compat layer
...The conversation loads, the bad field is dropped, the resulting hole is filled with the user's current workspace value for that field, and execution continues.
Step 1: granular per-field drop
Replace the single from_value attempt in deserialize_partial_config with a retry loop that uses serde_path_to_error to identify and drop the offending field on each failure.
pub fn deserialize_partial_config(mut value: Value) -> PartialAppConfig {
let schema = AppConfig::schema();
let stripped = strip_unknown_fields(&mut value, &schema);
if stripped > 0 {
warn!(count = stripped, "Stripped unknown fields from stored config.");
}
loop {
let de = serde_path_to_error::deserialize::<_, PartialAppConfig>(value.clone());
match de {
Ok(config) => return config,
Err(err) => {
let path = err.path().to_string();
if path.is_empty() || !remove_at_path(&mut value, &path) {
warn!(
error = %err.inner(),
"Stored config cannot be recovered, replacing with empty config.",
);
return PartialAppConfig::empty();
}
warn!(
%path,
error = %err.inner(),
"Dropped incompatible config field.",
);
}
}
}
}remove_at_path is a small helper that walks dotted paths with bracket-index segments for arrays (providers.llm.aliases[2].name and the like). It returns false if the path cannot be navigated (root, or a missing intermediate), at which point we fall through to the existing empty fallback.
This loop terminates: each iteration either succeeds or removes one field from the value. The value shrinks monotonically.
Step 2: fill from the workspace partial
After Step 1 the partial is well-typed but may have holes — fields the user had set that we had to drop. Some of those fields are required by AppConfig and have no #[setting(default)], so finalize would still fail.
The fix is to fill those holes from the workspace's pre-conversation partial. The workspace partial (pipeline.partial_without_conversation() in jp_cli) is the merge of config.toml, environment variables, and --cfg flags — everything except per-conversation overlays. By invariant it must finalize to a valid AppConfig, because jp itself wouldn't be running otherwise.
Merge order, weakest to strongest:
- Schematic defaults (
PartialAppConfig::default_values). - Workspace partial (
fallback_partial). - Recovered stored partial (output of Step 1).
PartialConfig::fill_from consumes from other only where self is None, so the natural expression is:
let effective = stored_partial
.fill_from(fallback_partial.as_ref().clone())
.fill_from(default_values);
AppConfig::from_partial(effective, vec![])Finalize cannot fail: any required-without-defaults field that was in the stored partial is either still there (Step 1 didn't drop it) or has been refilled from the workspace partial.
Plumbing
Two concrete changes outside compat.rs:
Workspacegains an optionalfallback_partial: Option<Arc<PartialAppConfig>>field with a setter.jp_cli::run_innersets it frompipeline.partial_without_conversation()after Phase 1 of config loading and before any conversation contents are accessed.ConversationStream::from_partsis split into two halves:- A raw build that produces a stream with the recovered stored partial (post-Step-1) attached, no finalize.
- A finalize step that takes a fallback partial and resolves the partial to an
AppConfig.
The lazy loader in
Workspace::events()/metadata()runs the raw build via the existingLoadBackend::load_conversation_stream, then finalizes withself.fallback_partial.
Once the stream is finalized its base_config: Arc<AppConfig> is treated identically to any other loaded stream. Persistence (to_parts) is unchanged.
Why persisting the patched config is correct
The field that was dropped is, by construction, unreadable. We have no representation of the user's original value to compare to or preserve. Substituting the workspace value is the best information available; writing it back on the next save is consistent with that substitution. The alternative — refuse to persist base_config.json after recovery — would leave the unreadable bytes on disk so every subsequent load runs the same recovery, warns again, and depends on the workspace state of the moment. That is strictly worse.
Drawbacks
- Repeated deserialization. Worst-case the retry loop runs once per dropped field. In practice the number of broken fields per stored config is small (one, in the motivating bug). Negligible cost.
- The fallback uses
pipeline.partial_without_conversation(), which includes--cfgflags. A user who runs with--cfg experimentalwhile recovery fires will bake the experimental layer into the conversation's storedbase_config.jsonon next save. See Risks and Open Questions. - Stored values get harder to inspect once recovered, because the in-memory representation no longer matches what's on disk until the next persist.
Alternatives
- Per-top-level-field deserialize. Try
from_valueon each top-level field ofPartialAppConfigindependently; drop the whole field on failure. Generic but coarse — a badstyle.typewriter.text_delaywould wipe all ofstyle. The retry loop is finer at comparable cost. - Schema-aware leaf type pre-check. Extend
strip_unknown_fieldsto also drop leaves whose JSON type doesn't match the schematicSchemaType. Catches the "primitive type changed" subclass cheaply but doesn't help with semantic errors (unknown enum variant strings, out-of-range integers, validator failures). Useful as a complement but not as the only recovery mechanism. Deferred. - Versioned configs with explicit migrations. Stamp
base_config.jsonand eachConfigDeltawith a schema version; register forward migrations. Explicit and auditable but turns every breaking change into a deliberate migration step. Right answer eventually, wrong answer for the project's current iteration speed. Deferred. - Fall back to
AppConfig::default()on finalize failure. There is noDefaultimpl onAppConfigbecause some fields (model id) require user configuration. Even if one existed, defaults wouldn't reflect the user's chosen models, providers, or aliases — the workspace partial does. - Replace the entire stored config with the workspace partial on any error. Discards user intent for fields that were recoverable. The retry loop preserves the parts that still parse.
Non-Goals
- No change to user config file loading.
config.toml,.jp.toml, environment variables, and--cfgflags continue to error strictly. A typo in user-authored config is still a hard failure. - No interactive recovery prompt. A future RFD may introduce a flow like "this field changed since the conversation was saved — patch this one, patch all stored configs, or exit." That feature has its own design surface (where the prompt fires, semantics of "patch all" against the append-only event log, non-interactive invocations, multi-process locking) and is deferred.
- No reverse compatibility. An older JP reading a newer
base_config.jsonis out of scope. This RFD covers newer JP reading older stored data. - No schema versioning infrastructure. See Alternatives.
Risks and Open Questions
--cfgleakage into stored configs. The fallback partial includes--cfgflags. If recovery fires during a run with a transient--cfg, those values become durable conversation state on the next save. A stricter fallback (files + env only, no--cfg) would avoid this at the cost of an additional split in the config pipeline. Ship the simpler version first; revisit if it bites.- Warning fatigue. A widely-shipped schema change can produce one warning per affected conversation per load. Mitigated by the fact that the next save flushes the recovered config back to disk, removing the warning on subsequent loads.
remove_at_pathcorrectness. The helper has to handle dotted keys and bracket-indexed array elements as produced byserde_path_to_error::Path. Test coverage needs to include nested structs, arrays of structs, and the root case.- Diagnosability of corrupted stored data. The current
maybe_init_eventswarning drops the underlying error. We should also surface the actual load error (error = %err) so users can tell recovery from genuine corruption. Small fix, included in the plan.
Implementation Plan
serde_path_to_errorretry loop injp_conversation::compat. Add the crate as a direct dependency ofjp_conversation. Implementremove_at_path(&mut Value, &str). Replace thefrom_valuebranch indeserialize_partial_configwith the retry loop. Keep the empty fallback as the last-ditch path. Tests: type mismatch on optional field, semantic error (unknown enum variant), nested struct field, array element. Reviewable and mergeable independently — already a strict improvement over the current behavior.Split
ConversationStream::from_parts. Introducefrom_parts_partial(returns the recoveredPartialAppConfigand the events) andfinalize_with_fallback(self, fallback: &PartialAppConfig). Existingfrom_partsbecomes a thin convenience wrapper that calls both with an empty fallback (preserving current behavior). Reviewable and mergeable independently.Workspace fallback plumbing. Add
fallback_partial: Option<Arc<PartialAppConfig>>toWorkspacewith a setter.jp_cli::run_innersets it frompipeline.partial_without_conversation()after pipeline construction.Workspace::events()andWorkspace::metadata()finalize viafinalize_with_fallbackinstead of the empty path. Depends on (2).Diagnostic warning improvements. Update
maybe_init_eventsandmaybe_init_conversationto log the underlying load error (%error). Independent of the rest; can ship in any order.Integration test. Exercise the full path: stored
base_config.jsonwith a type-mismatched required field, workspace partial supplies the replacement, conversation loads successfully, next persist writes the merged config back. Depends on (1)–(3).