RFD D45: Elelem: a standalone LLM provider streaming crate
- Status: Draft
- Category: Design
- Authors: Jean Mertz git@jeanmertz.com
- Date: 2026-05-30
- Extends: RFD 012
Summary
elelem is a standalone, feature-gated crate that owns the LLM streaming pipeline: one SSE driver, one normalized event model, and a per-shape parser seam, with each provider behind a Cargo feature. It replaces the external provider SDKs JP currently wraps and in several cases forks.
Motivation
A recent bug, in which the cerebras and llamacpp SSE adapters silently swallowed stream errors via take_while(is_ok), exposed a structural problem: JP has one streaming contract with several divergent implementations, and the contract is written down nowhere that a compiler or test enforces it. Each provider wraps a different external crate, and those crates do two things: serde types and SSE stream processing. Several of them JP forks and maintains.
This conflates two concerns. Generic streaming plumbing (connect, drive the SSE stream, surface errors, emit a normalized event stream with consistent flush and finish semantics) should be DRY, owned, and tested once. Provider-specific logic (request building, chunk parsing, quirks) is irreducible and stays per provider. Today the generic part is copied and re-derived per provider, which is how one copy drifted into a silent-error bug.
Doing nothing means continued divergence, more latent bugs of this class, and ongoing fork maintenance across several crates.
Design
What a consumer sees
One crate, one dependency, providers behind features:
elelem = { version = "0", features = ["anthropic"] }
The generic core is always available: the Event and StreamError types and the ChunkParser trait. Each provider feature adds that shape's typed wire request and response types and its parser. The SSE driver and the reqwest client builder sit behind the transport feature, on by default, so a consumer that only wants the wire types and parsers (to validate or transform chunks, or drive its own HTTP stack) can depend on elelem with default-features = false and pull in no HTTP dependencies. Apart from transport, provider features are the only public Cargo feature switches; the shared shape module is enabled internally via #[cfg(any(feature = "cerebras", …))], not separately feature-gated:
[features]
default = ["transport"]
transport = ["dep:reqwest", "dep:reqwest-eventsource", "dep:tokio"]
cerebras = []
llamacpp = []
ollama = []
openrouter = []
anthropic = []
# ...
What elelem owns, and what JP keeps
elelem owns the wire: the typed request and response types for each API shape, the shape parsers, the SSE driver, and the Event and StreamError types. It does not own request building. JP populates elelem's exported request types from a ChatQuery, including every provider quirk (cache control, thinking budgets, reasoning effort, schema transforms, Ollama's forced-tool system message), and hands that to elelem as data: the typed body, the base URL, and auth headers. elelem alone builds the reqwest client and the EventSource; no public API accepts a prebuilt client or stream, which is what makes the connect-timeout and Never guarantees unforgeable. Request construction has never been a source of streaming bugs, and a provider-neutral input model expressive enough for every quirk would cost far more than it saves, so it stays in JP.
This makes elelem a typed wire client plus a streaming engine, a lower tier than a high-level chat() abstraction. The reusable, bug-prone part (driving and normalizing the stream) is shared; the request ergonomics are not.
The normalized event model is the contract
Event and StreamError move out of jp_llm and become elelem's public API; jp_llm re-exports them so JP keeps one event type, not a parallel copy. The model is small:
enum Event {
    // a typed delta: message, reasoning, structured, or tool-call chunk
    Part { index: usize, part: EventPart, metadata: Map },
    // commit the parts grouped under `index`
    Flush { index: usize, metadata: Map },
    // emitted exactly once
    Finished(FinishReason),
}
index is an opaque grouping key, not a fixed slot: parts sharing an index accumulate until their Flush, and parsers emit flushes in stream order. Some shapes number them 0/1/2+ (chat completions), others use provider-native indices (Anthropic content blocks, OpenAI output_index, Google virtual indices). Callers must not attach meaning to the number. This matches RFD 012, which defined the index as a grouping key, not a semantic slot.
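The grouping rule can be sketched as a small accumulator. This is a minimal illustration, not elelem's code: the `Event` below is a simplified stand-in (a text delta replaces `EventPart`, and `metadata` and `FinishReason` are omitted), and `collect_flushed` is a hypothetical helper name.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the RFD's `Event`: `EventPart` is reduced to a
/// text delta, and `metadata` and `FinishReason` are left out.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Part { index: usize, text: String },
    Flush { index: usize },
    Finished,
}

/// Accumulates parts per index and commits each group when its Flush arrives.
/// Groups come out in flush order, not in numeric index order.
fn collect_flushed(events: &[Event]) -> Vec<String> {
    let mut pending: HashMap<usize, String> = HashMap::new();
    let mut committed = Vec::new();
    for event in events {
        match event {
            Event::Part { index, text } => {
                pending.entry(*index).or_default().push_str(text);
            }
            Event::Flush { index } => {
                // The index is only a lookup key; its value carries no meaning.
                if let Some(group) = pending.remove(index) {
                    committed.push(group);
                }
            }
            Event::Finished => break,
        }
    }
    committed
}
```

Because the accumulator only ever looks an index up, it works unchanged whether a shape numbers its groups 0/1/2 or reuses provider-native indices.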
One SSE driver
A single driver owns the stream lifecycle, so it cannot diverge per provider:
- It is built only through the shared client builder, the one place that sets the connect timeout and EventSource::set_retry_policy(Never). A new provider cannot forget either, which is the exact failure mode behind the original bug.
- It enforces a stream-idle timeout from a value the caller passes in (JP supplies assistant.request.stream_idle_timeout_secs, where 0 disables). elelem owns the timeout mechanism and emits a retryable StreamError when it fires; JP owns the value.
- It surfaces transport errors before completion as a retryable StreamError, converts a stream that ends without a terminal Finished into a retryable error, and drops the benign close that follows that Finished. The terminal signal is the shape's own ([DONE] for chat completions, a named event elsewhere); the driver keys off Finished, not any protocol literal. These rules are the regression guard the whole crate exists for.
- Retry budget, backoff, and user notification stay in JP. elelem never retries on its own, and the contract suite asserts no shape or driver does.
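The terminal rules in the list above can be sketched synchronously. This is a hedged illustration under simplifying assumptions: `Transport`, `drive`, and the single `Transient` variant are hypothetical stand-ins, and the real driver would be asynchronous and built on reqwest_eventsource. Unlike take_while(is_ok), every early exit here is visible to the caller.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Delta(String),
    Finished,
}

#[derive(Debug, Clone, PartialEq)]
enum StreamError {
    /// Retryable: the transport failed or the stream ended early.
    Transient(String),
}

/// What the transport hands the driver (hypothetical simplification).
#[derive(Debug)]
enum Transport {
    Event(Event),
    Error(String),
}

fn drive(input: impl IntoIterator<Item = Transport>) -> Vec<Result<Event, StreamError>> {
    let mut out = Vec::new();
    let mut finished = false;
    for item in input {
        match item {
            Transport::Event(event) => {
                if finished {
                    continue; // Finished is emitted exactly once; nothing follows it.
                }
                finished = matches!(event, Event::Finished);
                out.push(Ok(event));
            }
            // A close after the terminal Finished is benign and is dropped.
            Transport::Error(_) if finished => break,
            // An error before completion is surfaced, never swallowed.
            Transport::Error(message) => {
                out.push(Err(StreamError::Transient(message)));
                return out;
            }
        }
    }
    if !finished {
        // The stream ended without a terminal Finished: report it as retryable.
        out.push(Err(StreamError::Transient(
            "stream ended before Finished".to_owned(),
        )));
    }
    out
}
```

The driver keys only on whether `Finished` was seen, so the same loop serves a shape that ends with [DONE] and one that ends with a named event.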
The parser seam and the four shapes
A shape implements the parser, not a provider. The driver feeds it lifecycle frames and the parser emits normalized events; provider-specific parsers exist only when a wire shape is genuinely unique.
enum Frame<'a> {
    Open,
    Message { event_name: Option<&'a str>, data: &'a str },
    Eof,
}
trait ChunkParser {
    fn parse(&mut self, frame: Frame<'_>) -> Vec<Result<Event, StreamError>>;
}
The Frame makes the lifecycle explicit, which is the point: the parser flushes any trailing state on Eof, and the driver, tracking whether a terminal Finished was emitted, suppresses the benign post-[DONE] close and converts a premature Eof into a retryable error. event_name is carried because Anthropic sends named SSE events (event: content_block_delta) while OpenAI-style streams are anonymous data: frames; reqwest_eventsource already exposes the name.
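To make the seam concrete, here is a minimal sketch of a parser for a toy wire format. It is an assumption-laden illustration: `ToyChatParser` is hypothetical, each data frame is treated as a raw text delta rather than JSON, and `Event` and `StreamError` are reduced stand-ins for the real types. Only `Frame` and `ChunkParser` follow the definitions above.

```rust
#[derive(Debug, PartialEq)]
enum Event {
    Part { index: usize, text: String },
    Flush { index: usize },
    Finished,
}

#[derive(Debug, PartialEq)]
enum StreamError {
    Transient(String),
}

enum Frame<'a> {
    Open,
    Message { event_name: Option<&'a str>, data: &'a str },
    Eof,
}

trait ChunkParser {
    fn parse(&mut self, frame: Frame<'_>) -> Vec<Result<Event, StreamError>>;
}

/// Hypothetical parser: every data frame is a text delta, "[DONE]" is terminal.
#[derive(Default)]
struct ToyChatParser {
    has_unflushed_part: bool,
}

impl ToyChatParser {
    /// Emits a Flush only if a part is still waiting to be committed.
    fn flush(&mut self) -> Option<Result<Event, StreamError>> {
        if std::mem::take(&mut self.has_unflushed_part) {
            Some(Ok(Event::Flush { index: 0 }))
        } else {
            None
        }
    }
}

impl ChunkParser for ToyChatParser {
    fn parse(&mut self, frame: Frame<'_>) -> Vec<Result<Event, StreamError>> {
        match frame {
            Frame::Open => Vec::new(),
            // The shape's own terminal signal becomes the normalized Finished.
            Frame::Message { data: "[DONE]", .. } => {
                let mut out: Vec<_> = self.flush().into_iter().collect();
                out.push(Ok(Event::Finished));
                out
            }
            Frame::Message { data, .. } => {
                self.has_unflushed_part = true;
                vec![Ok(Event::Part { index: 0, text: data.to_owned() })]
            }
            // Trailing state is flushed on Eof; whether Eof was premature
            // is the driver's call, not the parser's.
            Frame::Eof => self.flush().into_iter().collect(),
        }
    }
}
```

The parser never decides whether the stream ended cleanly. It reports what it saw, and the driver applies the terminal rules.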
A provider selects a shape and supplies its request construction and quirks. The four chat-completions providers share one shape parser:
| Shape | Providers |
|---|---|
| Chat Completions (OpenAI-style) | cerebras, llamacpp, ollama, openrouter |
| Responses (OpenAI) | openai |
| Messages (Anthropic) | anthropic |
| Gemini | google |
Recovery: request rejection without a side channel
Some providers reject an otherwise-valid request because of stale metadata: Anthropic and Google invalidate old thinking and thought signatures. The fix is to strip the offending metadata from the conversation history and retry. That is recovery, not response content, so it rides the error channel, not the event stream. Event has no Patch variant and FinishReason has no Retry; both are deleted.
enum StreamError {
    Timeout, Connect, RateLimit, Transient, /* ... */
    Recoverable { patches: Vec<Patch> },
}
struct Patch { matcher: Match, action: Action } // generic, content-addressed
enum Match { MetadataValue { key: String, value: String } }
enum Action { RemoveMetadata(String) }
The shape builds the patch from the wire request and the rejection error (it owns the metadata-key vocabulary it emitted on the success path) and surfaces it as StreamError::Recoverable. Because that needs the request and error, not stream frames, it is a separate function on the shape from the frame-oriented ChunkParser; the shape owns both seams. JP applies it: find the persisted event whose metadata[key] == value, remove the key, rebuild the request, and retry. The match is by value, so JP needs no provider knowledge and stays a generic applier. Event::Patch was a transport hack that routed a history mutation through the content channel; moving it to the error channel removes both the conflation and any second Event type.
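A generic applier of this kind might look like the sketch below. It assumes a hypothetical `StoredEvent` with string-valued metadata in place of JP's persisted ConversationEvent, and `apply_patch` is an illustrative name. The `Patch`, `Match`, and `Action` types mirror the definitions above.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
enum Match {
    MetadataValue { key: String, value: String },
}

#[derive(Debug, Clone, PartialEq)]
enum Action {
    RemoveMetadata(String),
}

#[derive(Debug, Clone, PartialEq)]
struct Patch {
    matcher: Match,
    action: Action,
}

/// Hypothetical stand-in for a persisted conversation event.
#[derive(Debug, Default)]
struct StoredEvent {
    metadata: BTreeMap<String, String>,
}

/// Applies one patch to the history. Returns true if a stored event matched,
/// false if nothing matched and the caller should stop retrying.
fn apply_patch(history: &mut [StoredEvent], patch: &Patch) -> bool {
    let Match::MetadataValue { key, value } = &patch.matcher;
    let Action::RemoveMetadata(remove_key) = &patch.action;
    // Matching is by value only, so no provider knowledge is needed here.
    match history
        .iter_mut()
        .find(|event| event.metadata.get(key) == Some(value))
    {
        Some(event) => {
            event.metadata.remove(remove_key);
            true
        }
        None => false,
    }
}
```

Applying the same patch twice returns false the second time, because the matched key is already gone. That is the property that keeps recovery bounded.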
Recoverable is caller-action-required, not a transient transport failure, so JP applies the patches and retries immediately, without backoff and without spending the transient-error retry budget. It carries the underlying provider error: a caller that will not patch, or where no stored metadata matches, surfaces that error as an ordinary failure rather than looping. Each applied patch makes progress and an unmatched patch terminates, so recovery is bounded.
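The retry policy described here can be sketched as one loop on the JP side. Everything in it is an assumption made for illustration: `run_with_recovery`, `Failure`, the `source` field that carries the provider error, and patches reduced to plain strings. Backoff is indicated only by a comment.

```rust
#[derive(Debug, Clone, PartialEq)]
enum StreamError {
    Transient,
    /// `source` models the underlying provider error (assumed representation).
    Recoverable { patches: Vec<String>, source: String },
}

#[derive(Debug, PartialEq)]
enum Failure {
    RetriesExhausted,
    Provider(String),
}

/// Runs `attempt` until it succeeds and returns the number of attempts made.
/// `apply_patches` returns false when no stored metadata matched.
fn run_with_recovery<A, P>(
    mut attempt: A,
    mut apply_patches: P,
    mut transient_budget: u32,
) -> Result<u32, Failure>
where
    A: FnMut() -> Result<(), StreamError>,
    P: FnMut(&[String]) -> bool,
{
    let mut attempts = 0;
    loop {
        attempts += 1;
        match attempt() {
            Ok(()) => return Ok(attempts),
            Err(StreamError::Recoverable { patches, source }) => {
                // Retry immediately: no backoff, no budget spent. An
                // unmatched patch ends recovery with the provider error.
                if !apply_patches(&patches) {
                    return Err(Failure::Provider(source));
                }
            }
            Err(StreamError::Transient) => {
                if transient_budget == 0 {
                    return Err(Failure::RetriesExhausted);
                }
                transient_budget -= 1;
                // A real caller would back off here before retrying.
            }
        }
    }
}
```

The loop terminates only if `apply_patches` eventually reports no match, which holds when each applied patch removes the metadata it matched on.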
Transport is uniform SSE
Every provider streams SSE through reqwest_eventsource. Two decisions make that hold:
- Ollama uses /v1/chat/completions, folding it into the chat shape with no new parser. That endpoint supports streaming, tools, vision, structured output, and reasoning via reasoning_content and reasoning_effort. Ollama's /v1/responses endpoint is non-stateful only and drops vision and structured output while adding nothing JP uses, so it is rejected. The chat endpoint has no tool_choice; JP keeps its existing forced-tool system-message workaround when it builds the request. See Ollama OpenAI compatibility and Ollama Anthropic compatibility.
- Gemini uses ?alt=sse.
JP side
JP's Provider trait is unchanged from JP CLI's point of view: it still returns a stream of Event (now elelem's, re-exported). Below the trait, each provider implementation builds the wire request, calls elelem to drive and parse it, and owns everything that is not single-request wire handling: request construction and quirks, retry budget and backoff, the idle-timeout value (passed to elelem per request, so JP no longer wraps the stream with with_idle_timeout), recovery (applying Recoverable patches and retrying), the multi-request orchestration some providers need (Anthropic max-token chaining and forced-tool fallback, Google unexpected-tool-call retry), and EventBuilder, which stays in jp_llm because it translates EventPart into the persistence type ConversationEvent. elelem owns the OpenAI Responses wire types for both streaming and non-streaming responses but only the streaming transport, so the non-streaming fallback for streaming_unsupported models is JP's HTTP call mapping elelem's response type into events.
Drawbacks
- Re-inlining forked crates risks re-discovering edge cases they quietly encode. Chesterton's Fence applies per crate; the vendor-first step and the contract suite are the mitigations.
- elelem's event model becomes a semver commitment once published. That is the price of "standalone, reusable."
- The effort is wide and asymmetric: chat-completions is nearly free (already hand-rolled), but Responses, Messages, and Gemini carry real request-schema work.
- A standalone multi-provider client enters a populated space (genai, async-openai, and others). Maintaining a public crate is its own ongoing cost.
Alternatives
- Extract a shared driver for cerebras/llamacpp only, keep the SDKs. Fixes the immediate duplication but leaves the fork-maintenance burden, does nothing for the SDK-wrapped providers, and gives the plugin future no foundation.
- A core crate plus N shape crates plus N provider crates. More granular, but worse ergonomics for a reusable standalone library; consumers want one dependency and a feature, not a crate-assembly job.
- One driver and one parser for all providers, dropping the SDKs entirely. A Golden Hammer: the wire shapes genuinely differ, and forcing Anthropic and Gemini through a chat-completions parser would trade correctness for uniformity.
- Route everything through the Responses API. Responses earns its place for OpenAI proper (encrypted reasoning continuity, hosted tools, the o-series), but a chat-completions-compatible backend gains nothing from its Responses shim.
Non-Goals
- Re-implementing reqwest_eventsource. The bug was JP's usage, not the crate; it stays.
- Adding new providers.
- Stateful Responses API support (JP does not use it for OpenAI either).
Risks and Open Questions
- Quirk parity. The Anthropic and Google signature-strip-and-retry behaviors must survive the migration unchanged. The contract suite plus re-recorded cassettes are the guard.
- Cassette re-recording needs live endpoints. llamacpp and ollama require local servers; the stale llamacpp cassettes surfaced by the original fix are the first to re-record.
- Publication timing. Publishing elelem overrides the workspace publish = false and freezes its feature names, Event, StreamError, and per-shape request types as public API. Don't publish until at least two non-chat shapes are migrated, so the surface isn't frozen chat-biased.
Implementation Plan
- Phase 0: Vendor. Pull the forked external crates into the workspace as plain members, keep every cassette green, and drop the upstream forks. Reviewable on its own, and independent of Phase 1, which extracts from the hand-rolled cerebras/llamacpp and needs no vendored SDK.
- Phase 1: Core plus chat shape. Create crates/contrib/elelem. Extract the generic core (types, ChunkParser, driver, client builder) from cerebras/llamacpp, implement the chat-completions shape, and seed the stream-contract test suite (the surface/swallow cases already written). Move cerebras and llamacpp onto it.
- Phase 2: Fold the chat family. Move ollama (switched to /v1/chat/completions) and openrouter onto the chat shape. Delete ollama-rs and jp_openrouter, including openrouter's internal backon retry loop, since JP owns retry.
- Phase 3: Anthropic Messages. Add the shape, drop async_anthropic. First real exercise of the named-event seam and the StreamError::Recoverable path.
- Phase 4: OpenAI Responses. Add the shape, drop openai_responses.
- Phase 5: Gemini. Add the shape on alt=sse, drop gemini_client_rs.
Each phase is gated by the contract suite and re-recorded cassettes, and each provider cuts over independently, so the migration stays reviewable and reversible throughout.
References
- RFD 012, which defined the typed Event/EventPart streaming model that elelem relocates out of jp_llm and owns.
- The Ollama compatibility docs behind the transport decision.
- RFD 043 (Discussion) stays JP-side: elelem emits raw tool-call chunks; EventBuilder owns their accumulation and any UI progress.
- RFD 064 (Discussion) may later move patch application from in-place mutation to stored events applied at projection time.