RFD D45: Elelem: a standalone LLM provider streaming crate

Status: Draft
Category: Design
Authors: Jean Mertz <git@jeanmertz.com>
Date: 2026-05-30
Extends: RFD 012

Summary

elelem is a standalone, feature-gated crate that owns the LLM streaming pipeline: one SSE driver, one normalized event model, and a per-shape parser seam, with each provider behind a Cargo feature. It replaces the external provider SDKs JP currently wraps and in several cases forks.

Motivation

A recent bug, in which the cerebras and llamacpp SSE adapters silently swallowed stream errors via take_while(is_ok), exposed a structural problem: JP has one streaming contract with several divergent implementations, and no compiler check or test enforces that contract. Each provider wraps a different external crate, and each of those crates supplies two things: serde types and SSE stream processing. JP forks and maintains several of them.

This conflates two concerns. Generic streaming plumbing (connect, drive the SSE stream, surface errors, emit a normalized event stream with consistent flush and finish semantics) should be DRY, owned, and tested once. Provider-specific logic (request building, chunk parsing, quirks) is irreducible and stays per provider. Today the generic part is copied and re-derived per provider, which is how one copy drifted into a silent-error bug.

Doing nothing means continued divergence, more latent bugs of this class, and ongoing fork maintenance across several crates.

Design

What a consumer sees

One crate, one dependency, providers behind features:

```toml
elelem = { version = "0", features = ["anthropic"] }
```

The generic core is always available: the Event and StreamError types and the ChunkParser trait. Each provider feature adds that shape's typed wire request and response types and its parser. The SSE driver and the reqwest client builder sit behind the transport feature, which is on by default, so a consumer that only wants the wire types and parsers (to validate or transform chunks, or to drive its own HTTP stack) can depend on elelem with default-features = false and pull in no HTTP dependencies. Apart from transport, provider features are the only public Cargo feature switches; the shared shape module is enabled internally via #[cfg(any(feature = "cerebras", …))] and has no feature of its own:

```toml
[features]
default = ["transport"]
transport = ["dep:reqwest", "dep:reqwest-eventsource", "dep:tokio"]
cerebras   = []
llamacpp   = []
ollama     = []
openrouter = []
anthropic  = []
# ...
```

What elelem owns, and what JP keeps

elelem owns the wire: the typed request and response types for each API shape, the shape parsers, the SSE driver, and the Event and StreamError types. It does not own request building. JP populates elelem's exported request types from a ChatQuery, including every provider quirk (cache control, thinking budgets, reasoning effort, schema transforms, Ollama's forced-tool system message), and hands the result to elelem as data: the typed body, the base URL, and auth headers. elelem alone builds the reqwest client and the EventSource; no public API accepts a prebuilt client or stream, which is what makes the connect-timeout and never-retry (retry policy Never) guarantees unforgeable. Request construction has never been a source of streaming bugs, and a provider-neutral input model expressive enough for every quirk would cost far more than it saves, so it stays in JP.

This makes elelem a typed wire client plus a streaming engine, a lower tier than a high-level chat() abstraction. The reusable, bug-prone part (driving and normalizing the stream) is shared; the request ergonomics are not.
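One way to picture the "data in, no prebuilt client" boundary is with module privacy. The sketch below is a std-only illustration, not elelem's API: ClientConfig, Request, Stream, stream, and the 10-second timeout are all hypothetical stand-ins, and the typed wire body is reduced to a String.

```rust
mod elelem {
    use std::time::Duration;

    /// The single retry policy the design allows.
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum RetryPolicy {
        Never,
    }

    /// Transport settings. The fields are private and there is no public
    /// constructor, so code outside this module cannot build or alter one.
    #[derive(Debug)]
    pub struct ClientConfig {
        connect_timeout: Duration,
        retry: RetryPolicy,
    }

    impl ClientConfig {
        // Private: the one place both guarantees are set.
        fn shared() -> Self {
            ClientConfig {
                connect_timeout: Duration::from_secs(10), // placeholder value
                retry: RetryPolicy::Never,
            }
        }

        pub fn connect_timeout(&self) -> Duration {
            self.connect_timeout
        }

        pub fn retry(&self) -> RetryPolicy {
            self.retry
        }
    }

    /// Everything the caller hands over is plain data.
    pub struct Request {
        pub base_url: String,
        pub headers: Vec<(String, String)>,
        pub body: String, // stands in for a typed wire body
    }

    /// Handle returned to the caller; the config is readable but not writable.
    pub struct Stream {
        config: ClientConfig,
        request: Request,
    }

    impl Stream {
        pub fn config(&self) -> &ClientConfig {
            &self.config
        }

        pub fn url(&self) -> &str {
            &self.request.base_url
        }
    }

    /// The only public entry point: data in, stream handle out.
    pub fn stream(request: Request) -> Stream {
        Stream { config: ClientConfig::shared(), request }
    }
}
```

A caller can construct a Request and call elelem::stream, but it has no way to supply its own ClientConfig, so every stream carries the same timeout and retry settings.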

The normalized event model is the contract

Event and StreamError move out of jp_llm and become elelem's public API; jp_llm re-exports them so JP keeps one event type, not a parallel copy. The model is small:

```rust
enum Event {
    // a typed delta: message, reasoning, structured, or tool-call chunk
    Part { index: usize, part: EventPart, metadata: Map },
    // commit the parts grouped under `index`
    Flush { index: usize, metadata: Map },
    // emitted exactly once
    Finished(FinishReason),
}
```

index is an opaque grouping key, not a fixed slot: parts sharing an index accumulate until their Flush, and parsers emit flushes in stream order. Some shapes number them 0/1/2+ (chat completions), others use provider-native indices (Anthropic content blocks, OpenAI output_index, Google virtual indices). Callers must not attach meaning to the number. This matches RFD 012, which defined the index as a grouping key, not a semantic slot.
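To make the grouping rule concrete, here is a minimal consumer-side sketch. It assumes a simplified Event (text-only parts, no metadata, no FinishReason), and the Accumulator and collect names are hypothetical. The point is only that the index is used as a map key and never interpreted.

```rust
use std::collections::HashMap;

/// Simplified stand-in for elelem's Event: text parts only, no metadata.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Part { index: usize, text: String },
    Flush { index: usize },
    Finished,
}

/// Buffers parts per opaque index and commits a buffer when its Flush arrives.
#[derive(Default)]
struct Accumulator {
    pending: HashMap<usize, String>,
    committed: Vec<String>,
}

impl Accumulator {
    fn push(&mut self, event: Event) {
        match event {
            // The index is only a key; its numeric value carries no meaning.
            Event::Part { index, text } => {
                self.pending.entry(index).or_default().push_str(&text);
            }
            // Commit in the order flushes arrive, which is stream order.
            Event::Flush { index } => {
                if let Some(buffer) = self.pending.remove(&index) {
                    self.committed.push(buffer);
                }
            }
            Event::Finished => {}
        }
    }
}

/// Runs a whole event sequence through the accumulator.
fn collect(events: Vec<Event>) -> Vec<String> {
    let mut acc = Accumulator::default();
    for event in events {
        acc.push(event);
    }
    acc.committed
}
```

Feeding it interleaved parts under indices 7 and 3, flushed as 3 then 7, yields the index-3 text first: the commit order follows the flushes, not the numbers.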

One SSE driver

A single driver owns the stream lifecycle, so it cannot diverge per provider:

- It is built only through the shared client builder, the one place that sets the connect timeout and EventSource::set_retry_policy(Never). A new provider cannot forget either, which is the exact failure mode behind the original bug.
- It enforces a stream-idle timeout from a value the caller passes in (JP supplies assistant.request.stream_idle_timeout_secs, where 0 disables). elelem owns the timeout mechanism and emits a retryable StreamError when it fires; JP owns the value.
- It surfaces transport errors before completion as a retryable StreamError, converts a stream that ends without a terminal Finished into a retryable error, and drops the benign close that follows that Finished. The terminal signal is the shape's own ([DONE] for chat completions, a named event elsewhere); the driver keys off Finished, not any protocol literal. These rules are the regression guard the whole crate exists for.
- Retry budget, backoff, and user notification stay in JP. elelem never retries on its own, and the contract suite asserts no shape or driver does.
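The terminal rules above can be sketched as a small synchronous function. This is an illustration under simplifying assumptions, not the driver itself: the Input enum, the drive function, and the Transient variant's String payload are hypothetical, the real driver is asynchronous, and the idle timeout is left out.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Part(String),
    Finished,
}

/// Retryable error surfaced to the caller (simplified to one variant).
#[derive(Debug, Clone, PartialEq)]
enum StreamError {
    Transient(String),
}

/// What the transport and parser hand the driver, in arrival order.
enum Input {
    Parsed(Event),
    TransportError(String),
}

/// Applies the terminal rules: surface errors seen before Finished, drop a
/// close that arrives after Finished, and treat a missing Finished as an error.
fn drive(inputs: Vec<Input>) -> Vec<Result<Event, StreamError>> {
    let mut out = Vec::new();
    let mut finished = false;
    for input in inputs {
        match input {
            Input::Parsed(Event::Finished) => {
                finished = true;
                out.push(Ok(Event::Finished));
            }
            Input::Parsed(event) => out.push(Ok(event)),
            // Benign close after the terminal event: nothing to report.
            Input::TransportError(_) if finished => {}
            // Error before completion: surface it and stop. No retry here.
            Input::TransportError(message) => {
                out.push(Err(StreamError::Transient(message)));
                return out;
            }
        }
    }
    if !finished {
        out.push(Err(StreamError::Transient(
            "stream ended before Finished".to_string(),
        )));
    }
    out
}
```

Compared with filtering via take_while(is_ok), the error is the last item the caller sees rather than a silent end of stream, which is the behavior the original bug lacked.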

The parser seam and the four shapes

A shape implements the parser, not a provider. The driver feeds it lifecycle frames and the parser emits normalized events; provider-specific parsers exist only when a wire shape is genuinely unique.

```rust
enum Frame<'a> {
    Open,
    Message { event_name: Option<&'a str>, data: &'a str },
    Eof,
}

trait ChunkParser {
    fn parse(&mut self, frame: Frame<'_>) -> Vec<Result<Event, StreamError>>;
}
```

The Frame makes the lifecycle explicit, which is the point: the parser flushes any trailing state on Eof, and the driver, tracking whether a terminal Finished was emitted, suppresses the benign post-[DONE] close and converts a premature Eof into a retryable error. event_name is carried because Anthropic sends named SSE events (event: content_block_delta) while OpenAI-style streams are anonymous data: frames; reqwest_eventsource already exposes the name.
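As an illustration of the seam, the sketch below implements ChunkParser for a toy chat-style shape. It is deliberately unrealistic in one respect: the data payload is treated as raw text rather than JSON, so no real provider format is implied. ToyChatParser, run, and the simplified Event and StreamError are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum Event {
    Part { index: usize, text: String },
    Flush { index: usize },
    Finished,
}

#[derive(Debug, PartialEq)]
enum StreamError {
    Transient(String),
}

enum Frame<'a> {
    Open,
    Message { event_name: Option<&'a str>, data: &'a str },
    Eof,
}

trait ChunkParser {
    fn parse(&mut self, frame: Frame<'_>) -> Vec<Result<Event, StreamError>>;
}

/// Toy parser: every anonymous frame is a text delta under index 0.
#[derive(Default)]
struct ToyChatParser {
    open_part: bool,
}

impl ToyChatParser {
    /// Emits a Flush for index 0 if any part is still uncommitted.
    fn flush(&mut self) -> Vec<Result<Event, StreamError>> {
        if std::mem::take(&mut self.open_part) {
            vec![Ok(Event::Flush { index: 0 })]
        } else {
            Vec::new()
        }
    }
}

impl ChunkParser for ToyChatParser {
    fn parse(&mut self, frame: Frame<'_>) -> Vec<Result<Event, StreamError>> {
        match frame {
            Frame::Open => Vec::new(),
            // This toy shape is anonymous; a named-event shape would match here.
            Frame::Message { event_name: Some(_), .. } => Vec::new(),
            // The shape's own terminal signal becomes the normalized Finished.
            Frame::Message { data: "[DONE]", .. } => {
                let mut out = self.flush();
                out.push(Ok(Event::Finished));
                out
            }
            Frame::Message { data, .. } => {
                self.open_part = true;
                vec![Ok(Event::Part { index: 0, text: data.to_string() })]
            }
            // Trailing state is flushed on Eof; the driver decides what Eof means.
            Frame::Eof => self.flush(),
        }
    }
}

/// Feeds a frame sequence through a fresh parser and collects the output.
fn run(frames: Vec<Frame<'_>>) -> Vec<Result<Event, StreamError>> {
    let mut parser = ToyChatParser::default();
    frames.into_iter().flat_map(|frame| parser.parse(frame)).collect()
}
```

A stream that reaches Eof without [DONE] still gets its trailing Flush but no Finished, which is the signal the driver uses to report a premature end.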

A provider selects a shape and supplies its request construction and quirks. The four chat-completions providers share one shape parser:

| Shape | Providers |
| --- | --- |
| Chat Completions (OpenAI-style) | cerebras, llamacpp, ollama, openrouter |
| Responses (OpenAI) | openai |
| Messages (Anthropic) | anthropic |
| Gemini | google |

Recovery: request rejection without a side channel

Some providers reject an otherwise-valid request because of stale metadata: Anthropic and Google invalidate old thinking and thought signatures. The fix is to strip the offending metadata from the conversation history and retry. That is recovery, not response content, so it rides the error channel, not the event stream. Event has no Patch variant and FinishReason has no Retry; both are deleted.

```rust
enum StreamError {
    Timeout, Connect, RateLimit, Transient, /* ... */
    Recoverable { patches: Vec<Patch> },
}

struct Patch  { matcher: Match, action: Action }   // generic, content-addressed
enum   Match  { MetadataValue { key: String, value: String } }
enum   Action { RemoveMetadata(String) }
```

The shape builds the patch from the wire request and the rejection error (it owns the metadata-key vocabulary it emitted on the success path) and surfaces it as StreamError::Recoverable. Because that needs the request and error, not stream frames, it is a separate function on the shape from the frame-oriented ChunkParser; the shape owns both seams. JP applies it: find the persisted event whose metadata[key] == value, remove the key, rebuild the request, and retry. The match is by value, so JP needs no provider knowledge and stays a generic applier. Event::Patch was a transport hack that routed a history mutation through the content channel; moving it to the error channel removes both the conflation and any second Event type.
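The applier side can be sketched in a few lines. This assumes persisted events expose metadata as a string map; StoredEvent and apply_patches are hypothetical stand-ins for JP's ConversationEvent and its patch logic, and the metadata key used in any example is made up.

```rust
use std::collections::HashMap;

struct Patch {
    matcher: Match,
    action: Action,
}

enum Match {
    MetadataValue { key: String, value: String },
}

enum Action {
    RemoveMetadata(String),
}

/// Hypothetical stand-in for a persisted conversation event.
struct StoredEvent {
    metadata: HashMap<String, String>,
}

/// Applies each patch to every event whose metadata[key] equals the matched
/// value. Returns how many metadata entries were actually removed.
fn apply_patches(history: &mut [StoredEvent], patches: &[Patch]) -> usize {
    let mut removed = 0;
    for patch in patches {
        // Both enums have a single variant today, so these patterns cannot fail.
        let Match::MetadataValue { key, value } = &patch.matcher;
        let Action::RemoveMetadata(target) = &patch.action;
        for event in history.iter_mut() {
            // Match by value only: no provider knowledge is needed here.
            let matches = event.metadata.get(key) == Some(value);
            if matches && event.metadata.remove(target).is_some() {
                removed += 1;
            }
        }
    }
    removed
}
```

A return value of 0 means nothing in the history matched, which is the case where the caller should surface the provider error instead of retrying.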

Recoverable is caller-action-required, not a transient transport failure, so JP applies the patches and retries immediately, without backoff and without spending the transient-error retry budget. It carries the underlying provider error: a caller that will not patch, or where no stored metadata matches, surfaces that error as an ordinary failure rather than looping. Each applied patch makes progress and an unmatched patch terminates, so recovery is bounded.
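The retry policy described here can be sketched as a loop on the JP side. The shapes are simplified and partly assumed: patches are reduced to strings, the cause field is a hypothetical name for the underlying provider error that Recoverable carries, and run_with_recovery, Outcome, and the closure parameters are illustrative only. Backoff is omitted.

```rust
#[derive(Debug, Clone, PartialEq)]
enum StreamError {
    Transient(String),
    // `cause` is an assumed field name for the underlying provider error.
    Recoverable { patches: Vec<String>, cause: String },
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Done { attempts: u32, budget_left: u32 },
    Failed(String),
}

/// `attempt` runs one request; `apply` applies patches to the history and
/// returns how many matched; `budget` is the transient-error retry budget.
fn run_with_recovery(
    mut attempt: impl FnMut() -> Result<(), StreamError>,
    mut apply: impl FnMut(&[String]) -> usize,
    mut budget: u32,
) -> Outcome {
    let mut attempts = 0;
    loop {
        attempts += 1;
        match attempt() {
            Ok(()) => return Outcome::Done { attempts, budget_left: budget },
            Err(StreamError::Recoverable { patches, cause }) => {
                // Nothing matched: surface the provider error, do not loop.
                if apply(&patches[..]) == 0 {
                    return Outcome::Failed(cause);
                }
                // Otherwise retry immediately; the budget is untouched.
            }
            Err(StreamError::Transient(cause)) => {
                if budget == 0 {
                    return Outcome::Failed(cause);
                }
                budget -= 1; // a real caller would back off before retrying
            }
        }
    }
}
```

Termination depends on apply doing real work: once a patch has removed the matching metadata, the same patch no longer matches, so a repeated rejection ends in Failed rather than another retry.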

Transport is uniform SSE

Every provider streams SSE through reqwest_eventsource. Two decisions make that hold:

- Ollama uses /v1/chat/completions, which folds it into the chat shape with no new parser. That endpoint supports streaming, tools, vision, structured output, and reasoning via reasoning_content and reasoning_effort. Ollama's /v1/responses endpoint is non-stateful only and drops vision and structured output while adding nothing JP uses, so it is rejected. The chat endpoint has no tool_choice; JP keeps its existing forced-tool system-message workaround when it builds the request. See the Ollama OpenAI compatibility and Ollama Anthropic compatibility docs.
- Gemini uses ?alt=sse.

JP side

JP's Provider trait is unchanged from JP CLI's point of view: it still returns a stream of Event (now elelem's, re-exported). Below the trait, each provider implementation builds the wire request, calls elelem to drive and parse it, and owns everything that is not single-request wire handling. That covers request construction and quirks, retry budget and backoff, the idle-timeout value (passed to elelem per request, so JP no longer wraps the stream with with_idle_timeout), and recovery (applying Recoverable patches and retrying). It also covers the multi-request orchestration some providers need (Anthropic max-token chaining and forced-tool fallback, Google unexpected-tool-call retry) and EventBuilder, which stays in jp_llm because it translates EventPart into the persistence type ConversationEvent. elelem owns the OpenAI Responses wire types for both streaming and non-streaming responses but only the streaming transport, so the non-streaming fallback for streaming_unsupported models remains an HTTP call made by JP, which maps elelem's response type into events.

Drawbacks

Re-inlining forked crates risks re-discovering edge cases they quietly encode. Chesterton's Fence applies per crate; the vendor-first step and the contract suite are the mitigations.
elelem's event model becomes a semver commitment once published. That is the price of "standalone, reusable."
The effort is wide and asymmetric: chat-completions is nearly free (already hand-rolled), but Responses, Messages, and Gemini carry real request-schema work.
A standalone multi-provider client enters a populated space (genai, async-openai, and others). Maintaining a public crate is its own ongoing cost.

Alternatives

Extract a shared driver for cerebras/llamacpp only, keep the SDKs. Fixes the immediate duplication but leaves the fork-maintenance burden, does nothing for the SDK-wrapped providers, and gives the plugin future no foundation.
A core crate plus N shape crates plus N provider crates. More granular, but worse ergonomics for a reusable standalone library; consumers want one dependency and a feature, not a crate-assembly job.
One driver and one parser for all providers, dropping the SDKs entirely. A Golden Hammer: the wire shapes genuinely differ, and forcing Anthropic and Gemini through a chat-completions parser would trade correctness for uniformity.
Route everything through the Responses API. Responses earns its place for OpenAI proper (encrypted reasoning continuity, hosted tools, the o-series), but a chat-completions-compatible backend gains nothing from its Responses shim.

Non-Goals

Re-implementing reqwest_eventsource. The bug was JP's usage, not the crate; it stays.
Adding new providers.
Stateful Responses API support (JP does not use it for OpenAI either).

Risks and Open Questions

Quirk parity. The Anthropic and Google signature-strip-and-retry behaviors must survive the migration unchanged. The contract suite plus re-recorded cassettes are the guard.
Cassette re-recording needs live endpoints. llamacpp and ollama require local servers; the stale llamacpp cassettes surfaced by the original fix are the first to re-record.
Publication timing. Publishing elelem overrides the workspace publish = false and freezes its feature names, Event, StreamError, and per-shape request types as public API. Don't publish until at least two non-chat shapes are migrated, so the surface isn't frozen chat-biased.

Implementation Plan

Phase 0: Vendor. Pull the forked external crates into the workspace as plain members, keep every cassette green, and drop the upstream forks. Reviewable on its own, and independent of Phase 1, which extracts from the hand-rolled cerebras / llamacpp and needs no vendored SDK.
Phase 1: Core plus chat shape. Create crates/contrib/elelem. Extract the generic core (types, ChunkParser, driver, client builder) from cerebras / llamacpp, implement the chat-completions shape, and seed the stream-contract test suite (the surface/swallow cases already written). Move cerebras and llamacpp onto it.
Phase 2: Fold the chat family. Move ollama (switched to /v1/chat/completions) and openrouter onto the chat shape. Delete ollama-rs and jp_openrouter, including openrouter's internal backon retry loop, since JP owns retry.
Phase 3: Anthropic Messages. Add the shape, drop async_anthropic. First real exercise of the named-event seam and the StreamError::Recoverable path.
Phase 4: OpenAI Responses. Add the shape, drop openai_responses.
Phase 5: Gemini. Add the shape on alt=sse, drop gemini_client_rs.

Each phase is gated by the contract suite and re-recorded cassettes, and each provider cuts over independently, so the migration stays reviewable and reversible throughout.
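A contract check of the kind that gates each phase might look like the sketch below. The three invariants are a reading of this document (exactly one Finished, nothing after it, every part index flushed), not the actual suite, and check_contract and the reduced Event are hypothetical.

```rust
use std::collections::HashSet;

/// Event reduced to what the invariants need: indices and the terminal marker.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Part { index: usize },
    Flush { index: usize },
    Finished,
}

/// Checks a fully collected event sequence against the stream contract.
fn check_contract(events: &[Event]) -> Result<(), String> {
    let mut open: HashSet<usize> = HashSet::new();
    let mut finished = false;
    for event in events {
        // Also rejects a second Finished, since it follows the first.
        if finished {
            return Err(format!("event after Finished: {:?}", event));
        }
        match event {
            Event::Part { index } => {
                open.insert(*index);
            }
            Event::Flush { index } => {
                open.remove(index);
            }
            Event::Finished => finished = true,
        }
    }
    if !finished {
        return Err("stream ended without Finished".to_string());
    }
    if !open.is_empty() {
        return Err(format!("{} index(es) never flushed", open.len()));
    }
    Ok(())
}
```

Because the check only sees normalized events, the same function can run against the output of every shape parser, which is what lets one suite cover all providers.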

References

RFD 012, which defined the typed Event / EventPart streaming model that elelem relocates out of jp_llm and owns.
The Ollama compatibility docs behind the transport decision.
RFD 043 (Discussion) stays JP-side: elelem emits raw tool-call chunks; EventBuilder owns their accumulation and any UI progress.
RFD 064 (Discussion) may later move patch application from in-place mutation to stored events applied at projection time.
