RFD 032: Grizzly Semantic Search

Status: Accepted
Category: Design
Authors: Jean Mertz git@jeanmertz.com
Date: 2026-03-07

Summary

Add semantic vector search, FTS5 full-text search, and typo tolerance to grizzly's note_search tool. The goal is to find notes by meaning, not just literal text. Searching "productivity systems" should surface notes about GTD and Pomodoro even when those exact words don't appear.

Context

Grizzly is a Bear.app MCP server that lives in crates/contrib/grizzly/. It exposes note_get, note_search, and note_create tools over the MCP protocol. The current search implementation uses SQL LIKE '%query%' with hand-rolled relevance scoring. It works but has three weaknesses:

Substring matching — "prod" matches "reproduce", no word boundary awareness.
No typo handling — "productivty" finds nothing.
No semantic understanding — searching "task management" won't find a note titled "GTD methodology" unless those words appear literally.

This RFD covers three layers of improvement, each independent and additive.

Architecture Overview

txt

                                  ┌─────────────────────┐
                                  │   note_search tool  │
                                  └──────────┬──────────┘
                                             │
                              ┌──────────────┼──────────────┐
                              ▼              ▼              ▼
                         ┌────────┐    ┌──────────┐   ┌──────────┐
                         │  LIKE  │    │   FTS5   │   │ Semantic │
                         │ search │    │  search  │   │  search  │
                         └────────┘    └──────────┘   └──────────┘
                              │              │              │
                              └──────────────┼──────────────┘
                                             ▼
                                    ┌────────────────┐
                                    │  Score Fusion  │
                                    │    (RRF)       │
                                    └────────────────┘

The search function dispatches to one or more backends depending on what's available, then merges results using Reciprocal Rank Fusion.

Existing Code You Need to Know

File	What it does
`src/db.rs`	`BearDb` — opens Bear's read-only SQLite
	database, discovers schema at init,
	hands out short-lived connections. All
	queries go through
	`with_connection(fn(conn, cte))`.
`src/schema.rs`	Discovers Bear's Core Data junction
	table names (`Z_5TAGS`, etc.) at
	runtime, generates a normalizing CTE
	that maps raw tables to clean `notes`,
	`tags`, `note_tags` views.
`src/search.rs`	`SearchParams` and `execute()` — the
	current LIKE-based search with SQL-level
	scoring and Rust-side line extraction.
	This is the file you'll modify most.
`src/server.rs`	MCP tool handlers. `note_search` calls
	`db.search(&params)` and formats results
	as XML.
`src/main.rs`	CLI entry point. `--jp` flag for JP tool
	protocol, `--note-create` for write
	access. You'll add new flags here.
`src/error.rs`	Error enum. You'll add new variants.
`Cargo.toml`	Dependencies are gated behind the
	workspace. New deps go here with feature
	flags.

Bear's database is read-only to us (we never write to it). The sidecar database for embeddings is ours and we can write to it freely.

Layer 1: FTS5 Full-Text Search

What

Replace LIKE '%query%' with SQLite FTS5 for word-aware search with BM25 ranking.

Why

FTS5 gives us tokenized word matching (no false substring hits), BM25 relevance ranking, prefix queries (product*), phrase queries ("getting things done"), and boolean operators (productivity OR gtd).

How

Our rusqlite dependency already compiles SQLite with FTS5 enabled (the bundled feature includes it). No new dependencies needed.

On each search call, create a temporary in-memory FTS5 table, populate it from the notes CTE, query it, then let it drop when the connection closes. Since we use short-lived connections, this happens every search.

sql

CREATE VIRTUAL TABLE temp.fts_notes USING fts5(
    id UNINDEXED,
    title,
    content,
    tokenize='unicode61'
);

INSERT INTO temp.fts_notes(id, title, content)
SELECT id, title, content FROM notes WHERE is_trashed = 0;

-- Search with BM25 ranking
SELECT id, title, content, rank
FROM temp.fts_notes
WHERE fts_notes MATCH ?
ORDER BY rank
LIMIT ?;

File changes:

New file src/fts.rs — functions to create the temp FTS table, run FTS5 queries, and map results back to ScoredNote.
Modify src/search.rs — execute() tries FTS5 first. If the FTS5 query fails (malformed syntax, etc.), fall back to LIKE search. Add a mode field to SearchParams:

rust

pub enum SearchMode {
    /// Try FTS5, fall back to LIKE on error.
    Auto,
    /// Force FTS5 (fail if unavailable).
    Fts,
    /// Force LIKE (current behavior).
    Like,
}

Modify src/error.rs — add FtsError variant.

Performance concern: For a few hundred notes, building the FTS table per-search is <50ms. For thousands, it could be slow. Measure this. If it's a problem, consider caching the FTS table in a persistent temp file that gets rebuilt when Bear's database modification time changes.

Testing: Add tests that exercise FTS5 query syntax (prefix, phrase, boolean). Test the LIKE fallback by passing malformed FTS5 queries.

Typo Tolerance via Trigram Tokenizer

FTS5 supports a trigram tokenizer that indexes character trigrams instead of words. This naturally handles typos because "productivty" and "productivity" share most trigrams.

Add a second FTS5 table using the trigram tokenizer:

sql

CREATE VIRTUAL TABLE temp.fts_trigram USING fts5(
    id UNINDEXED,
    title,
    content,
    tokenize='trigram'
);

When the unicode61 FTS5 search returns few or no results, retry against the trigram table. The trigram table has worse ranking quality (many false positives) so it's a fallback, not the primary.

This is a one-line tokenizer change with no new dependencies. The tradeoff is a larger in-memory index and noisier results.

Layer 2: Semantic Vector Search

What

Embed notes and queries as vectors using a local ML model, then find semantically similar notes by cosine distance.

Dependencies

Two new dependencies, both behind a semantic cargo feature flag:

fastembed (Apache 2.0) — Rust library for generating text embeddings locally using ONNX Runtime. Handles model download, tokenization, and inference. We use it to convert text into dense float vectors. Crate: https://crates.io/crates/fastembed
sqlite-vector (Elastic License 2.0 with OSI open-source exception) — C-based SQLite extension for vector similarity search. SIMD-accelerated distance functions, quantization for memory efficiency, no preindexing required. We compile it from source via build.rs. Source: https://github.com/sqliteai/sqlite-vector

The license situation: sqlite-vector is free for OSI-licensed open-source projects (grizzly is MIT). If grizzly is ever used in a closed-source product, a commercial license would be needed. fastembed is Apache 2.0, no restrictions.

toml

# In Cargo.toml
[features]
default = []
semantic = ["dep:fastembed"]

[dependencies]
fastembed = { version = "5", optional = true, default-features = false, features = [
    "ort-download-binaries-native-tls",
    "hf-hub-native-tls",
] }

[build-dependencies]
cc = "1" # for compiling sqlite-vector C source

For sqlite-vector, vendor the C source files into crates/contrib/grizzly/vendor/sqlite-vector/ and compile them in build.rs using the cc crate. Then register the extension at runtime via rusqlite's load_extension_enable() + load_extension(), or (better) use Connection::load_extension() with the compiled static library. The exact integration path needs prototyping — rusqlite's load_extension wants a .dylib/.so path, so the static compilation approach may require using rusqlite's create_module() API or calling sqlite3_vector_init directly via FFI.

Sidecar Database

Embeddings are stored in a separate SQLite embeddings.db database. This database is owned by grizzly and is read-write. The path is set using a CLI flag or environment variable. Ideally the caller uses something like directories::ProjectDirs::cache_dir() to get the cache directory.

Schema:

sql

CREATE TABLE IF NOT EXISTS meta (
    key TEXT PRIMARY KEY,
    value TEXT
);
-- Stores: model_name, dimension, last_full_index_at

CREATE TABLE IF NOT EXISTS embeddings (
    note_id TEXT PRIMARY KEY,
    embedding BLOB NOT NULL,      -- float32 vector, dimension from meta
    updated_at TEXT NOT NULL      -- ISO 8601, from Bear's ZMODIFICATIONDATE
);

The embedding column stores raw float32 bytes (via fastembed's output). sqlite-vector operates directly on these BLOBs.

Indexing

Embeddings are built via a CLI subcommand:

grizzly index [--model MODEL_NAME]

This:

Opens Bear's database (read-only) and the sidecar (read-write).
Loads the embedding model (BAAI/bge-small-en-v1.5 by default, 512 dimensions, ~127MB download on first run).
Queries all non-trashed notes from Bear.
For each note whose updated_at has changed (or is missing from the sidecar), generates an embedding from "{title}\n{content}".
Writes embeddings to the sidecar in batches.
Calls vector_init and vector_quantize on the sidecar for fast search.

Incremental: only re-embeds notes whose modification timestamp changed. Full rebuild via grizzly index --rebuild.

File changes:

New file src/embedding.rs — sidecar DB management (open/create, schema migration), embedding generation via fastembed, incremental update logic.
New file src/semantic.rs — vector search against the sidecar, returns Vec<(note_id, distance)>.
Modify src/main.rs — add index subcommand with --model and --rebuild flags.
Modify src/error.rs — add EmbeddingError, SidecarError variants.

Search Flow

When semantic search is enabled (embeddings exist in the sidecar):

Embed the query string using the same model.
Query the sidecar: SELECT note_id, distance FROM vector_quantize_scan('embeddings', 'embedding', ?, 50).
Run FTS5/LIKE search in parallel (same query string).
Merge results using Reciprocal Rank Fusion (RRF).

RRF formula: for each note appearing in any result list, score = sum(1 / (k + rank_i)) where k = 60 (standard constant) and rank_i is the note's position in result list i. Notes are then sorted by fused score descending.

rust

fn reciprocal_rank_fusion(
    results: &[Vec<String>],  // each inner vec is note_ids in ranked order
    k: f64,                   // typically 60.0
) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in results {
        for (rank, note_id) in list.iter().enumerate() {
            *scores.entry(note_id.clone()).or_default() += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut ranked: Vec<_> = scores.into_iter().collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    ranked
}

File changes:

New file src/fusion.rs — the RRF implementation.
Modify src/search.rs — execute() checks if sidecar exists and has embeddings, dispatches to semantic + text search, merges via RRF.

Graceful Degradation

The search pipeline degrades gracefully:

Condition	Behavior
Sidecar exists, FTS5 works	Semantic + FTS5, merged with RRF
Sidecar exists, FTS5 fails	Semantic + LIKE, merged with RRF
No sidecar, FTS5 works	FTS5 only
No sidecar, FTS5 fails	LIKE only (current behavior)

The user never sees an error from missing embeddings. They just get less relevant results.

CLI Flags

grizzly [--jp] [--note-create] [--no-semantic]
grizzly index [--model MODEL] [--rebuild]

--no-semantic — disable semantic search even if embeddings exist (useful for debugging).
index subcommand — build or update the embedding index.
--model — override the embedding model (default: BAAI/bge-small-en-v1.5).
--rebuild — force full re-embedding of all notes.

Model Choice

Default: BAAI/bge-small-en-v1.5 (fastembed's own default)

512 dimensions, 33M parameters, ~127MB download
Cached in ~/.cache/huggingface/ after first download
Fast inference (~1ms per short text on M-series Mac)
Ranks significantly higher than all-MiniLM-L6-v2 on MTEB benchmarks (rank 94 vs 123, mean score 43.76 vs 41.39) at the same memory tier

Other reasonable choices if users want to trade speed/size for quality:

Model	Memory	Dims	MTEB Rank
`BAAI/bge-base-en-v1.5`	390MB	768	82
`nomic-ai/nomic-embed-text-v1.5`	522MB	768	87
`mixedbread-ai/mxbai-embed-large-v1`	639MB	1024	61

Users can switch via --model, but changing models requires a full rebuild since embeddings from different models aren't compatible.

Store the model name in the sidecar's meta table. On search, verify the model matches. If it doesn't, log a warning and skip semantic search (don't crash).

Implementation Order

FTS5 — no new dependencies, moderate complexity. Validate that the temp-table-per-search approach is fast enough.
Trigram typo tolerance — trivial addition on top of FTS5.
sqlite-vector integration — figure out the build.rs / FFI story. This is the riskiest part. Prototype it in isolation before integrating.
fastembed integration — straightforward API, but the ONNX Runtime binary is large. Verify it works behind a feature flag without bloating the default build.
Sidecar database — schema, migration, incremental indexing.
Search fusion — wire it all together with RRF.

Steps 1-2 can ship independently. Steps 3-6 ship together as the semantic feature.

Risks

sqlite-vector build integration: Compiling C source via build.rs and loading it into rusqlite's bundled SQLite may be tricky. The extension expects to be loaded via sqlite3_vector_init, which is normally called by load_extension(). With bundled SQLite, we may need to call the init function directly via FFI after getting the sqlite3* handle from rusqlite. This needs a prototype.
ONNX Runtime binary size: The ort crate (used by fastembed) downloads prebuilt ONNX Runtime binaries (~50MB). This only affects builds with the semantic feature, but it's a large addition. The download happens at build time, not runtime.
First-run model download latency: The first grizzly index call downloads the embedding model from Hugging Face (~127MB for the default bge-small-en-v1.5). Subsequent runs use the cache. This should be clearly communicated to the user (fastembed supports a download progress callback).
FTS5 temp table rebuild cost: If Bear has 5,000+ notes, rebuilding the FTS5 index on every search could be slow. Measure this early. Mitigation: cache the FTS table in a persistent temp file, keyed by the Bear database's file modification time.
License: sqlite-vector's Elastic License 2.0 with open-source exception is fine for grizzly (MIT), but worth tracking if the project's license situation changes.

Not In Scope

Multi-language embedding models (English-only for now; users can override with --model if needed)
Streaming index updates / watch mode (batch rebuild is fine for personal note collections)
Remote embedding APIs (everything stays local)
Image embeddings (Bear notes are text)
Graph-based note relationships (explored sqlitegraph, rejected due to GPL license and overkill)

RFD 032: Grizzly Semantic Search ​

Summary ​

Context ​

Architecture Overview ​

Existing Code You Need to Know ​

Layer 1: FTS5 Full-Text Search ​

What ​

Why ​

How ​

Typo Tolerance via Trigram Tokenizer ​

Layer 2: Semantic Vector Search ​

What ​

Dependencies ​

Sidecar Database ​

Indexing ​

Search Flow ​

Graceful Degradation ​

CLI Flags ​

Model Choice ​

Implementation Order ​

Risks ​

Not In Scope ​

RFD 032: Grizzly Semantic Search

Summary

Context

Architecture Overview

Existing Code You Need to Know

Layer 1: FTS5 Full-Text Search

What

Why

How

Typo Tolerance via Trigram Tokenizer

Layer 2: Semantic Vector Search

What

Dependencies

Sidecar Database

Indexing

Search Flow

Graceful Degradation

CLI Flags

Model Choice

Implementation Order

Risks

Not In Scope