RFD 032: Grizzly Semantic Search
- Status: Accepted
- Category: Design
- Authors: Jean Mertz git@jeanmertz.com
- Date: 2026-03-07
Summary
Add semantic vector search, FTS5 full-text search, and typo tolerance to grizzly's note_search tool. The goal is to find notes by meaning, not just literal text. Searching "productivity systems" should surface notes about GTD and Pomodoro even when those exact words don't appear.
Context
Grizzly is a Bear.app MCP server that lives in crates/contrib/grizzly/. It exposes note_get, note_search, and note_create tools over the MCP protocol. The current search implementation uses SQL LIKE '%query%' with hand-rolled relevance scoring. It works but has three weaknesses:
- Substring matching — "prod" matches "reproduce", no word boundary awareness.
- No typo handling — "productivty" finds nothing.
- No semantic understanding — searching "task management" won't find a note titled "GTD methodology" unless those words appear literally.
This RFD covers three layers of improvement, each independent and additive.
Architecture Overview
┌─────────────────────┐
│ note_search tool │
└──────────┬──────────┘
│
┌──────────────┼──────────────┐
▼ ▼ ▼
┌────────┐ ┌──────────┐ ┌──────────┐
│ LIKE │ │ FTS5 │ │ Semantic │
│ search │ │ search │ │ search │
└────────┘ └──────────┘ └──────────┘
│ │ │
└──────────────┼──────────────┘
▼
┌────────────────┐
│ Score Fusion │
│ (RRF) │
└────────────────┘The search function dispatches to one or more backends depending on what's available, then merges results using Reciprocal Rank Fusion.
Existing Code You Need to Know
| File | What it does |
|---|---|
src/db.rs | BearDb — opens Bear's read-only SQLite |
| database, discovers schema at init, | |
| hands out short-lived connections. All | |
| queries go through | |
with_connection(fn(conn, cte)). | |
src/schema.rs | Discovers Bear's Core Data junction |
table names (Z_5TAGS, etc.) at | |
| runtime, generates a normalizing CTE | |
that maps raw tables to clean notes, | |
tags, note_tags views. | |
src/search.rs | SearchParams and execute() — the |
| current LIKE-based search with SQL-level | |
| scoring and Rust-side line extraction. | |
| This is the file you'll modify most. | |
src/server.rs | MCP tool handlers. note_search calls |
db.search(¶ms) and formats results | |
| as XML. | |
src/main.rs | CLI entry point. --jp flag for JP tool |
protocol, --note-create for write | |
| access. You'll add new flags here. | |
src/error.rs | Error enum. You'll add new variants. |
Cargo.toml | Dependencies are gated behind the |
| workspace. New deps go here with feature | |
| flags. |
Bear's database is read-only to us (we never write to it). The sidecar database for embeddings is ours and we can write to it freely.
Layer 1: FTS5 Full-Text Search
What
Replace LIKE '%query%' with SQLite FTS5 for word-aware search with BM25 ranking.
Why
FTS5 gives us tokenized word matching (no false substring hits), BM25 relevance ranking, prefix queries (product*), phrase queries ("getting things done"), and boolean operators (productivity OR gtd).
How
Our rusqlite dependency already compiles SQLite with FTS5 enabled (the bundled feature includes it). No new dependencies needed.
On each search call, create a temporary in-memory FTS5 table, populate it from the notes CTE, query it, then let it drop when the connection closes. Since we use short-lived connections, this happens every search.
CREATE VIRTUAL TABLE temp.fts_notes USING fts5(
id UNINDEXED,
title,
content,
tokenize='unicode61'
);
INSERT INTO temp.fts_notes(id, title, content)
SELECT id, title, content FROM notes WHERE is_trashed = 0;
-- Search with BM25 ranking
SELECT id, title, content, rank
FROM temp.fts_notes
WHERE fts_notes MATCH ?
ORDER BY rank
LIMIT ?;File changes:
- New file
src/fts.rs— functions to create the temp FTS table, run FTS5 queries, and map results back toScoredNote. - Modify
src/search.rs—execute()tries FTS5 first. If the FTS5 query fails (malformed syntax, etc.), fall back to LIKE search. Add amodefield toSearchParams:
pub enum SearchMode {
/// Try FTS5, fall back to LIKE on error.
Auto,
/// Force FTS5 (fail if unavailable).
Fts,
/// Force LIKE (current behavior).
Like,
}- Modify
src/error.rs— addFtsErrorvariant.
Performance concern: For a few hundred notes, building the FTS table per-search is <50ms. For thousands, it could be slow. Measure this. If it's a problem, consider caching the FTS table in a persistent temp file that gets rebuilt when Bear's database modification time changes.
Testing: Add tests that exercise FTS5 query syntax (prefix, phrase, boolean). Test the LIKE fallback by passing malformed FTS5 queries.
Typo Tolerance via Trigram Tokenizer
FTS5 supports a trigram tokenizer that indexes character trigrams instead of words. This naturally handles typos because "productivty" and "productivity" share most trigrams.
Add a second FTS5 table using the trigram tokenizer:
CREATE VIRTUAL TABLE temp.fts_trigram USING fts5(
id UNINDEXED,
title,
content,
tokenize='trigram'
);When the unicode61 FTS5 search returns few or no results, retry against the trigram table. The trigram table has worse ranking quality (many false positives) so it's a fallback, not the primary.
This is a one-line tokenizer change with no new dependencies. The tradeoff is a larger in-memory index and noisier results.
Layer 2: Semantic Vector Search
What
Embed notes and queries as vectors using a local ML model, then find semantically similar notes by cosine distance.
Dependencies
Two new dependencies, both behind a semantic cargo feature flag:
fastembed(Apache 2.0) — Rust library for generating text embeddings locally using ONNX Runtime. Handles model download, tokenization, and inference. We use it to convert text into dense float vectors. Crate: https://crates.io/crates/fastembedsqlite-vector(Elastic License 2.0 with OSI open-source exception) — C-based SQLite extension for vector similarity search. SIMD-accelerated distance functions, quantization for memory efficiency, no preindexing required. We compile it from source viabuild.rs. Source: https://github.com/sqliteai/sqlite-vector
The license situation: sqlite-vector is free for OSI-licensed open-source projects (grizzly is MIT). If grizzly is ever used in a closed-source product, a commercial license would be needed. fastembed is Apache 2.0, no restrictions.
# In Cargo.toml
[features]
default = []
semantic = ["dep:fastembed"]
[dependencies]
fastembed = { version = "5", optional = true, default-features = false, features = [
"ort-download-binaries-native-tls",
"hf-hub-native-tls",
] }
[build-dependencies]
cc = "1" # for compiling sqlite-vector C sourceFor sqlite-vector, vendor the C source files into crates/contrib/grizzly/vendor/sqlite-vector/ and compile them in build.rs using the cc crate. Then register the extension at runtime via rusqlite's load_extension_enable() + load_extension(), or (better) use Connection::load_extension() with the compiled static library. The exact integration path needs prototyping — rusqlite's load_extension wants a .dylib/.so path, so the static compilation approach may require using rusqlite's create_module() API or calling sqlite3_vector_init directly via FFI.
Sidecar Database
Embeddings are stored in a separate SQLite embeddings.db database. This database is owned by grizzly and is read-write. The path is set using a CLI flag or environment variable. Ideally the caller uses something like directories::ProjectDirs::cache_dir() to get the cache directory.
Schema:
CREATE TABLE IF NOT EXISTS meta (
key TEXT PRIMARY KEY,
value TEXT
);
-- Stores: model_name, dimension, last_full_index_at
CREATE TABLE IF NOT EXISTS embeddings (
note_id TEXT PRIMARY KEY,
embedding BLOB NOT NULL, -- float32 vector, dimension from meta
updated_at TEXT NOT NULL -- ISO 8601, from Bear's ZMODIFICATIONDATE
);The embedding column stores raw float32 bytes (via fastembed's output). sqlite-vector operates directly on these BLOBs.
Indexing
Embeddings are built via a CLI subcommand:
grizzly index [--model MODEL_NAME]This:
- Opens Bear's database (read-only) and the sidecar (read-write).
- Loads the embedding model (
BAAI/bge-small-en-v1.5by default, 512 dimensions, ~127MB download on first run). - Queries all non-trashed notes from Bear.
- For each note whose
updated_athas changed (or is missing from the sidecar), generates an embedding from"{title}\n{content}". - Writes embeddings to the sidecar in batches.
- Calls
vector_initandvector_quantizeon the sidecar for fast search.
Incremental: only re-embeds notes whose modification timestamp changed. Full rebuild via grizzly index --rebuild.
File changes:
- New file
src/embedding.rs— sidecar DB management (open/create, schema migration), embedding generation viafastembed, incremental update logic. - New file
src/semantic.rs— vector search against the sidecar, returnsVec<(note_id, distance)>. - Modify
src/main.rs— addindexsubcommand with--modeland--rebuildflags. - Modify
src/error.rs— addEmbeddingError,SidecarErrorvariants.
Search Flow
When semantic search is enabled (embeddings exist in the sidecar):
- Embed the query string using the same model.
- Query the sidecar:
SELECT note_id, distance FROM vector_quantize_scan('embeddings', 'embedding', ?, 50). - Run FTS5/LIKE search in parallel (same query string).
- Merge results using Reciprocal Rank Fusion (RRF).
RRF formula: for each note appearing in any result list, score = sum(1 / (k + rank_i)) where k = 60 (standard constant) and rank_i is the note's position in result list i. Notes are then sorted by fused score descending.
fn reciprocal_rank_fusion(
results: &[Vec<String>], // each inner vec is note_ids in ranked order
k: f64, // typically 60.0
) -> Vec<(String, f64)> {
let mut scores: HashMap<String, f64> = HashMap::new();
for list in results {
for (rank, note_id) in list.iter().enumerate() {
*scores.entry(note_id.clone()).or_default() += 1.0 / (k + rank as f64 + 1.0);
}
}
let mut ranked: Vec<_> = scores.into_iter().collect();
ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
ranked
}File changes:
- New file
src/fusion.rs— the RRF implementation. - Modify
src/search.rs—execute()checks if sidecar exists and has embeddings, dispatches to semantic + text search, merges via RRF.
Graceful Degradation
The search pipeline degrades gracefully:
| Condition | Behavior |
|---|---|
| Sidecar exists, FTS5 works | Semantic + FTS5, merged with RRF |
| Sidecar exists, FTS5 fails | Semantic + LIKE, merged with RRF |
| No sidecar, FTS5 works | FTS5 only |
| No sidecar, FTS5 fails | LIKE only (current behavior) |
The user never sees an error from missing embeddings. They just get less relevant results.
CLI Flags
grizzly [--jp] [--note-create] [--no-semantic]
grizzly index [--model MODEL] [--rebuild]--no-semantic— disable semantic search even if embeddings exist (useful for debugging).indexsubcommand — build or update the embedding index.--model— override the embedding model (default:BAAI/bge-small-en-v1.5).--rebuild— force full re-embedding of all notes.
Model Choice
Default: BAAI/bge-small-en-v1.5 (fastembed's own default)
- 512 dimensions, 33M parameters, ~127MB download
- Cached in
~/.cache/huggingface/after first download - Fast inference (~1ms per short text on M-series Mac)
- Ranks significantly higher than
all-MiniLM-L6-v2on MTEB benchmarks (rank 94 vs 123, mean score 43.76 vs 41.39) at the same memory tier
Other reasonable choices if users want to trade speed/size for quality:
| Model | Memory | Dims | MTEB Rank |
|---|---|---|---|
BAAI/bge-base-en-v1.5 | 390MB | 768 | 82 |
nomic-ai/nomic-embed-text-v1.5 | 522MB | 768 | 87 |
mixedbread-ai/mxbai-embed-large-v1 | 639MB | 1024 | 61 |
Users can switch via --model, but changing models requires a full rebuild since embeddings from different models aren't compatible.
Store the model name in the sidecar's meta table. On search, verify the model matches. If it doesn't, log a warning and skip semantic search (don't crash).
Implementation Order
- FTS5 — no new dependencies, moderate complexity. Validate that the temp-table-per-search approach is fast enough.
- Trigram typo tolerance — trivial addition on top of FTS5.
- sqlite-vector integration — figure out the build.rs / FFI story. This is the riskiest part. Prototype it in isolation before integrating.
- fastembed integration — straightforward API, but the ONNX Runtime binary is large. Verify it works behind a feature flag without bloating the default build.
- Sidecar database — schema, migration, incremental indexing.
- Search fusion — wire it all together with RRF.
Steps 1-2 can ship independently. Steps 3-6 ship together as the semantic feature.
Risks
sqlite-vector build integration: Compiling C source via
build.rsand loading it into rusqlite's bundled SQLite may be tricky. The extension expects to be loaded viasqlite3_vector_init, which is normally called byload_extension(). With bundled SQLite, we may need to call the init function directly via FFI after getting thesqlite3*handle from rusqlite. This needs a prototype.ONNX Runtime binary size: The
ortcrate (used byfastembed) downloads prebuilt ONNX Runtime binaries (~50MB). This only affects builds with thesemanticfeature, but it's a large addition. The download happens at build time, not runtime.First-run model download latency: The first
grizzly indexcall downloads the embedding model from Hugging Face (~127MB for the defaultbge-small-en-v1.5). Subsequent runs use the cache. This should be clearly communicated to the user (fastembed supports a download progress callback).FTS5 temp table rebuild cost: If Bear has 5,000+ notes, rebuilding the FTS5 index on every search could be slow. Measure this early. Mitigation: cache the FTS table in a persistent temp file, keyed by the Bear database's file modification time.
License:
sqlite-vector's Elastic License 2.0 with open-source exception is fine for grizzly (MIT), but worth tracking if the project's license situation changes.
Not In Scope
- Multi-language embedding models (English-only for now; users can override with
--modelif needed) - Streaming index updates / watch mode (batch rebuild is fine for personal note collections)
- Remote embedding APIs (everything stays local)
- Image embeddings (Bear notes are text)
- Graph-based note relationships (explored
sqlitegraph, rejected due to GPL license and overkill)