Private code memory for AI
SourceVault keeps every line on infrastructure you control — laptop, workstation, or your own server — indexes it with code-aware retrieval, and gives your team a safer way to search, read, and reason about proprietary code.
What buyers get
See it work
The problem
Most code assistants want to upload your repository, keep it in a cloud context window, and answer from a lossy slice of your system. That creates privacy risk, compliance concerns, and weak answers when the repo is large or sensitive.
SourceVault is built for teams that need private code intelligence on local infrastructure.
How it works
SourceVault implements the retrieval patterns that leading RAG frameworks document as best practice, natively on ChromaDB and Ollama, so the stack stays small and every line is yours to audit.
Each query combines semantic search and exact keyword matching, then fuses the rankings so the strongest results rise to the top.
Files are split along function and class boundaries, so every result maps to readable code with exact line ranges.
Each chunk is embedded with its path and symbol names so the vector store knows where code lives, not just what it says.
Ask mode checks whether it has enough context, retrieves again if needed, and answers only from source-backed snippets instead of guessing.
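One common way to fuse a semantic ranking with a keyword ranking is reciprocal rank fusion (RRF). The sketch below is illustrative only: the page does not say which fusion method SourceVault uses, and the function name, the constant k=60, and the chunk labels are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one list, best first.

    Each chunk earns 1 / (k + rank) from every list it appears in, so a
    chunk ranked well by both searches beats one ranked well by only one.
    k=60 is a conventional default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


# Hypothetical chunk labels in "path:start-end" form.
semantic = ["lib/request.js:10-42", "lib/utils.js:5-30", "lib/response.js:1-20"]
keyword = ["lib/utils.js:5-30", "lib/router.js:3-9", "lib/request.js:10-42"]

fused = reciprocal_rank_fusion([semantic, keyword])
for chunk_id in fused:
    print(chunk_id)
```

In this example the utils chunk comes out first because it ranks near the top of both lists, while chunks found by only one search fall to the bottom.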
Cost model
Watch an AI agent work on a real codebase and you will see where the money goes: it re-reads whole files to find one function, drags stale context forward turn after turn, and pays again for every retry. That burn compounds — per question, per developer, per day — and per-seat plans meter all of it.
SourceVault's retrieval engine exists to end that pattern. Every question gets a bounded budget of cited file and line ranges — never a repo dump — and repeated questions return from cache with zero model calls. Local models answer with no meter at all; when you do route an agent to a hosted model, it receives targeted snippets, so the tokens you pay for are the ones that matter.
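A minimal sketch of how a repeated question could return from cache with zero model calls. The class name, the cache key (repository, index version, normalized question), and the compute callback are all hypothetical, chosen for illustration rather than taken from SourceVault.

```python
import hashlib


class AnswerCache:
    """In-memory answer cache keyed by repo, index version, and question."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(repo, index_version, question):
        # Normalize case and whitespace so trivially different phrasings match.
        normalized = " ".join(question.lower().split())
        raw = f"{repo}\n{index_version}\n{normalized}"
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get_or_compute(self, repo, index_version, question, compute):
        """Return the cached answer, calling compute(question) only on a miss."""
        key = self._key(repo, index_version, question)
        if key not in self._store:
            self._store[key] = compute(question)
        return self._store[key]
```

Including the index version in the key is one way to make sure a reindex invalidates old answers instead of serving them from cache.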
Operator experience
Slash commands behave the same way every time in Hermes CLI, Telegram, and Hermes Desktop — easy to teach, support, and repeat. Just ask your question; the engine retrieves on its own.
The same engine is available over MCP to Claude Code, OpenClaw, and any MCP client, so every AI tool you run answers from the same private, cited index.
/code-search express "trust proxy client ip"
/code-read express lib/request.js
/code-ask express "how does the trust proxy setting affect the client IP?"
Code intelligence
The retrieval engine goes past keyword matching: it knows how your code connects, keeps its answers honest as the code changes, and plugs into the AI tools you already use.
A symbol graph links definitions to their references, so "who calls this function?" is answered from the code's actual structure — not just text that looks similar.
If a cited file changed since it was indexed, the citation says so. The answer never quietly points you at lines that moved.
Only changed files re-embed, so keeping a large repository current is fast — and the nightly refresh stays cheap.
Rate answers, and a built-in eval harness replays them after any model or index change and reports drift. That replay is the backbone of ongoing retrieval tuning.
Ask one question across every indexed repository at once, with each citation tagged by repo — "how do the frontend and backend handle this?" in a single answer.
Commit messages index alongside code, so "why was this changed?" and "when did this break?" get grounded answers — coverage a cloud indexer never sees, because it never sees your history.
A generated repo map and per-module summaries answer "how does auth work overall" with an overview instead of fragments — written by your local model, from your code, on your machine.
A local reranker re-reads the top candidates against your actual question before answering. Adopted because the benchmark proved it: higher file-hit and citation precision at sub-second cost.
An MCP server exposes the same engine to Claude Code, OpenClaw, and any MCP client — bounded, cited context instead of re-reading files. The server itself is local-only: pair it with a local-model client and the loop stays zero-egress end to end; a cloud-backed client sends what it retrieves to its own vendor, by your choice.
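The stale-citation flag and the changed-files-only re-embedding described above can both be built on a per-file content hash. The sketch below assumes that approach; the function names and the idea of storing one SHA-256 digest per indexed path are illustrative, not a description of SourceVault's internals.

```python
import hashlib


def content_hash(text):
    """Return a stable digest of a file's contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def files_to_reembed(indexed_hashes, current_files):
    """Return paths that are new or whose content differs from the index.

    indexed_hashes: {path: digest recorded at index time}
    current_files:  {path: current file text}
    """
    return sorted(
        path
        for path, text in current_files.items()
        if indexed_hashes.get(path) != content_hash(text)
    )


def citation_is_stale(indexed_hashes, path, current_text):
    """True when a cited file no longer matches what was indexed."""
    return indexed_hashes.get(path) != content_hash(current_text)
```

Unchanged files hash to the same digest and are skipped, so the cost of a refresh tracks the size of the change rather than the size of the repository.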
The dashboard
Everything ships with a browser dashboard — connect your source control platforms, manage repositories and models, and ask questions about your code without touching a terminal. See it in action in the walkthrough above.
Sign in to GitHub, GitLab, or Bitbucket once. Browse and autocomplete your repositories as you type, and clone private repos without per-clone credentials.
Repositories index automatically on import. Update, sync, or switch branches per repo — and a stale index is one click from fresh.
Literal and semantic search with file-type filters, plus Ask mode for grounded answers where every citation clicks open to its source. History and archive are built in.
Citations and search results open the full file in a syntax-highlighted viewer — cited lines marked and scrolled into view, 15 languages, selectable light and dark code themes.
Pull, select, and uninstall Ollama models from the UI. The embedding model that powers search is protected from accidental removal.
Pin a question as a standing check. After every reindex it re-asks itself and flags you when the cited answer drifts — "did the auth flow change this sprint?" answers itself.
Pin good answers and export them as a markdown knowledge base generated from your own code — every claim keeping its file-and-line citations.
Background polling keeps status, repositories, and models current without refresh buttons, and the health indicator flashes the moment anything needs attention.
Trust layer
Search, file read, and task endpoints require shared-secret signatures, with separate secrets so one leak does not expose the whole stack.
Token-guarded sessions with a one-click Lock, a strict content-security policy, and a loopback guard keep the control plane local unless you explicitly unlock it.
Access tokens are generated server-side and rotated from the UI in one click — nobody types or chooses a credential, and rotation signs every other session out instantly.
Path-escape and symlink checks keep every read inside its repository, while file allowlists block binaries and unknown formats.
.env files, lockfiles, and dependency directories are excluded automatically so credentials never become searchable vectors.
Removing a repository requires typed confirmation and cleans up the working copy, vectors, and metadata.
Embeddings run on local Ollama, vectors live in local ChromaDB, and answers come from local models.
The retrieval engine is built natively on two auditable local services — ChromaDB for vectors, Ollama for models. No LangChain-style orchestration layer in between: a smaller attack surface, every line yours to audit.
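As an illustration of the path-escape and symlink checks mentioned above, here is a minimal sketch using Python's standard pathlib. The function name and error message are hypothetical; the sketch shows the general technique, not SourceVault's code.

```python
from pathlib import Path


def resolve_inside_repo(repo_root, requested):
    """Resolve a requested path and refuse anything outside repo_root.

    Path.resolve() collapses ".." segments and follows symlinks, so the
    containment check runs against the real location on disk.
    """
    root = Path(repo_root).resolve()
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"path escapes repository: {requested}")
    return candidate
```

Checking the resolved path rather than the raw string means a request such as ../../etc/passwd, an absolute path, or a symlink pointing outside the working copy is rejected in the same way.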
Packages
Pricing scales with repositories and machines — never per seat, never per token. Every install is a one-time fee scoped to your setup, with SourceVault Care as an optional retainer for updates, reindex health, and tuning as local models improve.
The $500 pilot covers one machine, two repositories, and 30 days of support, and the full $500 is credited toward any package. If the pilot can't answer questions about your code with file-and-line citations, you don't pay.
Starter
From $1,350 one-time
30 days of Care included
For a founder or solo engineer who needs private code memory without a heavy platform project.
Pro
From $3,800 one-time
30 days of Care included
For a power user or small team running AI across several private repos.
Team
From $8,200 one-time
30 days of Care included
For an engineering team or large monorepo that needs a deliberate rollout.
Enterprise
Custom
Priority Care SLA
For regulated or air-gapped environments that need the compliance story in writing.
Ask one question across every indexed repository, with citations tagged by repo. A one-time unlock on Starter and Pro — included with Team and Enterprise.
SourceVault Care covers updates, reindex health, a support SLA, and tuning as local models improve. Embedding upgrades require reindexing, and Care handles it. After the 30 days included with every install, Care costs $275/mo for Starter, $495/mo for Pro, and from $1,100/mo for Team.
"From" prices are honest floors, not bait — your quote is fixed before work starts. Send your stack, repo count, and target machine and you will get a straight recommendation, even if the right answer is the smaller package.
Deployment
Everything runs on infrastructure you control — the models, the index, the dashboard. If you already build with local AI on WSL2 or Linux, SourceVault drops into the setup you have; if you don't, the done-for-you install brings everything it needs.
Start with the $500 pilot — fully credited toward any package, and backed by the citation guarantee: answers about your code, with file-and-line proof.
Book the pilot