Measured: 100% cited, 100% correct answers on the public Express benchmark.

Private code memory for AI

Give your AI grounded answers over private repositories, without uploading source code.

SourceVault keeps every line on infrastructure you control — laptop, workstation, or your own server — indexes it with code-aware retrieval, and gives your team a safer way to search, read, and reason about proprietary code.

What buyers get

  • A dashboard with cited answers, a full source viewer, watches, and runbook export.
  • Hermes plugin plus an MCP server for Claude Code and any MCP client.
  • Git-history answers and module overviews — coverage cloud indexers never see.
  • A retrieval-quality report measured on your repositories, benchmark-style.
  • Security hardening: token-guarded access, repo confinement, secrets never indexed.
  • One-time implementation with live handoff — GitHub, GitLab, and Bitbucket supported.

See it work

Ask a question, get a cited answer, click straight to the source.

The problem

Hosted AI tools break down when the code is private.

Most code assistants want to upload your repository, keep it in a cloud context window, and answer from a lossy slice of your system. That creates privacy risk, compliance concerns, and weak answers when the repo is large or sensitive.

SourceVault is built for teams that need private code intelligence on local infrastructure.

How it works

Local index, exact reads, grounded answers.

SourceVault implements the retrieval patterns that leading RAG frameworks document as best practice, natively on ChromaDB and Ollama, so the stack stays small and every line is yours to audit.

01

Hybrid retrieval

Each query combines semantic search and exact keyword matching, then fuses the rankings so the strongest results rise to the top.
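The page does not say how the two rankings are fused. Reciprocal rank fusion (RRF) is one common, well-documented choice, so the sketch below assumes it. The chunk IDs and the k=60 constant are illustrative, not SourceVault's actual values.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first lists of chunk IDs with reciprocal rank fusion.

    Every list adds 1 / (k + rank) to each ID it contains (rank starts at 1),
    so chunks that rank well in both semantic and keyword search float up.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results for "trust proxy client ip" from the two retrievers.
semantic_hits = ["request.js:ip", "utils.js:compileTrust", "application.js:set"]
keyword_hits = ["utils.js:compileTrust", "request.js:ips", "request.js:ip"]
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
# ['utils.js:compileTrust', 'request.js:ip', 'request.js:ips', 'application.js:set']
```

RRF uses ranks only, so it needs no calibration between cosine similarities and keyword scores, which is why it is a popular default for hybrid search.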

02

Code-aware chunking

Files are split along function and class boundaries, so every result maps to readable code with exact line ranges.
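As an illustration of boundary-aware splitting, the sketch below uses Python's standard `ast` module to cut Python source into one chunk per top-level function or class, with 1-based inclusive line ranges. It is an assumption for demonstration only: the page does not say which parser SourceVault uses, and a multi-language tool would need a parser for each language. It requires Python 3.8+ for `end_lineno`.

```python
import ast


def chunk_python_source(source, path):
    """Return one chunk per top-level function or class, with exact line ranges."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so it stays with its definition.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno  # inclusive, available since Python 3.8
            chunks.append({
                "path": path,
                "symbol": node.name,
                "start_line": start,
                "end_line": end,
                "text": "\n".join(lines[start - 1:end]),
            })
    return chunks


SAMPLE = '''import os

def load(path):
    return open(path).read()

class Cache:
    def get(self, key):
        return None
'''

for chunk in chunk_python_source(SAMPLE, "app/store.py"):
    print(chunk["path"], chunk["symbol"], chunk["start_line"], chunk["end_line"])
```

Module-level statements such as the `import` are skipped in this sketch; a fuller chunker would keep them as a chunk of their own.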

03

Context-aware embeddings

Each chunk is embedded with its path and symbol names so the vector store knows where code lives, not just what it says.
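A minimal sketch of the idea: before embedding, each chunk's code is prefixed with a short header naming its file, symbol, and line range, so a query can match on where the code lives as well as on its body. The header layout, the function name, and the sample chunk are hypothetical; the page does not specify the exact format.

```python
def contextual_embedding_text(path, symbol, start_line, end_line, code):
    """Build the text that gets embedded for one chunk (hypothetical format).

    The header puts location and symbol names into the vector, so the store
    knows where the code lives as well as what it says.
    """
    header = f"File: {path}\nSymbol: {symbol}\nLines: {start_line}-{end_line}"
    return f"{header}\n\n{code}"


# Hypothetical chunk, not taken from any real repository.
CODE = "def refresh_token(session):\n    return session.renew()"
text = contextual_embedding_text("src/auth/session.py", "refresh_token", 12, 13, CODE)
print(text)
```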

04

Grounded answers

Ask mode checks whether it has enough context, retrieves again if needed, and answers only from source-backed snippets instead of guessing.
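The control flow described here can be sketched as a small loop. Everything it calls is passed in, because the page does not describe SourceVault's internals: `retrieve`, `is_sufficient`, and `generate` are hypothetical stand-ins for the retriever, the context check, and the local model, and widening the follow-up query with symbols already found is just one plausible strategy.

```python
def grounded_answer(question, retrieve, is_sufficient, generate, max_rounds=2):
    """Answer only when retrieved snippets are judged sufficient; otherwise decline.

    retrieve(query) -> list of snippet dicts with at least "symbol" and "path".
    is_sufficient(question, snippets) -> bool, e.g. a model-based relevance check.
    generate(question, snippets) -> answer text built only from those snippets.
    """
    snippets = []
    query = question
    for _ in range(max_rounds):
        for snippet in retrieve(query):
            if snippet not in snippets:  # avoid sending duplicate context
                snippets.append(snippet)
        if is_sufficient(question, snippets):
            return generate(question, snippets)
        # Not enough context yet: widen the next query with symbols found so far.
        query = " ".join([question] + [s["symbol"] for s in snippets])
    return None  # decline rather than guess without source-backed context
```

Returning `None` lets the caller show an explicit "not enough context" message instead of an unsupported answer.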

Cost model

AI burns tokens. Retrieval is how you stop paying for it.

Watch an AI agent work on a real codebase and you will see where the money goes: it re-reads whole files to find one function, drags stale context forward turn after turn, and pays again for every retry. That burn compounds — per question, per developer, per day — and per-seat plans meter all of it.

SourceVault's retrieval engine exists to end that pattern. Every question gets a bounded budget of cited file and line ranges — never a repo dump — and repeated questions return from cache with zero model calls. Local models answer with no meter at all; when you do route an agent to a hosted model, it receives targeted snippets, so the tokens you pay for are the ones that matter.

Per-seat fees: None; runs for the whole team on one machine you control
Per-token billing: None for local answers; repeated questions return from cache with zero model calls
Context per answer: Bounded budget (≈6k tokens by default) of cited file and line ranges, never a repo dump
Source code egress: None; nothing uploaded, logged, or retained by a third party
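The two cost controls in this table can be sketched in a few lines. The sketch assumes a crude 4-characters-per-token estimate (real tokenizers vary by model) and a cache key built from the repository, an index version, and the case- and whitespace-normalized question, so a reindex naturally invalidates old answers. `pack_context`, `AnswerCache`, and the key format are hypothetical; only the ≈6k default budget comes from the table.

```python
import hashlib


def estimate_tokens(text):
    # Crude heuristic (assumption): roughly 4 characters per token.
    return max(1, len(text) // 4)


def pack_context(snippets, budget_tokens=6000):
    """Keep best-first snippets while they fit the token budget; skip any that overflow."""
    packed, used = [], 0
    for snippet in snippets:
        cost = estimate_tokens(snippet["text"])
        if used + cost > budget_tokens:
            continue  # a later, smaller snippet may still fit
        packed.append(snippet)
        used += cost
    return packed, used


class AnswerCache:
    """Return a stored answer for a repeated question without calling the model."""

    def __init__(self):
        self._answers = {}

    @staticmethod
    def _key(repo, index_version, question):
        normalized = " ".join(question.lower().split())
        raw = f"{repo}\0{index_version}\0{normalized}"
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get_or_compute(self, repo, index_version, question, compute):
        key = self._key(repo, index_version, question)
        if key not in self._answers:
            self._answers[key] = compute()  # the only place a model call happens
        return self._answers[key]
```

Keying on the index version, rather than expiring entries on a timer, means a cached answer is reused exactly as long as the code it was grounded in is unchanged.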

Operator experience

Fast, deterministic commands for private code.

Slash commands behave the same way every time in Hermes CLI, Telegram, and Hermes Desktop — easy to teach, support, and repeat. Just ask your question; the engine retrieves on its own.

The same engine is available over MCP to Claude Code, OpenClaw, and any MCP client, so every AI tool you run answers from the same private, cited index.

$ /code-search express "trust proxy client ip"
$ /code-read express lib/request.js
$ /code-ask express "how does the trust proxy setting affect the client IP?"

Code intelligence

More than search — memory that understands and watches your code.

The retrieval engine goes past keyword matching: it knows how your code connects, keeps its answers honest as the code changes, and plugs into the AI tools you already use.

Symbol-aware retrieval

A symbol graph links definitions to their references, so "who calls this function?" is answered from the code's actual structure — not just text that looks similar.
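To make "who calls this function?" concrete, here is a toy caller index built with Python's standard `ast` module. It maps each called name to the functions whose bodies call it. This is an illustrative assumption, not SourceVault's implementation: it handles a single Python file and matches by bare name (so two unrelated `get` methods collide), whereas a production symbol graph would also resolve imports and cover other languages.

```python
import ast
from collections import defaultdict


def build_callers_index(source):
    """Map each called name to the set of function names that call it."""
    callers = defaultdict(set)
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # ast.walk also descends into nested functions, so their calls are
        # credited to the enclosing function as well.
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                target = node.func
                if isinstance(target, ast.Name):         # parse_ip(...)
                    callers[target.id].add(func.name)
                elif isinstance(target, ast.Attribute):  # obj.method(...)
                    callers[target.attr].add(func.name)
    return callers


SRC = '''
def parse_ip(header):
    return header.split(",")[0].strip()

def client_ip(req):
    return parse_ip(req.headers["x-forwarded-for"])

def log_request(req):
    print(client_ip(req))
'''

index = build_callers_index(SRC)
print(sorted(index["parse_ip"]))   # ['client_ip']
print(sorted(index["client_ip"]))  # ['log_request']
```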

Citations that age honestly

If a cited file changed since it was indexed, the citation says so. The answer never quietly points you at lines that moved.
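One way to implement this, sketched under assumptions: store a hash of the cited line range at index time, then re-hash the same range in the current file before displaying the citation. The field names and status labels are hypothetical. An edit above the cited lines shifts the range and is reported as stale too, which is intended: those lines no longer sit where the citation says.

```python
import hashlib


def range_digest(text, start_line, end_line):
    """SHA-256 of lines start_line..end_line (1-based, inclusive)."""
    selected = text.splitlines()[start_line - 1:end_line]
    return hashlib.sha256("\n".join(selected).encode("utf-8")).hexdigest()


def citation_status(citation, current_text):
    """Compare a stored citation against the file as it exists now."""
    if current_text is None:
        return "missing"  # file deleted or renamed since indexing
    now = range_digest(current_text, citation["start_line"], citation["end_line"])
    return "fresh" if now == citation["digest"] else "stale"


indexed = "a = 1\nb = 2\nc = 3\n"
citation = {"path": "conf.py", "start_line": 2, "end_line": 3,
            "digest": range_digest(indexed, 2, 3)}
print(citation_status(citation, indexed))              # fresh
print(citation_status(citation, "x = 0\n" + indexed))  # stale: lines moved down
print(citation_status(citation, None))                 # missing
```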

Incremental indexing

Only changed files re-embed, so keeping a large repository current is fast — and the nightly refresh stays cheap.
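A minimal sketch of change detection, assuming a manifest of per-file content hashes saved after each run (the page does not say whether SourceVault compares hashes, timestamps, or git diffs). Files whose hash changed, or that are new, get re-embedded; files that disappeared get their vectors deleted; everything else is left alone. `scan_repo` and `plan_reindex` are hypothetical names.

```python
import hashlib
from pathlib import Path


def scan_repo(root):
    """Map each file's repo-relative POSIX path to the SHA-256 of its bytes.

    A real scan would apply the same exclusions as indexing (e.g. skip .git).
    """
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def plan_reindex(current, previous):
    """Return (paths to re-embed, paths whose vectors should be deleted)."""
    to_embed = sorted(path for path, digest in current.items()
                      if previous.get(path) != digest)
    to_delete = sorted(path for path in previous if path not in current)
    return to_embed, to_delete


previous = {"lib/request.js": "h1", "lib/router.js": "h2", "lib/old.js": "h3"}
current = {"lib/request.js": "h1", "lib/router.js": "h9", "lib/view.js": "h4"}
print(plan_reindex(current, previous))
# (['lib/router.js', 'lib/view.js'], ['lib/old.js'])
```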

Answer quality, measured

Rate answers as you work; a built-in eval harness replays the rated questions after any model or index change and reports drift. It is the backbone of ongoing retrieval tuning.

Multi-repo Ask (Upgrade)

Ask one question across every indexed repository at once, with each citation tagged by repo — "how do the frontend and backend handle this?" in a single answer.

Git history answers

Commit messages index alongside code, so "why was this changed?" and "when did this break?" get grounded answers — coverage a cloud indexer never sees, because it never sees your history.

Module overviews

A generated repo map and per-module summaries answer "how does auth work overall" with an overview instead of fragments — written by your local model, from your code, on your machine.

Cross-encoder precision

A local reranker re-reads the top candidates against your actual question before answering. Adopted because the benchmark proved it: higher file-hit and citation precision at sub-second cost.

Works with your AI tools

An MCP server exposes the same engine to Claude Code, OpenClaw, and any MCP client — bounded, cited context instead of re-reading files. The server itself is local-only: pair it with a local-model client and the loop stays zero-egress end to end; a cloud-backed client sends what it retrieves to its own vendor, by your choice.

The dashboard

A control plane your whole team can use.

Everything ships with a browser dashboard — connect your source control platforms, manage repositories and models, and ask questions about your code without touching a terminal. See it in action in the walkthrough above.

Connect your source control

Sign in to GitHub, GitLab, or Bitbucket once. Browse and autocomplete your repositories as you type, and clone private repos without per-clone credentials.

One-click indexing

Repositories index automatically on import. Update, sync, or switch branches per repo — and a stale index is one click from fresh.

Search and Ask

Literal and semantic search with file-type filters, plus Ask mode for grounded answers where every citation clicks open to its source. History and archive are built in.

Built-in source viewer

Citations and search results open the full file in a syntax-highlighted viewer, with cited lines marked and scrolled into view, highlighting for 15 languages, and selectable light and dark code themes.

Local model manager

Pull, select, and uninstall Ollama models from the UI. The embedding model that powers search is protected from accidental removal.

Watches

Pin a question as a standing check. After every reindex it re-asks itself and flags you when the cited answer drifts — "did the auth flow change this sprint?" answers itself.

Runbook export

Pin good answers and export them as a markdown knowledge base generated from your own code — every claim keeping its file-and-line citations.

Self-monitoring

Background polling keeps status, repositories, and models current without refresh buttons, and the health indicator flashes the moment anything needs attention.

Trust layer

Security-first features for the modern development team.

HMAC-signed APIs

Search, file read, and task endpoints require shared-secret signatures, with separate secrets so one leak does not expose the whole stack.
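A minimal sketch of per-endpoint HMAC signing with Python's standard `hmac` module. The signed fields (method, path, timestamp, body), the five-minute clock-skew window, and the secret names are assumptions for illustration; the page states only that the endpoints use shared-secret signatures with separate secrets.

```python
import hashlib
import hmac
import time

# Placeholder secrets, one per endpoint family (hypothetical names). Real
# deployments would load these from protected config, never from source code.
SECRETS = {
    "search": b"example-search-secret",
    "read": b"example-read-secret",
    "task": b"example-task-secret",
}


def sign(secret, method, path, body, timestamp):
    """HMAC-SHA256 over method, path, timestamp, and raw body bytes."""
    message = b"\n".join([
        method.upper().encode("utf-8"),
        path.encode("utf-8"),
        str(int(timestamp)).encode("ascii"),
        body,
    ])
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(scope, method, path, body, timestamp, signature, max_skew=300, now=None):
    """Accept only fresh requests signed with the secret for this scope."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > max_skew:
        return False  # stale or future-dated: limits replay of captured requests
    expected = sign(SECRETS[scope], method, path, body, timestamp)
    return hmac.compare_digest(expected, signature)  # constant-time comparison


ts = int(time.time())
body = b'{"q": "trust proxy client ip"}'
signature = sign(SECRETS["search"], "POST", "/search", body, ts)
print(verify("search", "POST", "/search", body, ts, signature))  # True
print(verify("read", "POST", "/search", body, ts, signature))    # False: wrong secret
```

Because each scope has its own secret, a leaked search secret cannot produce a valid signature for the file-read or task endpoints.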

Locked-down dashboard

Token-guarded sessions with a one-click Lock, a strict content-security policy, and a loopback guard keep the control plane local unless you explicitly unlock it.

Tokens, never passwords

Access tokens are generated server-side and rotated from the UI in one click — nobody types or chooses a credential, and rotation signs every other session out instantly.

Repo confinement

Path-escape and symlink checks keep every read inside its repository, while file allowlists block binaries and unknown formats.
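A minimal sketch of the read guard using `pathlib`. `Path.resolve()` collapses `..` segments and follows symlinks, so the containment check sees the real target rather than the path the caller supplied. The extension allowlist is hypothetical, and `Path.is_relative_to` needs Python 3.9 or later.

```python
from pathlib import Path

# Hypothetical allowlist of readable source and text formats.
ALLOWED_SUFFIXES = {".js", ".ts", ".py", ".go", ".java", ".md", ".json", ".yml"}


class ConfinementError(ValueError):
    """Raised when a requested path may not be read."""


def confined_path(repo_root, requested):
    """Resolve `requested` inside `repo_root`, or raise ConfinementError."""
    root = Path(repo_root).resolve()
    # resolve() collapses ".." and follows symlinks, so the check below sees the
    # real target. An absolute `requested` replaces root entirely and is caught too.
    target = (root / requested).resolve()
    if not target.is_relative_to(root):  # Python 3.9+
        raise ConfinementError(f"path escapes repository: {requested}")
    if target.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ConfinementError(f"file type not allowed: {requested}")
    return target


for candidate in ["lib/request.js", "../../etc/passwd", "assets/logo.png"]:
    try:
        print("ok     ", confined_path(".", candidate))
    except ConfinementError as err:
        print("blocked", err)
```

Resolving and then opening is not atomic, so a path that changes between the two steps is a residual risk this sketch does not address.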

Secrets never indexed

.env files, lockfiles, and dependency directories are excluded automatically so credentials never become searchable vectors.
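A sketch of such an exclusion filter using the standard `fnmatch` module. The patterns are hypothetical examples of the categories the page lists (.env files, lockfiles, dependency directories); SourceVault's actual default list is not published here.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical defaults for env files, lockfiles, and dependency directories.
EXCLUDED_FILE_PATTERNS = [".env", ".env.*", "*.lock", "package-lock.json", "pnpm-lock.yaml"]
EXCLUDED_DIRS = {"node_modules", "vendor", "venv", ".venv"}


def should_index(rel_path):
    """Return False for files inside excluded directories or matching excluded names."""
    parts = PurePosixPath(rel_path).parts
    if any(part in EXCLUDED_DIRS for part in parts[:-1]):
        return False
    return not any(fnmatch(parts[-1], pattern) for pattern in EXCLUDED_FILE_PATTERNS)


for path in ["lib/request.js", ".env", "config/.env.production",
             "yarn.lock", "node_modules/express/index.js", "src/environment.js"]:
    print(f"{path:32} {'index' if should_index(path) else 'skip'}")
```

Filtering before chunking means excluded files never reach the embedding model at all, rather than being embedded and hidden later.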

Deliberate deletion

Removing a repository requires typed confirmation and cleans up the working copy, vectors, and metadata.

100% local by design

Embeddings run on local Ollama, vectors live in local ChromaDB, and answers come from local models.

No framework supply chain

The retrieval engine is built natively on two auditable local services — ChromaDB for vectors, Ollama for models. No LangChain-style orchestration layer in between: a smaller attack surface, every line yours to audit.

Packages

Pricing that fits your setup, not the other way around.

Pricing scales with repositories and machines — never per seat, never per token. Every install is a one-time fee scoped to your setup, with SourceVault Care as an optional retainer for updates, reindex health, and tuning as local models improve.

Start with a $500 pilot

One machine, two repositories, 30 days of support — and the full $500 is credited toward any package. If the pilot can't answer questions about your code with file-and-line citations, you don't pay.

Book a pilot

Starter

From $1,350 one-time

30 days of Care included

For a founder or solo engineer who needs private code memory without a heavy platform project.

  • 1 WSL/Linux machine
  • 1–3 repositories indexed
  • Dashboard, Hermes plugin, MCP server
  • Live handoff session and runbook
  • Retrieval-quality report on your code
Start with Starter

Team

From $8,200 one-time

30 days of Care included

For an engineering team or large monorepo that needs a deliberate rollout.

  • Multi-repo Ask included
  • Up to 4 machines, shared team setup
  • 15+ repositories / large-repo indexing strategy
  • Security-focused file exclusions
  • Team onboarding and maintenance runbook
Start with Team

Enterprise

Custom

Priority Care SLA

For regulated or air-gapped environments that need the compliance story in writing.

  • Multi-repo Ask included
  • Air-gapped install: offline bundle and models
  • Append-only audit log of asks and file reads
  • Compliance pack with zero-egress verification
  • Multi-machine rollout, scoped to your environment
Get a fixed quote

Add-ons

Multi-repo Ask: $900 one-time

Ask one question across every indexed repository, with citations tagged by repo. A one-time unlock on Starter and Pro — included with Team and Enterprise.

Add it on

SourceVault Care: retainer

Updates, reindex health, a support SLA, and tuning as local models improve. Embedding-model upgrades require reindexing, and Care handles it. Starter $275/mo, Pro $495/mo, Team from $1,100/mo, starting after the 30 days included with every install.

Add Care

"From" prices are honest floors, not bait — your quote is fixed before work starts. Send your stack, repo count, and target machine and you will get a straight recommendation, even if the right answer is the smaller package.

Deployment

Built first for WSL2 Ubuntu and Linux.

Everything runs on infrastructure you control — the models, the index, the dashboard. If you already build with local AI on WSL2 or Linux, SourceVault drops into the setup you have; if you don't, the done-for-you install brings everything it needs.

Minimum: 8 cores, 24 GB RAM, 100 GB SSD
Recommended: 12 cores, 32–64 GB RAM, 250 GB NVMe
Default embedding model: nomic-embed-text
Default reasoning model: qwen3-coder:30b

Make private code searchable without giving it away.

Start with the $500 pilot — fully credited toward any package, and backed by the citation guarantee: answers about your code, with file-and-line proof.

Book the pilot