HelixML

Kodit (Open-Source)

Code and Document Indexing Server

This page is sourced from the Kodit GitHub repository.

AI coding assistants work better when they have access to real examples from your codebase. Kodit indexes your repositories, splits source files into searchable snippets, and serves them to any MCP-compatible assistant. When your assistant needs to write new code, it queries Kodit first and gets back relevant, up-to-date examples drawn from your own projects.

Kodit also handles documents. PDFs, Word files, PowerPoint decks, and spreadsheets are rasterized and indexed so you can search across both code and documentation in one place.

What you get:

  • Multiple search strategies including BM25 keyword search, semantic vector search, regex grep, and visual document search, each exposed as a separate MCP tool so your assistant picks the right approach for each query
  • MCP server that works with Claude Code, Cursor, Cline, Kilo Code, and any other MCP-compatible assistant
  • REST API for programmatic access to search, repositories, enrichments, and indexing status
  • AI enrichments (optional) including architecture docs, API docs, database schema detection, cookbook examples, and commit summaries, all generated by an LLM
  • Document intelligence with visual search across PDF pages, Office documents, and images using multimodal embeddings
  • No external dependencies required for basic operation, with a built-in embedding model and SQLite storage

Quickstart

docker run -p 8080:8080 registry.helix.ml/helix/kodit:latest

This starts Kodit with SQLite storage and a built-in embedding model. No API keys needed.

Pre-built binaries

Download a binary from the releases page, then:

chmod +x kodit
./kodit serve

Verify it works

Open the interactive API docs at http://localhost:8080/docs.

Or index a small repository and run a search:

# Index a repository
curl http://localhost:8080/api/v1/repositories \
  -X POST -H "Content-Type: application/json" \
  -d '{
    "data": {
      "type": "repository",
      "attributes": {
        "remote_uri": "https://gist.github.com/philwinder/7aa38185e20433c04c533f2b28f4e217.git"
      }
    }
  }'
 
# Check indexing progress
curl http://localhost:8080/api/v1/repositories/1/status
 
# Search (once indexing is complete)
curl http://localhost:8080/api/v1/search \
  -X POST -H "Content-Type: application/json" \
  -d '{
    "data": {
      "type": "search",
      "attributes": {
        "keywords": ["orders"],
        "text": "code to get all orders"
      }
    }
  }'
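The same search request can be built from Go. The sketch below only constructs the JSON body used in the curl example above; the struct and function names (searchRequest, buildSearchBody) are hypothetical and are not part of Kodit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchRequest mirrors the JSON body from the curl search example.
// These type names are illustrative, not Kodit's own types.
type searchRequest struct {
	Data searchData `json:"data"`
}

type searchData struct {
	Type       string           `json:"type"`
	Attributes searchAttributes `json:"attributes"`
}

type searchAttributes struct {
	Keywords []string `json:"keywords"`
	Text     string   `json:"text"`
}

// buildSearchBody encodes a search request with the given keywords and text.
func buildSearchBody(keywords []string, text string) ([]byte, error) {
	req := searchRequest{Data: searchData{
		Type:       "search",
		Attributes: searchAttributes{Keywords: keywords, Text: text},
	}}
	return json.Marshal(req)
}

func main() {
	body, err := buildSearchBody([]string{"orders"}, "code to get all orders")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

To send it, pass the bytes to http.Post with the URL http://localhost:8080/api/v1/search and the content type application/json.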

Connecting to AI Assistants

Kodit exposes an MCP endpoint at /mcp. Connect your assistant to start using Kodit as a code search tool.

Claude Code

claude mcp add --transport http kodit http://localhost:8080/mcp

Cursor

Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "kodit": {
      "url": "http://localhost:8080/mcp"
    }
  }
}

Cline

Add to the MCP Servers configuration (Remote Servers tab):

{
  "mcpServers": {
    "kodit": {
      "autoApprove": [],
      "disabled": false,
      "timeout": 60,
      "type": "streamableHttp",
      "url": "http://localhost:8080/mcp"
    }
  }
}

Kilo Code

Add to the MCP configuration (Edit Project/Global MCP):

{
  "mcpServers": {
    "kodit": {
      "type": "streamable-http",
      "url": "http://localhost:8080/mcp",
      "alwaysAllow": [],
      "disabled": false
    }
  }
}

Replace http://localhost:8080 with your server URL if running remotely.
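When the server runs remotely, only the URL in these files changes. As a rough sketch, the Go program below derives the /mcp endpoint from a base URL and prints a Cursor-style configuration. The helper names (mcpEndpoint, cursorConfig) and the host kodit.example.com are placeholders chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// mcpConfig mirrors the Cursor mcp.json layout shown above.
type mcpConfig struct {
	MCPServers map[string]mcpServer `json:"mcpServers"`
}

type mcpServer struct {
	URL string `json:"url"`
}

// mcpEndpoint (hypothetical helper) validates a base URL and appends /mcp.
func mcpEndpoint(base string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("base URL %q needs a scheme and host", base)
	}
	return strings.TrimRight(base, "/") + "/mcp", nil
}

// cursorConfig renders the configuration for a single server named "kodit".
func cursorConfig(base string) ([]byte, error) {
	endpoint, err := mcpEndpoint(base)
	if err != nil {
		return nil, err
	}
	cfg := mcpConfig{MCPServers: map[string]mcpServer{"kodit": {URL: endpoint}}}
	return json.MarshalIndent(cfg, "", "  ")
}

func main() {
	out, err := cursorConfig("https://kodit.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```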

Encouraging assistants to use Kodit

Some assistants may not call Kodit tools automatically. Add this to your project rules or system prompt to enforce usage:

For every request that involves writing or modifying code, the assistant's first
action must be to call the kodit search MCP tools. Only produce or edit code after
the tool call returns results.

In Cursor, save this as .cursor/rules/kodit.mdc with alwaysApply: true frontmatter.

MCP Tools

Kodit exposes these tools to connected AI assistants:

Tool                      Description
kodit_repositories        List all indexed repositories
kodit_semantic_search     Semantic similarity search across code
kodit_keyword_search      BM25 keyword search
kodit_visual_search       Search document page images
kodit_grep                Regex pattern matching
kodit_ls                  List files by glob pattern
kodit_read_resource       Read file content by URI
kodit_architecture_docs   Architecture documentation for a repo
kodit_api_docs            Public API documentation
kodit_database_schema     Database schema documentation
kodit_cookbook            Usage examples and patterns
kodit_commit_description  Commit description
kodit_wiki                Wiki table of contents
kodit_wiki_page           Read a specific wiki page
kodit_version             Server version

The enrichment tools (architecture_docs, api_docs, database_schema, cookbook, wiki, commit_description) require an LLM provider to be configured. See Enrichment Providers.
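A client that wants to hide tools it cannot use could check a tool name against the enrichment list before offering it. This is a minimal sketch: the requiresLLM helper is hypothetical, and the set contains only the tools named in the note above. kodit_wiki_page is not named there, so it is left out here even though it may behave like kodit_wiki.

```go
package main

import "fmt"

// enrichmentTools holds the tools that the note above says need an LLM provider.
// Assumption: tool names carry the kodit_ prefix used in the tool table.
var enrichmentTools = map[string]bool{
	"kodit_architecture_docs":  true,
	"kodit_api_docs":           true,
	"kodit_database_schema":    true,
	"kodit_cookbook":           true,
	"kodit_wiki":               true,
	"kodit_commit_description": true,
}

// requiresLLM reports whether a tool is in the enrichment set.
func requiresLLM(tool string) bool {
	return enrichmentTools[tool]
}

func main() {
	for _, tool := range []string{"kodit_grep", "kodit_cookbook"} {
		fmt.Printf("%s requires an LLM provider: %v\n", tool, requiresLLM(tool))
	}
}
```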

Go Library

Kodit can be embedded directly as a Go library. This is how Helix integrates Kodit into its platform.

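import (
    "fmt"
    "log"

    "github.com/helixml/kodit"
)

// ctx is a context.Context supplied by the caller. The service package ships
// with Kodit; its import path is not shown here.
client, err := kodit.New(
    kodit.WithSQLite(".kodit/data.db"),
)
if err != nil {
    log.Fatal(err)
}
defer client.Close()

// Index a repository (the returned repository value is discarded here)
_, err = client.Repositories.Add(ctx, &service.RepositoryAddParams{
    URL: "https://github.com/kubernetes/kubernetes",
})
if err != nil {
    log.Fatal(err)
}

// Search
results, err := client.Search.Query(ctx, "create a deployment",
    service.WithLimit(10),
)
if err != nil {
    log.Fatal(err)
}

// Print the path and name of each matching snippet
for _, snippet := range results.Snippets() {
    fmt.Println(snippet.Path(), snippet.Name())
}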

Library options

Option                        Description
WithSQLite(path)              Use SQLite for storage
WithPostgresVectorchord(dsn)  Use PostgreSQL with VectorChord
WithOpenAI(apiKey)            OpenAI for embeddings and text
WithAnthropic(apiKey)         Anthropic Claude for text (needs separate embedding provider)
WithTextProvider(p)           Custom text generation provider
WithEmbeddingProvider(p)      Custom embedding provider
WithRAGPipeline()             Skip LLM enrichments, index and search only
WithFullPipeline()            Require all enrichments (errors without a text provider)
WithDataDir(dir)              Data directory (default: ~/.kodit)
WithCloneDir(dir)             Repository clone directory
WithAPIKeys(keys...)          API keys for HTTP authentication
WithWorkerCount(n)            Number of background workers (default: 1)
WithPeriodicSyncConfig(cfg)   Automatic repository sync settings
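These options look like Go's functional options pattern: each With... function returns a value that adjusts the configuration, and the constructor applies defaults first. The sketch below illustrates that pattern with a hypothetical config struct and locally defined options. It is not Kodit's implementation; only the two defaults (data directory ~/.kodit, one worker) come from the table above.

```go
package main

import "fmt"

// config is a hypothetical stand-in for the library's internal settings.
type config struct {
	dataDir     string
	workerCount int
	apiKeys     []string
}

// Option mutates a config. The names below echo the table but are defined locally.
type Option func(*config)

func WithDataDir(dir string) Option {
	return func(c *config) { c.dataDir = dir }
}

func WithWorkerCount(n int) Option {
	return func(c *config) { c.workerCount = n }
}

func WithAPIKeys(keys ...string) Option {
	return func(c *config) { c.apiKeys = append(c.apiKeys, keys...) }
}

// newConfig starts from the documented defaults, then applies options in order.
func newConfig(opts ...Option) config {
	c := config{dataDir: "~/.kodit", workerCount: 1}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	c := newConfig(WithWorkerCount(4), WithAPIKeys("example-key"))
	fmt.Println(c.dataDir, c.workerCount, len(c.apiKeys))
}
```

Because options are applied in order, a later option overrides an earlier one that touches the same field.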

Search options

Option                    Description
WithSemanticWeight(w)     Weight for semantic vs keyword search (0.0 to 1.0)
WithLimit(n)              Maximum number of results
WithOffset(n)             Offset for pagination
WithLanguages(langs...)   Filter by programming languages
WithRepositories(ids...)  Filter by repository IDs
WithMinScore(score)       Minimum score threshold
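To build intuition for how a semantic weight, a minimum score, and limit/offset could interact, here is a small self-contained sketch. It assumes a simple linear blend of normalized scores, where a weight of 1.0 means semantic only. That formula is an assumption made for illustration; Kodit's actual ranking may work differently. The hit, scored, and rank names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// hit carries two scores that are assumed to be normalized to [0, 1].
type hit struct {
	Name     string
	Semantic float64
	Keyword  float64
}

type scored struct {
	Name  string
	Score float64
}

// rank blends the two scores linearly (an assumption, not Kodit's confirmed formula),
// drops results below minScore, sorts by score, then applies offset and limit.
func rank(hits []hit, weight, minScore float64, offset, limit int) []scored {
	out := make([]scored, 0, len(hits))
	for _, h := range hits {
		s := weight*h.Semantic + (1-weight)*h.Keyword
		if s >= minScore {
			out = append(out, scored{Name: h.Name, Score: s})
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	if offset < 0 {
		offset = 0
	}
	if offset >= len(out) {
		return nil
	}
	out = out[offset:]
	if limit >= 0 && limit < len(out) {
		out = out[:limit]
	}
	return out
}

func main() {
	hits := []hit{
		{Name: "a", Semantic: 0.9, Keyword: 0.1},
		{Name: "b", Semantic: 0.2, Keyword: 0.8},
		{Name: "c", Semantic: 0.5, Keyword: 0.5},
	}
	for _, r := range rank(hits, 1.0, 0.3, 0, 10) {
		fmt.Println(r.Name, r.Score)
	}
}
```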

Go HTTP client

A generated HTTP client is available for calling a remote Kodit server from Go:

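go get github.com/helixml/kodit/clients/go

Then import the package and create a client:

import koditclient "github.com/helixml/kodit/clients/go"

client, err := koditclient.NewClient("https://kodit.example.com")
if err != nil {
    log.Fatal(err)
}

// List repositories
reposResp, err := client.GetApiV1Repositories(ctx)
if err != nil {
    log.Fatal(err)
}

// Search (a separate variable name avoids redeclaring resp with :=)
searchResp, err := client.PostApiV1SearchMulti(ctx, koditclient.PostApiV1SearchMultiJSONRequestBody{
    TextQuery: "create a deployment",
    TopK:      10,
})
if err != nil {
    log.Fatal(err)
}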

Types are auto-generated from the OpenAPI spec. See the interactive API docs at /docs for the full endpoint list.

Production Deployment

For production use, deploy with PostgreSQL (VectorChord) for scalable vector search and a dedicated LLM provider for enrichments.

Docker Compose

Save this as docker-compose.yaml:

services:
  kodit:
    image: registry.helix.ml/helix/kodit:latest
    ports:
      - "8080:8080"
    command: ["serve", "--host", "0.0.0.0", "--port", "8080"]
    restart: unless-stopped
    depends_on:
      - vectorchord
    environment:
      DATA_DIR: /data
      DB_URL: postgresql://postgres:mysecretpassword@vectorchord:5432/kodit
 
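      # Enrichment LLM (optional, enables AI-generated docs).
      # This file does not define an ollama service: point the URL at an
      # Ollama instance reachable from the kodit container.
      ENRICHMENT_ENDPOINT_BASE_URL: http://ollama:11434
      ENRICHMENT_ENDPOINT_MODEL: ollama/qwen3:1.7b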
 
      # External embedding provider (optional, replaces built-in model)
      # EMBEDDING_ENDPOINT_API_KEY: sk-proj-xxxx
      # EMBEDDING_ENDPOINT_MODEL: openai/text-embedding-3-small
 
      LOG_LEVEL: INFO
      API_KEYS: ${KODIT_API_KEYS:-}
    volumes:
      - kodit-data:/data
 
  vectorchord:
    image: tensorchord/vchord-suite:pg17-20250601
    environment:
      POSTGRES_DB: kodit
      POSTGRES_PASSWORD: mysecretpassword # must match the password in DB_URL; change it for real deployments
    volumes:
      - vectorchord-data:/var/lib/postgresql/data
    restart: unless-stopped
 
volumes:
  kodit-data:
  vectorchord-data:

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vectorchord
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vectorchord
  template:
    metadata:
      labels:
        app: vectorchord
    spec:
      containers:
        - name: vectorchord
          image: tensorchord/vchord-suite:pg17-20250601
          env:
            - name: POSTGRES_DB
              value: kodit
            - name: POSTGRES_PASSWORD
              value: mysecretpassword
          ports:
            - containerPort: 5432
---
apiVersion: v1
kind: Service
metadata:
  name: vectorchord
spec:
  selector:
    app: vectorchord
  ports:
    - port: 5432
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kodit
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kodit
  template:
    metadata:
      labels:
        app: kodit
    spec:
      containers:
        - name: kodit
          image: registry.helix.ml/helix/kodit:latest # pin to a specific version
          args: ["serve", "--host", "0.0.0.0", "--port", "8080"]
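          env: # see Configuration Reference for the full list of environment variables
            - name: DB_URL # same connection string as the Docker Compose example
              value: postgresql://postgres:mysecretpassword@vectorchord:5432/kodit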
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: kodit
spec:
  type: LoadBalancer
  selector:
    app: kodit
  ports:
    - port: 8080
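The readinessProbe in the kodit Deployment issues an HTTP GET against / on port 8080. Kubernetes treats any status code from 200 up to but not including 400 as success. The sketch below reproduces that check in Go so it can be run against a local server before deploying. probeReady and statusServer are hypothetical helpers, and the test server stands in for Kodit.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// probeReady performs one GET and applies the httpGet probe rule:
// a status code in [200, 400) counts as ready, anything else or an error does not.
func probeReady(client *http.Client, url string) bool {
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 400
}

// statusServer starts a local test server that always answers with the given code.
func statusServer(code int) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(code)
	}))
}

func main() {
	srv := statusServer(http.StatusOK)
	defer srv.Close()
	fmt.Println("ready:", probeReady(srv.Client(), srv.URL+"/"))
}
```

Against a running container, the same function can be pointed at http://localhost:8080/ with a plain http.Client.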

Authentication