Add RAG knowledge bases
Attach documents, web content, and data sources to an agent app so it can answer questions grounded in your content.
A knowledge base is a collection of documents that Helix indexes and searches at query time. When the agent receives a message, Helix finds the most relevant chunks and injects them into the context. This is Retrieval-Augmented Generation (RAG).
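The retrieve-then-inject flow described above can be sketched in a few lines. This is a toy illustration, not how Helix implements retrieval: it assumes a bag-of-words cosine similarity in place of a real embedding model, and the names `retrieve` and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 4) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], k: int = 4) -> str:
    """Place the retrieved chunks ahead of the user's question."""
    context = "\n\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical indexed chunks, used only for illustration.
chunks = [
    "Items may be returned within 30 days with a receipt.",
    "Shipping takes three to five business days.",
    "Our office is closed on public holidays.",
]
prompt = build_prompt("Can items be returned without a receipt?", chunks, k=1)
```

The model then answers from `prompt`, which contains only the chunk about returns, rather than from its general training data.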
This guide covers knowledge bases for agent apps (the chatbot builder). For indexing codebases used by spec tasks, see Build an internal knowledge base.
Add a knowledge source
In your agent app, open the Knowledge tab and click Add Knowledge. Give it a name and description — the description helps the agent decide when to query this source versus another.
Each agent app can have multiple knowledge bases. The agent queries each one independently and combines the results.
Source types
Web URLs
Index one or more public URLs. Helix fetches and chunks the page content.
source:
  web:
    urls:
      - https://docs.example.com/
      - https://blog.example.com/release-notes
Enable Crawler to follow links recursively:
source:
  web:
    urls:
      - https://docs.example.com/
    crawler:
      enabled: true
      max_depth: 3       # link depth from the seed URL
      max_pages: 200     # hard cap on pages fetched
      readability: true  # strip nav/footer noise, keep article body
For password-protected sites:
source:
  web:
    urls:
      - https://intranet.example.com/
    auth:
      username: ${INTRANET_USER}
      password: ${INTRANET_PASS}
File uploads
Drag and drop files into the Knowledge tab, or browse for them. Supported formats: PDF, DOCX, PPTX, XLSX, CSV, Markdown, plain text.
Uploaded files go into the agent's Helix Drive storage and are indexed immediately.
Helix Drive path
Reference a folder already in Helix Drive (the built-in file storage):
source:
  helix_drive:
    path: /my-team/product-docs
S3
source:
  s3:
    bucket: my-company-docs
    path: knowledge-base/  # prefix, optional
Helix uses the AWS credentials configured in your installation. For Helix Cloud, contact support to configure S3 access.
Google Cloud Storage
source:
  gcs:
    bucket: my-company-docs
    path: knowledge-base/
SharePoint
source:
  sharepoint:
    site_id: <sharepoint-site-id>
    drive_id: <drive-id>          # optional; defaults to the default drive
    folder_path: /Documentation   # optional
    oauth_provider_id: <your-sharepoint-oauth-provider>
    filter_extensions:
      - .pdf
      - .docx
    recursive: true
SharePoint requires an OAuth provider configured under Organisation → OAuth Connections.
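To make the interaction of `folder_path`, `filter_extensions`, and `recursive` concrete, here is a minimal sketch of one plausible selection rule. It is an assumption about the semantics, not Helix's actual SharePoint connector: the function `select_files` and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def select_files(paths: list[str], folder: str, extensions: list[str], recursive: bool) -> list[str]:
    """Keep files under `folder` whose extension is in `extensions`.

    With recursive=False only direct children of `folder` are kept.
    """
    root = PurePosixPath(folder)
    allowed = {ext.lower() for ext in extensions}
    selected = []
    for raw in paths:
        path = PurePosixPath(raw)
        if root not in path.parents:
            continue  # outside the configured folder
        if not recursive and path.parent != root:
            continue  # nested file, but recursion is off
        if path.suffix.lower() in allowed:
            selected.append(raw)
    return selected


# Hypothetical drive contents.
paths = [
    "/Documentation/guide.pdf",
    "/Documentation/api/reference.docx",
    "/Documentation/logo.png",
    "/Other/notes.pdf",
]
deep = select_files(paths, "/Documentation", [".pdf", ".docx"], recursive=True)
shallow = select_files(paths, "/Documentation", [".pdf", ".docx"], recursive=False)
```

Under this reading, `deep` includes the nested `reference.docx` while `shallow` keeps only `guide.pdf`.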
Inline text
For small, stable content that doesn't need an external source:
source:
  text: |
    Our return policy: items may be returned within 30 days
    with a receipt for a full refund...
Vision RAG
By default, Helix indexes the text content of documents. Enable Vision to also index images and scanned PDFs using a multimodal embedding model:
rag_settings:
  enable_vision: true
With vision enabled, the agent can answer questions about diagrams, screenshots, charts, and scanned pages that contain no machine-readable text.
Vision indexing is slower and costs more tokens than text-only indexing. Use it when your documents contain meaningful visual content.
Refresh schedule
Keep the index current with a cron schedule:
refresh_enabled: true
refresh_schedule: "0 */6 * * *"  # every 6 hours
Standard cron syntax. The agent serves the previous index while the refresh runs; there is no downtime.
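To check when a schedule such as `0 */6 * * *` would fire, the expression can be expanded by hand. The sketch below is a deliberately small interpreter written for illustration, not the scheduler Helix uses: it only understands `*`, `*/n`, and plain numbers in the minute and hour fields, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def expand(field: str, upper: int) -> set[int]:
    """Expand one cron field ('*', '*/n', or a plain number) into allowed values."""
    if field == "*":
        return set(range(upper))
    if field.startswith("*/"):
        return set(range(0, upper, int(field[2:])))
    return {int(field)}


def next_runs(expr: str, after: datetime, count: int = 3) -> list[datetime]:
    """Return the next `count` run times strictly after `after`."""
    minute, hour, dom, month, dow = expr.split()
    if (dom, month, dow) != ("*", "*", "*"):
        raise ValueError("this sketch only handles the minute and hour fields")
    minutes, hours = expand(minute, 60), expand(hour, 24)
    runs = []
    # Step minute by minute from the first whole minute after `after`.
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    while len(runs) < count:
        if t.minute in minutes and t.hour in hours:
            runs.append(t)
        t += timedelta(minutes=1)
    return runs


# Starting at 07:30, a six-hourly schedule fires at 12:00, 18:00, then 00:00.
runs = next_runs("0 */6 * * *", datetime(2024, 1, 1, 7, 30))
```

Note that `*/6` in the hour field means hours 0, 6, 12, and 18, so refreshes are aligned to the clock rather than to the time the schedule was saved.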
Tuning RAG retrieval
Under Advanced Settings in the Knowledge editor:
| Setting | Default | Effect |
|---|---|---|
| Results count | 4 | Number of chunks retrieved per query. More chunks = more context but higher cost. |
| Chunk size | 1024 | Maximum tokens per chunk when splitting documents. Smaller = more precise retrieval; larger = more context per result. |
| Chunk overflow | 64 | Token overlap between adjacent chunks, to avoid splitting mid-sentence. |
Start with the defaults. If the agent's answers seem out of context, increase the results count; if they are too verbose, reduce the chunk size.
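The relationship between chunk size and overlap is easier to see on a small input. The following is a minimal sketch of sliding-window chunking, assuming the input is already a list of tokens; Helix's actual splitter and tokenizer may behave differently, and `chunk_tokens` is a hypothetical name.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 1024, overlap: int = 64) -> list[list[str]]:
    """Split tokens into windows of at most chunk_size, each sharing
    `overlap` tokens with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks


# Ten toy tokens, windows of 4 with 1 token of overlap.
tokens = [f"t{i}" for i in range(10)]
chunks = chunk_tokens(tokens, chunk_size=4, overlap=1)
```

Here the last token of each chunk (`t3`, then `t6`) reappears as the first token of the next, which is what keeps a sentence that straddles a boundary retrievable from either side.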
Monitoring index state
Each knowledge source shows its current state:
- Indexing — currently being processed
- Ready — indexed and searchable
- Error — indexing failed (check the message for details)
You can trigger a manual re-index at any time with the Re-index button.
Multiple knowledge bases
One agent can have multiple knowledge bases. During a conversation, Helix queries all of them and combines the results. Use separate knowledge bases when you have logically distinct content that the agent should search independently — for example, a product manual and a customer FAQ.
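One simple way to combine results from several knowledge bases is to pool the hits and keep the highest-scoring ones. This is only a sketch of that idea, assuming each hit carries a relevance score comparable across sources; how Helix actually merges results is not described here, and the `Hit` type and `combine` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str   # name of the knowledge base the chunk came from
    text: str     # chunk content
    score: float  # relevance score, higher is better


def combine(results_per_kb: dict[str, list[Hit]], limit: int = 4) -> list[Hit]:
    """Pool hits from every knowledge base and keep the top `limit` by score."""
    merged = [hit for hits in results_per_kb.values() for hit in hits]
    merged.sort(key=lambda h: h.score, reverse=True)
    return merged[:limit]


# Hypothetical results from two knowledge bases for the same query.
results = {
    "product-manual": [Hit("product-manual", "Hold the reset button for 5 seconds.", 0.82)],
    "customer-faq": [
        Hit("customer-faq", "How do I reset my device?", 0.91),
        Hit("customer-faq", "What is the warranty period?", 0.40),
    ],
}
top = combine(results, limit=2)
```

Keeping the `source` on each hit lets the agent, or a reader debugging its answers, see which knowledge base a piece of context came from.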