Private GenAI Stack
Everything you could do with a private GenAI platform — LLMs, RAG, vision, agents, API tools — plus fleet orchestration for autonomous agent computers. Your infrastructure, your data, your control.
Helix started here
Before agent computers, before fleet orchestration, before any of that — Helix was a private GenAI stack. A way to run large language models, build intelligent agents, connect RAG pipelines, and process documents with vision models — all on your own infrastructure. No API keys to OpenAI. No data leaving your network. No vendor lock-in.
That hasn't changed. Everything the platform could do before, it still does. We've just added an entirely new layer on top.
What the private GenAI stack gives you
Large language models on your hardware
Run open-source LLMs — Llama, Mistral, DeepSeek, Qwen, and more — on your own GPUs. Helix handles model serving, GPU scheduling, and inference optimization. You choose the models. You control the versions. You own the weights.
- Multi-model support — run different models for different use cases, swap them without changing application code
- GPU scheduler — intelligent routing across available GPUs, automatic queuing, keeping GPUs busy instead of idle
- Quantization and optimization — run larger models on smaller hardware with minimal quality loss
- Air-gap ready — models run entirely offline, no outbound internet required
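To make "swap models without changing application code" concrete, here is a minimal sketch that assumes the stack exposes an OpenAI-style chat endpoint on localhost. The URL, environment variable, and model names below are illustrative assumptions, not documented Helix API details:

```python
# Sketch: calling a local OpenAI-compatible chat endpoint.
# HELIX_URL and the model names are assumptions for illustration.
import json
import os
import urllib.request

HELIX_URL = os.environ.get("HELIX_URL", "http://localhost:8080/v1/chat/completions")

def build_chat_request(prompt: str, model: str) -> dict:
    """Build an OpenAI-style chat payload; swapping `model` needs no code changes."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str, model: str = "llama3:8b") -> str:
    """POST the request to the local gateway and return the reply text."""
    payload = json.dumps(build_chat_request(prompt, model)).encode()
    req = urllib.request.Request(
        HELIX_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because only the `model` string changes, moving a use case from Mistral to Qwen is a config edit, not a code change.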
RAG pipeline — text and vision
Connect your knowledge base to your models. Helix includes a full retrieval-augmented generation pipeline with a built-in vector database, chunking, and retrieval — plus vision RAG for documents that aren't just text.
- Vector database — built-in, no external dependencies
- Document ingestion — PDF, Word, HTML, Markdown, plain text
- Vision RAG — extract structured data from scanned documents, images, charts, and complex layouts that defeat traditional OCR
- Automatic chunking and embedding — configurable strategies for different document types
- Source attribution — every answer traces back to the document and page it came from
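The retrieve step above can be sketched in a few lines: split documents into chunks, embed them, and rank chunks by cosine similarity to the query. The hashing embedder below is a toy stand-in for a real embedding model, and the fixed-size chunker stands in for the configurable strategies mentioned above:

```python
# Sketch of chunk -> embed -> retrieve. The hashing "embedder" is a toy
# stand-in for a real embedding model; everything else is generic RAG.
import hashlib
import math

DIM = 64

def embed(text: str) -> list[float]:
    """Toy bag-of-words hashing embedder, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def chunk(text: str, size: int = 40) -> list[str]:
    """Fixed-size word chunks; real pipelines use configurable strategies."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the top-k chunks by cosine similarity to the query."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(c))), c) for c in chunks]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [c for _, c in scored[:k]]
```

A production pipeline would also carry document and page metadata alongside each chunk, which is what makes the source attribution above possible.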
Agent framework
Build AI agents that can use tools, call APIs, and accomplish multi-step tasks. Define agents with system prompts and API tool specs — the LLM figures out how to orchestrate them.
- API tool integration — connect any API with a Swagger/OpenAPI spec, no custom SDK needed
- Multi-turn conversations — agents maintain context across complex interactions
- Skills library — pre-built integrations for common enterprise tasks
- RAG as a tool — agents can query your knowledge base as part of their workflow
- Vision as a tool — agents can process images and documents as part of their reasoning
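The dispatch step in an agent loop can be sketched as follows: the model emits a tool name and arguments, and the framework looks the tool up in a registry and invokes it. The `lookup_order` tool and the JSON call format here are illustrative assumptions, standing in for tools generated from a Swagger/OpenAPI spec:

```python
# Sketch of tool dispatch in an agent loop. Tool names and the JSON
# call shape are illustrative, not Helix's actual wire format.
import json
from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {}

def tool(name: str):
    """Register a callable under the name the LLM will reference."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("lookup_order")
def lookup_order(order_id: str) -> str:
    # Hypothetical API-backed tool; a real one would call the endpoint
    # described in its OpenAPI spec.
    return json.dumps({"order_id": order_id, "status": "shipped"})

def dispatch(model_output: str) -> str:
    """Parse a JSON tool call emitted by the model and run the named tool."""
    call = json.loads(model_output)
    return TOOLS[call["tool"]](**call["arguments"])
```

RAG and vision slot into the same registry: a knowledge-base query or an image-understanding call is just another named tool the model can invoke mid-reasoning.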
Evals and CI/CD for AI
Test your agents before they hit production. Helix includes an evaluation framework that lets you define test cases, run them automatically, and catch regressions before deployment.
- Automated test suites — define expected inputs and outputs, run on every change
- Regression detection — know immediately when a model swap or prompt change breaks something
- CI/CD integration — plug into your existing pipelines
- Quality metrics — track accuracy, latency, and cost across deployments
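The test-suite idea above can be sketched as a small harness: each case pairs an input with a check on the agent's output, and the suite reports failures so CI can block a deploy. The `agent` callable and the cases below are illustrative assumptions:

```python
# Sketch of an eval suite for CI. The agent callable and cases are
# illustrative; a real suite would also track latency and cost.
from typing import Callable

def run_suite(agent: Callable[[str], str], cases: list[dict]) -> list[str]:
    """Run every case; return the names of the cases that failed."""
    failures = []
    for case in cases:
        output = agent(case["input"])
        if case["expect"] not in output:
            failures.append(case["name"])
    return failures

CASES = [
    {"name": "refund_policy", "input": "What is the refund window?",
     "expect": "30 days"},
    {"name": "greeting", "input": "hello", "expect": "Hi"},
]
```

Wired into a pipeline as `sys.exit(1 if run_suite(agent, CASES) else 0)`, a model swap or prompt change that breaks a case fails the build before it reaches production.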
And now: agent computers on top
Everything above is the foundation. On top of it, Helix now offers something no other private GenAI stack does: agent computers.
Full GPU-accelerated desktop environments — browser, terminal, filesystem, GUI applications — where autonomous agents do real work. Not a chat interface. Not a terminal session. A complete computer per agent, fully isolated, running on your infrastructure.
This is the layer that turns a GenAI platform into a fleet orchestration system:
| Private GenAI stack | + Agent computers |
|---|---|
| Chat with LLMs | Agents that write code, run tests, deploy |
| RAG-powered Q&A | Agents that research, synthesize, produce documents |
| API tool agents | Agents with full desktop environments and GUI access |
| Single-user interactions | Fleet of 15+ agents working in parallel |
| Prompt-and-response | Spec coding → kanban pipeline → human review → merge |
You don't have to choose between a private GenAI platform and agent computers. Helix is both.
Why it matters that it's private
Every API call to a third-party AI service is a dependency you don't control. Every model update could break your workflows. Every rate limit could slow your business.
With Helix on your infrastructure:
- Your data never leaves your network — prompts, responses, documents, agent activity. All of it stays on hardware you control.
- No vendor lock-in — swap models freely, run multiple models simultaneously, never worry about an API being deprecated.
- Predictable costs — no per-token pricing surprises. Your hardware, your electricity, your budget.
- Compliance by default — SOC 2 Type II, ISO 27001 certified. HIPAA ready. Air-gap deployable for classified environments.
- No rate limits — your GPUs, your throughput. Scale by adding hardware, not by negotiating with a vendor.
Deployment options
On your Mac: The full Helix stack — LLMs, RAG, agents, and agent computers — running on Apple Silicon. $299/year. Start 24-hour free trial →
On Helix Cloud: Managed infrastructure, zero setup. Same capabilities, we handle the GPUs. Join the waitlist →
On your Kubernetes cluster: Enterprise deployment with RBAC, SSO, audit logging, and unlimited agents. Air-gap support available. Talk to us about an enterprise pilot →
From GenAI platform to fleet orchestration
The teams that deployed private GenAI stacks early got a head start. The teams that are adding agent computers to that stack now are compounding that advantage.
Helix gives you both layers in one platform — the private GenAI foundation your compliance team requires, and the agent computer fleet your engineering team is already asking for.