Omnistrate Deployment
This document outlines the process for deploying Experio to Kubernetes using the Omnistrate platform.
What is Omnistrate
Omnistrate is a platform that simplifies deploying applications to Kubernetes. It can use your docker-compose.yaml file to build Docker images, push them to GitHub Packages, and deploy them as a managed service on Kubernetes clusters. This removes the need to manage Kubernetes configurations by hand while still providing enterprise-grade deployment capabilities.
Installation
Install Omnistrate CLI
Install the Omnistrate CLI tool using the installation script:
Authentication
Once installed, log in to your Omnistrate account with omctl login.
How It Works
We do not use omctl build-from-repo. That command builds every image serially on one
machine (45+ min) and ignores the compose image: tag; it only ever pushes :latest and
:sha-<digest>, which makes it impossible to pin sidecars to a real version. Instead, CI
builds the images itself and hands Omnistrate a compose file that references already-pushed,
version-pinned images:
- Build images in parallel: GitHub Actions fans out a matrix job, one runner per Experio image, with per-image BuildKit layer caching (type=gha). Wall-clock time ≈ the slowest single image (the server), not the sum of all of them.
- Push tags to GHCR: Each image is pushed as both :${VERSION} (the semver from VERSION, a human-facing label) and :sha-${git-sha} (unique per commit).
- Generate the Omnistrate compose: scripts/generate_omni_compose.py converts docker-compose.yaml → docker-compose.omni.yaml. It strips every build: block, replaces it with the pinned image: ref, and re-pins the server sidecars (scale-to-zero, flow-runner). It pins to the :sha-${git-sha} tag, not :${VERSION}; see Image Tagging for why the version tag is unsafe to pin.
- Release the plan: omctl build -f docker-compose.omni.yaml registers a new service plan version that references those immutable image tags.
- Deploy: The dev instance auto-upgrades; prod requires a manual omctl upgrade.
docker-compose.yaml (with its build: blocks) is unchanged — docker compose up still builds locally. Only CI produces the derived docker-compose.omni.yaml
(gitignored, never committed).
Image naming
Images are named after the Dockerfile suffix, not the compose service name. This preserves the names build-from-repo historically used, so registry paths stay stable. For example, the downloader service builds Dockerfile.download → ghcr.io/experio-ai/experio-dockerfile.download.
Core Deployment Commands
Standard Deployment (with Docker build)
CI normally does this. To reproduce a full release locally, build and push each image to GHCR tagged with the commit SHA (see .github/workflows/omnistrate-build.yml for the
exact matrix), then generate the compose file and release:
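The local steps can be sketched as a small Python helper that only assembles the command lines, so you can review them before anything is pushed. This is a hypothetical helper, not a script in the repository. It assumes the following:
- Images follow the naming rule under Image naming.
- The tag uses the full commit SHA from git rev-parse HEAD. CI may format the SHA differently.
- scripts/generate_omni_compose.py can be run with the current Python interpreter.
- You have already run docker login ghcr.io.

The Dockerfile list mirrors the one under Dockerfiles. The CI matrix remains authoritative.

```python
import subprocess
import sys

REGISTRY = "ghcr.io/experio-ai"

# Mirrors the Dockerfiles list in this document; the CI matrix is authoritative.
DOCKERFILES = [
    "Dockerfile.server", "Dockerfile.migrations", "Dockerfile.reader",
    "Dockerfile.structured_data", "Dockerfile.enrichment", "Dockerfile.coordinator",
    "Dockerfile.download", "Dockerfile.parser", "Dockerfile.classifier",
    "Dockerfile.ingestion_v2", "Dockerfile.cleanup",
]


def current_sha() -> str:
    """Full commit SHA of HEAD (assumption: the pinned tag uses the full SHA)."""
    out = subprocess.run(
        ["git", "rev-parse", "HEAD"], check=True, capture_output=True, text=True
    )
    return out.stdout.strip()


def release_commands(sha: str, dockerfiles: list[str]) -> list[list[str]]:
    """Return the ordered command lines for a local release; nothing is executed here."""
    tag = f"sha-{sha}"
    cmds: list[list[str]] = []
    for dockerfile in dockerfiles:
        # Image name follows the Dockerfile: Dockerfile.download -> experio-dockerfile.download
        ref = f"{REGISTRY}/experio-{dockerfile.lower()}:{tag}"
        cmds.append(["docker", "build", "-f", dockerfile, "-t", ref, "."])
        cmds.append(["docker", "push", ref])
    # Derive the pinned compose file, then register a new plan version.
    cmds.append([sys.executable, "scripts/generate_omni_compose.py", tag])
    cmds.append([
        "omctl", "build", "-f", "docker-compose.omni.yaml",
        "--product-name", "Experio-1.1", "--release-as-preferred",
    ])
    return cmds


def run_all(cmds: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in cmds:
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Print release_commands(current_sha(), DOCKERFILES) to review the plan, then pass the list to run_all to execute it. The sketch pushes only the :sha- tag, because that is the tag the plan pins to. It also builds images one after another on a single machine, so expect it to be much slower than the CI matrix.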
Fast Deployment (skip Docker build)
When the images for this commit already exist in GHCR (e.g. only spec/config changed), skip the build entirely and just re-release the plan. In CI, the equivalent is a commit whose message contains build(skip-docker). The build-and-push matrix job is skipped, and only the deploy job runs to release the plan.
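Before taking the fast path, confirm that every pinned image for the commit actually exists in GHCR. Otherwise the new plan would reference tags that cannot be pulled. This is a minimal sketch that assumes the Docker CLI is installed and logged in to ghcr.io. It relies on docker manifest inspect exiting non-zero when a reference cannot be resolved. The helper names are hypothetical.

```python
import subprocess
from typing import Callable, Iterable


def registry_has(ref: str) -> bool:
    """True if the registry resolves a manifest for ref (needs Docker and a GHCR login)."""
    try:
        result = subprocess.run(
            ["docker", "manifest", "inspect", ref], capture_output=True, text=True
        )
    except FileNotFoundError:  # Docker CLI not installed
        return False
    return result.returncode == 0


def missing_images(
    refs: Iterable[str], exists: Callable[[str], bool] = registry_has
) -> list[str]:
    """Return the refs that cannot be found; an empty list means the build can be skipped."""
    return [ref for ref in refs if not exists(ref)]
```

Pass the :sha- refs for every image in the matrix. Re-release without building only if the result is empty. The exists parameter lets you swap in a different lookup, or a stub in tests.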
File Dependencies
The Omnistrate deployment depends on several key files:
Core Files
- docker-compose.yaml: Main deployment specification defining all services, dependencies, and configurations
- .omnistrate.env: Environment variables for the deployment (contains database credentials, API keys, etc.)
Dockerfiles
CI builds and pushes one image per Dockerfile below (named experio-dockerfile.<suffix>):
- Dockerfile.server: Django backend server with integrated client assets
- Dockerfile.migrations: DB migration / init job (compose service initdb)
- Dockerfile.reader: File reading job service
- Dockerfile.structured_data: Structured-data job service
- Dockerfile.enrichment: Post-processing enrichment job service
- Dockerfile.coordinator: Ingestion coordinator service
- Dockerfile.download: File download job service (compose service downloader)
- Dockerfile.parser: Document parser job service
- Dockerfile.classifier: Document classifier job service
- Dockerfile.ingestion_v2: V2 ingestion job service
- Dockerfile.cleanup: Cleanup job service
Supporting Files
- certs/: SSL certificates directory (currently used for ManTech deployment compatibility)
CI/CD Integration
Automated deployments are configured using GitHub Actions in .github/workflows/omnistrate-build.yml. The workflow has three jobs:
- compute-meta: Decides whether to run. It runs on workflow_dispatch, on a chore(release) commit on main, or when the commit message contains a build(full)/build(skip-docker) keyword. It then reads the semver from VERSION and computes the release description from CHANGELOG.md.
- build-and-push: A matrix job (one runner per image) that builds each Dockerfile.* with docker/build-push-action, caches layers via type=gha, and pushes :${VERSION} + :sha-${git-sha} to GHCR. Skipped on the "skip docker build" path.
- deploy: Runs scripts/generate_omni_compose.py "sha-${git-sha}", then omctl build -f docker-compose.omni.yaml --product-name Experio-1.1 --release-as-preferred, and finally auto-upgrades the dev instance. Prod requires a manual omctl upgrade.

The workflow runs on pushes to staging, main (release commits), and feature/**, fix/**, and bugfix/** branches, plus manual workflow_dispatch. GHCR auth uses secrets.GH_PAT
(needs write:packages); Omnistrate auth uses secrets.OMNISTRATE_API_KEY.
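The run/skip decision that compute-meta makes can be modelled as a pure function. This makes it easier to reason about which commits trigger a release. It is an illustrative sketch, not the workflow's actual logic. It assumes substring matching for the build(full) and build(skip-docker) keywords and a chore(release) prefix on the commit message. The real rules live in .github/workflows/omnistrate-build.yml.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    run: bool          # should the pipeline proceed past compute-meta?
    skip_docker: bool  # should build-and-push be skipped?


def decide(event: str, branch: str, message: str) -> Decision:
    """Hypothetical model of compute-meta's trigger rules."""
    skip_docker = "build(skip-docker)" in message
    # Assumption: release commits use a conventional-commit "chore(release)" prefix.
    is_release = branch == "main" and message.startswith("chore(release)")
    has_keyword = skip_docker or "build(full)" in message
    run = event == "workflow_dispatch" or is_release or has_keyword
    return Decision(run=run, skip_docker=run and skip_docker)
```

An ordinary push to a feature branch does not release anything. Only a keyword, a release commit, or a manual dispatch does.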
Image Tagging
Floating :latest tags let instances silently drift when a node pulls a fresh image (and
build-from-repo only ever produced :latest / :sha). We avoid that by building the images
ourselves and pinning every Experio image — main containers and server sidecars — to the
immutable per-commit tag :sha-${git-sha}.
Why pin to the commit SHA, not VERSION
VERSION is mutable across rebuilds: rebuilding without bumping it re-pushes the same
:<version> tag in GHCR, overwriting the previous image. Kubernetes uses an IfNotPresent
pull policy, so a node that already cached :<version> will not re-pull — meaning a brand
new plan version can keep silently running the old image (observed as “new plan, unchanged
behavior / unchanged timestamp”). The :sha-${git-sha} tag is unique to every commit, so each
build is a distinct reference that always pulls. VERSION remains the human-facing release
label (release description, app runtime version), just not the thing the plan pins to.
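The failure mode above can be reproduced with a toy model of a node's image cache. The tag app:1.4.0, the commits aaa and bbb, and the digests are made up for illustration. The model captures only the IfNotPresent rule (pull a reference only if it is not already cached) and ignores everything else the kubelet does.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy model of a node's image cache under an IfNotPresent pull policy."""
    cache: dict[str, str] = field(default_factory=dict)  # tag ref -> image digest

    def run(self, ref: str, registry: dict[str, str]) -> str:
        # IfNotPresent: pull only when the ref has never been cached on this node.
        if ref not in self.cache:
            self.cache[ref] = registry[ref]
        return self.cache[ref]


registry: dict[str, str] = {}
node = Node()

# Build from commit aaa: both tags point at the same image.
registry["app:1.4.0"] = "digest-old"
registry["app:sha-aaa"] = "digest-old"
node.run("app:1.4.0", registry)

# Rebuild from commit bbb without bumping VERSION: app:1.4.0 is overwritten in the registry.
registry["app:1.4.0"] = "digest-new"
registry["app:sha-bbb"] = "digest-new"

stale = node.run("app:1.4.0", registry)    # cached, so the node keeps digest-old
fresh = node.run("app:sha-bbb", registry)  # never seen, so the node pulls digest-new
```

Pinning the plan to the :sha- reference is what turns the second deploy into a real pull.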
How it works
- CI builds each Dockerfile.* and pushes ghcr.io/experio-ai/experio-dockerfile.<suffix> tagged with both :${VERSION} (label) and :sha-${git-sha} (the pinned, immutable tag).
- scripts/generate_omni_compose.py "sha-${git-sha}" produces docker-compose.omni.yaml. It removes every build: block, sets image: to the pinned :sha-… ref, and re-pins the scale-to-zero / flow-runner sidecars from :latest to the same tag. It uses a real YAML parser, so it cannot reintroduce the duplicate image: key error that broke an earlier envsubst-based attempt (reverted PR #985).
- omctl build records those immutable tags in a new plan version. Prod stays on its plan until an explicit upgrade.
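The compose transformation can be sketched on an already-parsed compose document, for example the dict returned by PyYAML's yaml.safe_load. This is a simplified stand-in, not the contents of scripts/generate_omni_compose.py. In particular, it assumes nothing about where the scale-to-zero and flow-runner sidecar images are declared. Instead, it re-pins any string anywhere in the document that starts with ghcr.io/experio-ai/ and ends in :latest.

```python
import copy
import posixpath
from typing import Any

REGISTRY = "ghcr.io/experio-ai"


def image_for(dockerfile: str, tag: str) -> str:
    """Image name follows the Dockerfile: Dockerfile.download -> experio-dockerfile.download."""
    return f"{REGISTRY}/experio-{posixpath.basename(dockerfile).lower()}:{tag}"


def _repin_latest(node: Any, tag: str) -> Any:
    """Recursively replace ':latest' on Experio image strings with the pinned tag."""
    if isinstance(node, dict):
        return {key: _repin_latest(value, tag) for key, value in node.items()}
    if isinstance(node, list):
        return [_repin_latest(item, tag) for item in node]
    if isinstance(node, str) and node.startswith(REGISTRY + "/") and node.endswith(":latest"):
        return node[: -len("latest")] + tag
    return node


def pin_compose(compose: dict, tag: str) -> dict:
    """Return a copy of a parsed compose dict with build: blocks swapped for pinned image: refs."""
    out = copy.deepcopy(compose)  # never mutate the caller's document
    for service in out.get("services", {}).values():
        build = service.pop("build", None)
        if build is None:
            continue  # third-party images (postgres, redis, ...) are left alone
        dockerfile = build.get("dockerfile", "Dockerfile") if isinstance(build, dict) else "Dockerfile"
        # Assigning a dict key replaces any existing image:, so a duplicate key is impossible.
        service["image"] = image_for(dockerfile, tag)
    return _repin_latest(out, tag)
```

The result is re-serialized from a dict, for example with yaml.safe_dump. Each service can therefore hold exactly one image: key, which rules out the duplicate-key error described above.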
Local / manual deploys
Build and push the images for the current commit to GHCR (see the workflow matrix), then pin to that commit's SHA tag. Do not pin Experio-built images to :latest. Prod promotion is done only with omctl upgrade <instance-id> --version=<plan-version>.
Helper Scripts (Potentially Deprecated)
omnistrate-deploy.sh
Located at scripts/omnistrate-deploy.sh, this script provides an interactive deployment process that:
- Validates the docker-compose.yaml file
- Offers options to skip Docker builds
- Provides guided deployment with user prompts
This script may no longer be needed if direct omctl commands are sufficient for your workflow.
test-omni-deploy.sh
Located at scripts/test-omni-deploy.sh, this script:
- Tests the deployment configuration locally before sending to Omnistrate
- Provides options for fresh deployments (with volume deletion)
- Offers core-only testing to speed up validation
Future Enhancements
Several improvements are planned to simplify the deployment process:
1. Remove .omnistrate.env Dependency
- Current: Uses the .omnistrate.env file for environment variables
- Future: Move all environment variables to GitHub Secrets for better security and management
- Benefit: Eliminates the need to manage sensitive credentials in repository files
2. Remove Certificate Dependencies
- Current: Includes the certs/ folder with SSL certificates for ManTech deployment compatibility
- Future: Remove certificate dependencies that were added specifically for ManTech
- Benefit: Simplifies deployment and removes ManTech-specific configurations
3. Clean Up Local Development Files
- Current: Includes various local Docker Compose files for testing/debugging
- Future: Remove docker-compose.local.yaml and docker-compose.local-web.yaml if they're no longer needed
- Benefit: Reduces confusion and maintenance overhead
Key Components
The Experio deployment consists of several containerized services:
- Server: Django backend service with integrated client assets (Dockerfile.server)
- Reader: Job service for reading and processing files (Dockerfile.reader)
- Ingestion: Job service for downloading and processing files from external sources (Dockerfile.ingestion)
- PostgreSQL: Database
- Redis: Cache and session storage
- RabbitMQ: Message broker for job queues
- Neo4j: Graph database (default active provider; manual scale-to-zero)
- FalkorDB: Alternate graph database (manual scale-to-zero; off by default in cluster)
- JupyterLab: Data analysis and exploration environment
Docker Container Architecture
Container Structure
Server Container (Dockerfile.server)
The server container is built using a multi-stage approach:
- Client Builder Stage: Builds frontend assets using Node.js
- Python Server Stage: Creates the Django backend and copies built client assets
Worker Containers (Dockerfile.reader & Dockerfile.ingestion)
Both worker services use similar lightweight Python containers.
Volume Strategy
The deployment uses Docker volumes for:
- Persistent Data Storage:
  - experio_file_storage: Uploaded files and documents
  - local_postgres_data: Database data
  - local_redis_data: Cache data
  - local_rabbitmq_data: Message queue data
  - local_neo4j_data: Neo4j graph database data
  - local_falkordb_data: FalkorDB graph database data
- Certificate Management:
  - ./certs:/certs:ro: SSL certificates (read-only)
Graph databases (Neo4j & FalkorDB)
Both graph backends are deployed as internal Omnistrate services with manual scale-to-zero (the same pattern as JupyterLab). They are not required for platform boot. They are needed only for chat, ingestion graph writes, and admin graph tools.
| Service | Default scaling mode | Default replicas | Purpose |
|---|---|---|---|
| neo4j | keep_alive | 1 | Active graph provider (GRAPH_PROVIDER=neo4j) |
| falkordb | keep_down | 0 | Standby / migration target |
Service Startup Order
The deployment enforces startup dependencies for platform services only. Graph databases are intentionally not in the critical path:
- Core infrastructure (PostgreSQL, Redis, RabbitMQ) starts with health checks.
- Database initialization (initdb) runs after PostgreSQL is healthy and needs Postgres only. It seeds NEO4J_* / FALKOR_* config from secrets and does not connect to the graph DBs.
- The application server starts after initdb completes and core infrastructure is healthy.
- Worker services (reader, ingestion pipeline, etc.) start after RabbitMQ and initdb. They tolerate a cold graph and reconnect when the active provider is up.
- Graph DBs (neo4j, falkordb) scale independently via manual scale-to-zero.

Chat, ingestion graph writes, and admin graph tools need the active graph DB (selected by GRAPH_PROVIDER)
to be running. The rest of the platform (auth, admin UI, Postgres-backed settings)
starts without either graph DB online.
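The startup rules above form a small dependency graph. The sketch below encodes a simplified, hypothetical version of it: the service names are illustrative, and reader stands in for all workers. It uses the standard library's graphlib to compute a valid start order and each service's transitive prerequisites. This makes it explicit that neither graph DB sits on the server's critical path.

```python
from graphlib import TopologicalSorter

# Hypothetical model of the startup rules above: service -> services it waits for.
DEPENDS_ON: dict[str, set[str]] = {
    "postgres": set(),
    "redis": set(),
    "rabbitmq": set(),
    "initdb": {"postgres"},
    "server": {"initdb", "postgres", "redis", "rabbitmq"},
    "reader": {"rabbitmq", "initdb"},
    "neo4j": set(),     # scaled independently, nothing waits on it
    "falkordb": set(),  # scaled independently, nothing waits on it
}


def prerequisites(service: str) -> set[str]:
    """All services that must be up before `service` starts (transitive closure)."""
    seen: set[str] = set()
    stack = list(DEPENDS_ON[service])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(DEPENDS_ON[dep])
    return seen


# One valid start order: every service appears after everything it depends on.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
```

The model does not include the health-check gating itself. It only captures the ordering constraints, which is enough to see that a scaled-down neo4j or falkordb cannot block the server.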
Authentication Setup
The deployment includes automatic creation of default accounts during the initdb job:
Default Accounts
- Admin User
  - Email: admin@experiolabs.ai
  - Password: Value from the DJANGO_SUPERUSER_PASSWORD environment variable
  - Has full superuser privileges
- Test User
  - Email: playwright@thinknimble.com
  - Password: Value from the PLAYWRIGHT_TEST_USER_PASS environment variable
  - Used by automated tests
Omnistrate secrets
Graph credentials are injected via Omnistrate service secrets referenced in docker-compose.yaml. Add these in the Omnistrate dashboard (Secrets) before
releasing a new service API version:
| Secret key | Used by | Purpose |
|---|---|---|
| neo4jPassword | neo4j, initdb | Neo4j auth and seeded NEO4J_PASSWORD |
| falkordbPassword | falkordb, initdb | FalkorDB auth and seeded FALKOR_PASSWORD |

If falkordbPassword is missing, Omnistrate release validation fails with an error
referencing the initdb component.
Other existing secrets (dbPassword, rabbitmqPassword, etc.) are unchanged.
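The graph secrets are consumed by initdb as well as by the graph services. A missing secret therefore surfaces as an initdb validation error. A tiny pre-release check can make that mapping explicit. It is a hypothetical helper that covers only the two secrets in the table above, and you supply the set of secret names defined in the Omnistrate dashboard yourself.

```python
# Which components consume each graph secret (from the secrets table above).
SECRET_CONSUMERS: dict[str, tuple[str, ...]] = {
    "neo4jPassword": ("neo4j", "initdb"),
    "falkordbPassword": ("falkordb", "initdb"),
}


def missing_secrets(defined: set[str]) -> dict[str, tuple[str, ...]]:
    """Map each missing graph secret to the components whose validation will fail."""
    return {
        name: consumers
        for name, consumers in SECRET_CONSUMERS.items()
        if name not in defined
    }
```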
Environment Configuration
The deployment uses environment variables defined in .omnistrate.env. Key configurations include:
Deployment Checklist
Before deploying to Omnistrate:
- Install and authenticate: Ensure omctl is installed and you're logged in
- Add secrets: Ensure falkordbPassword (and other required secrets) exist in Omnistrate
- Update configurations: Modify docker-compose.yaml and .omnistrate.env as needed
- Choose deployment type: Decide whether to rebuild Docker images or skip the build
- Run deployment: Generate docker-compose.omni.yaml and run omctl build -f docker-compose.omni.yaml, or let CI do both. Do not use omctl build-from-repo
- Monitor deployment: Check the Omnistrate UI for deployment status and any issues
- Post-deploy (existing instances): Run once after upgrade:
Health endpoints
| Endpoint | Purpose |
|---|---|
| GET /health/ | Comprehensive check. Critical: Postgres, Redis, RabbitMQ, LLM. Non-critical: active graph DB (checks.neo4j), MCP. Returns degraded (HTTP 200) when the platform is up but the graph is down (scale-to-zero, warm-up, migration). Returns unhealthy (HTTP 503) only when a critical dependency fails. |
| GET /health/simple/ | Process liveness only; always returns 200 when Django is running |

Do not treat checks.neo4j.status: unhealthy on its own as a platform outage when the graph
is intentionally scaled down.
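A monitoring script can follow the same rule by checking the HTTP status and the overall status before any per-check entries. This is a minimal sketch using only the standard library. The JSON shape (a top-level status plus checks.neo4j.status) and the returned label strings are assumptions based on the table above, not a documented schema.

```python
import json
import urllib.error
import urllib.request


def fetch_health(base_url: str, timeout: float = 10.0) -> tuple[int, dict]:
    """GET <base_url>/health/ and return (HTTP status, parsed JSON body)."""
    url = base_url.rstrip("/") + "/health/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:  # urlopen raises on 503
        try:
            body = json.loads(err.read() or b"{}")
        except ValueError:  # non-JSON error page
            body = {}
        return err.code, body


def classify(http_status: int, body: dict) -> str:
    """Map a /health/ response to 'outage', 'graph-down', 'degraded', or 'ok'."""
    if http_status == 503 or body.get("status") == "unhealthy":
        return "outage"  # a critical dependency failed
    if body.get("status") == "degraded":
        graph = body.get("checks", {}).get("neo4j", {}).get("status")
        # Expected while the graph is scaled to zero, warming up, or migrating.
        return "graph-down" if graph == "unhealthy" else "degraded"
    return "ok"
```

Alert only on "outage". Treat "graph-down" as informational whenever the graph has been scaled down on purpose.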
Troubleshooting
Common Deployment Issues
- Authentication Failed
  - Ensure you're logged in: omctl login
  - Check your Omnistrate account credentials
- Invalid Docker Compose Configuration
  - Validate locally: docker compose config (for the generated file: docker compose -f docker-compose.omni.yaml config)
  - Check for YAML syntax errors
- Build Failures
  - Review image build logs in the GitHub Actions build-and-push job, and plan release errors in the Omnistrate UI
  - Test Docker builds locally: docker build -f Dockerfile.server .
  - Missing secret references: e.g. falkordbPassword not defined in Omnistrate Secrets
- Service Health Check Failures
  - Check health check configurations in docker-compose.yaml
  - Verify service dependencies are correctly defined
Useful Commands
Docker-Related Issues
Container Startup Failures
If containers fail to start:
- Check logs in the Omnistrate UI: Navigate to your deployment and examine container logs
- Verify health checks: Ensure health check commands are appropriate for your services
- Check dependencies: Verify depends_on configurations match your service requirements
- Volume permissions: Ensure mounted volumes have appropriate permissions
Image Build Problems
If Docker image builds fail:
- Review build logs: Check the build-and-push job output in GitHub Actions (images are built in CI, not by Omnistrate)
- Test locally: Run docker build commands locally to identify issues
- Check base images: Ensure base images in Dockerfiles are accessible
- Verify file paths: Ensure all COPY commands reference existing files
Getting Help
- Omnistrate Documentation: Check the official Omnistrate documentation
- GitHub Issues: Review the repository’s issue tracker for known problems
- Omnistrate Support: Contact Omnistrate support for platform-specific issues