
Omnistrate Deployment

This document outlines the process for deploying Experio to Kubernetes using the Omnistrate platform.

What is Omnistrate

Omnistrate is a platform that simplifies deploying applications to Kubernetes. It can take your docker-compose.yaml file, build the Docker images, push them to GitHub Packages, and deploy them as a managed service on Kubernetes clusters, so you do not have to maintain Kubernetes manifests by hand.

Installation

Install Omnistrate CLI

Install the Omnistrate CLI tool using the installation script:
curl -fsSL https://raw.githubusercontent.com/omnistrate/cli/master/install-ctl.sh | sh

Authentication

Once installed, log in to your Omnistrate account:
omctl login
You’ll be prompted to enter your Omnistrate credentials.

How It Works

We do not use omctl build-from-repo. That command builds every image serially on one machine (45+ min) and ignores the compose image: tag — it only ever pushes :latest and :sha-<digest>, which makes it impossible to pin sidecars to a real version. Instead, CI builds the images itself and hands Omnistrate a compose that references already-pushed, version-pinned images:
  1. Build images in parallel - GitHub Actions fans out a matrix job, one runner per Experio image, with per-image BuildKit layer caching (type=gha). Wall-clock ≈ the slowest single image (the server), not the sum of all of them.
  2. Push tags to GHCR - Each image is pushed as both :${VERSION} (the semver from VERSION, a human-facing label) and :sha-${git-sha} (unique per commit).
  3. Generate the Omnistrate compose - scripts/generate_omni_compose.py converts docker-compose.yaml into docker-compose.omni.yaml: it strips every build: block, replaces it with the pinned image: ref, and re-pins the server sidecars (scale-to-zero, flow-runner). It pins to the :sha-${git-sha} tag, not :${VERSION}; see Image Tagging for why the version tag is unsafe to pin.
  4. Release the plan - omctl build -f docker-compose.omni.yaml registers a new service plan version that references those immutable image tags.
  5. Deploy - the dev instance auto-upgrades; prod requires a manual omctl upgrade.
The local-dev docker-compose.yaml (with its build: blocks) is unchanged: docker compose up still builds locally. Only CI produces the derived docker-compose.omni.yaml (gitignored, never committed).

Image naming

Images are named after the Dockerfile suffix, not the compose service name (preserving the names build-from-repo historically used, so registry paths stay stable). For example, service downloader builds Dockerfile.download, which is pushed as ghcr.io/experio-ai/experio-dockerfile.download.
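The naming rule can be sketched as a small helper. This is an illustration only, not code from the repository: the function names image_repo and image_refs are hypothetical, and the version and SHA values in the usage note are made up.

```python
REGISTRY = "ghcr.io/experio-ai"  # registry path taken from the example above


def image_repo(dockerfile: str) -> str:
    """Map a Dockerfile name (e.g. Dockerfile.download) to its GHCR repository."""
    prefix, sep, suffix = dockerfile.partition(".")
    if prefix != "Dockerfile" or not sep or not suffix:
        raise ValueError(f"expected Dockerfile.<suffix>, got {dockerfile!r}")
    # The image is named after the Dockerfile suffix, not the compose service.
    return f"{REGISTRY}/experio-dockerfile.{suffix}"


def image_refs(dockerfile: str, version: str, git_sha: str) -> dict[str, str]:
    """Return both tags pushed for one image: the label and the pinned tag."""
    repo = image_repo(dockerfile)
    return {"label": f"{repo}:{version}", "pinned": f"{repo}:sha-{git_sha}"}
```

For instance, image_refs("Dockerfile.download", "1.2.3", "abc123") yields one ref ending in :1.2.3 (the human-facing label) and one ending in :sha-abc123 (the ref the plan pins to).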

Core Deployment Commands

Standard Deployment (with Docker build)

CI normally does this. To reproduce a full release locally you would build & push each image to GHCR tagged with the commit SHA (see .github/workflows/omnistrate-build.yml for the exact matrix), then generate the compose and release:
TAG="sha-$(git rev-parse HEAD)"
# ... docker build + push each Dockerfile.* to ghcr.io/experio-ai/experio-dockerfile.* :$TAG ...
python scripts/generate_omni_compose.py "$TAG"
omctl build -f docker-compose.omni.yaml --product-name Experio-1.1 --release-as-preferred

Fast Deployment (skip Docker build)

When the images for this commit already exist in GHCR (e.g. only spec/config changed), skip the build entirely and just re-release the plan:
TAG="sha-$(git rev-parse HEAD)"
python scripts/generate_omni_compose.py "$TAG"
omctl build -f docker-compose.omni.yaml --product-name Experio-1.1 --release-as-preferred
In CI this is the “skip docker build” path (dispatch input or a build(skip-docker) commit): the matrix build job is skipped and only the release job runs.

File Dependencies

The Omnistrate deployment depends on several key files:

Core Files

  • docker-compose.yaml - Main deployment specification defining all services, dependencies, and configurations
  • .omnistrate.env - Environment variables for the deployment (contains database credentials, API keys, etc.)

Dockerfiles

CI builds and pushes one image per Dockerfile below (named experio-dockerfile.<suffix>):
  • Dockerfile.server - Django backend server with integrated client assets
  • Dockerfile.migrations - DB migration / init job (compose service initdb)
  • Dockerfile.reader - File reading job service
  • Dockerfile.structured_data - Structured-data job service
  • Dockerfile.enrichment - Post-processing enrichment job service
  • Dockerfile.coordinator - Ingestion coordinator service
  • Dockerfile.download - File download job service (compose service downloader)
  • Dockerfile.parser - Document parser job service
  • Dockerfile.classifier - Document classifier job service
  • Dockerfile.ingestion_v2 - V2 ingestion job service
  • Dockerfile.cleanup - Cleanup job service

Supporting Files

  • certs/ - SSL certificates directory (currently used for ManTech deployment compatibility)

CI/CD Integration

Automated deployments are configured using GitHub Actions in .github/workflows/omnistrate-build.yml. The workflow has three jobs:
  1. compute-meta - Decides whether to run (workflow_dispatch, a chore(release) commit on main, or a build(full) / build(skip-docker) keyword in the commit message), reads the semver from VERSION, and computes the release description from CHANGELOG.md.
  2. build-and-push - A matrix job (one runner per image) that builds each Dockerfile.* with docker/build-push-action, caches layers via type=gha, and pushes :${VERSION} + :sha-${git-sha} to GHCR. Skipped on the “skip docker build” path.
  3. deploy - Runs scripts/generate_omni_compose.py "sha-${git-sha}", then omctl build -f docker-compose.omni.yaml --product-name Experio-1.1 --release-as-preferred, and finally auto-upgrades the dev instance. Prod requires a manual omctl upgrade.
Triggers: pushes to staging, main (release commits), and feature/** / fix/** / bugfix/** branches, plus manual workflow_dispatch. GHCR auth uses secrets.GH_PAT (needs write:packages); Omnistrate auth uses secrets.OMNISTRATE_API_KEY.
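The compute-meta decision can be approximated as follows. This is a sketch under assumptions: the real workflow logic lives in omnistrate-build.yml and may match keywords differently; the function name decide, its parameters, and the substring/prefix matching are hypothetical.

```python
def decide(event_name: str, branch: str, commit_message: str,
           skip_docker_input: bool = False) -> dict:
    """Return whether the release runs and whether images are rebuilt."""
    # Assumption: keywords are matched as plain substrings of the commit message.
    keyword_skip = "build(skip-docker)" in commit_message
    run = (
        event_name == "workflow_dispatch"
        or (branch == "main" and commit_message.startswith("chore(release)"))
        or "build(full)" in commit_message
        or keyword_skip
    )
    # The "skip docker build" path comes from the keyword or the dispatch input.
    skip = keyword_skip or (event_name == "workflow_dispatch" and skip_docker_input)
    return {"run": run, "build_images": run and not skip}
```

When build_images is false but run is true, only the deploy job would execute, which corresponds to the fast deployment path described earlier.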

Image Tagging

Floating :latest tags let instances silently drift when a node pulls a fresh image (and build-from-repo only ever produced :latest / :sha). We avoid that by building the images ourselves and pinning every Experio image — main containers and server sidecars — to the immutable per-commit tag :sha-${git-sha}.

Why pin to the commit SHA, not VERSION

VERSION is mutable across rebuilds: rebuilding without bumping it re-pushes the same :<version> tag in GHCR, overwriting the previous image. Kubernetes defaults to the IfNotPresent pull policy for any tag other than :latest, so a node that already cached :<version> will not re-pull, and a brand new plan version can keep silently running the old image (observed as “new plan, unchanged behavior / unchanged timestamp”). The :sha-${git-sha} tag is unique to every commit, so no node can hold a stale cached copy of it and each new build is pulled on first use. VERSION remains the human-facing release label (release description, app runtime version), just not the thing the plan pins to.
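The stale-image scenario can be reproduced with a toy model of a node's image cache. This is a simplified illustration of the IfNotPresent behavior, not Kubernetes code; the Node class, tag names, and digests are all made up.

```python
class Node:
    """Toy model of one node's image cache under the IfNotPresent policy."""

    def __init__(self):
        self.cache = {}  # tag -> image digest cached on this node

    def run(self, tag, registry):
        # IfNotPresent: contact the registry only when the tag is not cached.
        if tag not in self.cache:
            self.cache[tag] = registry[tag]
        return self.cache[tag]


# First build: both tags point at the same image.
registry = {"1.4.0": "digest-A", "sha-aaa": "digest-A"}
node = Node()
node.run("1.4.0", registry)

# Rebuild without bumping VERSION: the version tag is overwritten in the
# registry, while the new commit gets a brand new SHA tag.
registry["1.4.0"] = "digest-B"
registry["sha-bbb"] = "digest-B"

stale = node.run("1.4.0", registry)    # tag already cached, so no re-pull
fresh = node.run("sha-bbb", registry)  # tag never seen, so it is pulled
```

Here stale still resolves to the first image while fresh resolves to the rebuilt one, which is why the plan pins the SHA tag.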

How it works

  1. CI builds each Dockerfile.* and pushes ghcr.io/experio-ai/experio-dockerfile.<suffix> tagged with both :${VERSION} (label) and :sha-${git-sha} (the pinned, immutable tag).
  2. scripts/generate_omni_compose.py "sha-${git-sha}" produces docker-compose.omni.yaml: it removes every build: block, sets image: to the pinned :sha-… ref, and re-pins the scale-to-zero / flow-runner sidecars from :latest to the same tag. It uses a real YAML parser, so it cannot reintroduce the duplicate-image:-key error that broke an earlier envsubst-based attempt (reverted PR #985).
  3. omctl build records those immutable tags in a new plan version. Prod stays on its plan until an explicit upgrade.
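The core of step 2 can be sketched on an already-parsed compose mapping. This is not the actual scripts/generate_omni_compose.py: the function name pin_services is hypothetical, it assumes each build: block is a mapping with a dockerfile key, and it leaves out YAML loading/dumping and the sidecar re-pinning, whose layout in the compose file is not described here.

```python
import copy

REGISTRY = "ghcr.io/experio-ai"


def pin_services(compose: dict, tag: str) -> dict:
    """Return a copy of a parsed compose mapping with every build: block
    replaced by a pinned image: ref."""
    out = copy.deepcopy(compose)  # never mutate the local-dev compose data
    for name, service in out.get("services", {}).items():
        build = service.pop("build", None)
        if build is None:
            continue  # services without build: keep their existing image
        # Assumption: build is a mapping such as
        # {"context": ".", "dockerfile": "Dockerfile.download"}.
        dockerfile = build.get("dockerfile") if isinstance(build, dict) else None
        if not dockerfile:
            raise ValueError(f"service {name!r} has no build.dockerfile")
        # Assigning into the mapping overwrites any existing image key, so the
        # result can never contain two image: keys for one service.
        service["image"] = f"{REGISTRY}/experio-{dockerfile.lower()}:{tag}"
    return out
```

Working on parsed data rather than on text is what rules out the duplicate-key failure mode: a mapping holds at most one value per key, whereas text substitution can emit a second image: line next to an existing one.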

Local / manual deploys

Build & push the images for the current commit to GHCR (see the workflow matrix), then pin to that commit’s SHA tag:
TAG="sha-$(git rev-parse HEAD)"
python scripts/generate_omni_compose.py "$TAG"
omctl build -f docker-compose.omni.yaml --product-name Experio-1.1 --release-as-preferred
Do not use :latest for Experio-built images. Prod promotion is omctl upgrade <instance-id> --version=<plan-version> only.

Helper Scripts (Potentially Deprecated)

omnistrate-deploy.sh

Located at scripts/omnistrate-deploy.sh, this script provides an interactive deployment process that:
  • Validates the docker-compose.yaml file
  • Offers options to skip Docker builds
  • Provides guided deployment with user prompts
Status: This script can probably be deleted if the simple omctl commands are sufficient for your workflow.

test-omni-deploy.sh

Located at scripts/test-omni-deploy.sh, this script:
  • Tests the deployment configuration locally before sending to Omnistrate
  • Provides options for fresh deployments (with volume deletion)
  • Offers core-only testing to speed up validation
Status: This script can probably be deleted if local testing is not needed before Omnistrate deployments.

Future Enhancements

Several improvements are planned to simplify the deployment process:

1. Remove .omnistrate.env Dependency

  • Current: Uses .omnistrate.env file for environment variables
  • Future: Move all environment variables to GitHub Secrets for better security and management
  • Benefit: Eliminates the need to manage sensitive credentials in repository files

2. Remove Certificate Dependencies

  • Current: Includes certs/ folder with SSL certificates for ManTech deployment compatibility
  • Future: Remove certificate dependencies that were added specifically for ManTech
  • Benefit: Simplifies deployment and removes ManTech-specific configurations

3. Clean Up Local Development Files

  • Current: Includes various local Docker Compose files for testing/debugging
  • Future: Remove docker-compose.local.yaml and docker-compose.local-web.yaml if they’re no longer needed
  • Benefit: Reduces confusion and maintenance overhead

Key Components

The Experio deployment consists of several containerized services:
  • Server: Django backend service with integrated client assets (Dockerfile.server)
  • Reader: Job service for reading and processing files (Dockerfile.reader)
  • Ingestion: Job service for downloading and processing files from external sources (Dockerfile.ingestion)
  • PostgreSQL: Database
  • Redis: Cache and session storage
  • RabbitMQ: Message broker for job queues
  • Neo4j: Graph database (default active provider; manual scale-to-zero)
  • FalkorDB: Alternate graph database (manual scale-to-zero; off by default in cluster)
  • JupyterLab: Data analysis and exploration environment

Docker Container Architecture

Container Structure

Server Container (Dockerfile.server)

The server container is built using a multi-stage approach:
  1. Client Builder Stage: Builds frontend assets using Node.js
  2. Python Server Stage: Creates the Django backend and copies built client assets
FROM node:22.14-alpine AS client-builder
# Build client assets
COPY ./client /app/client
RUN npm run heroku-postbuild

FROM python:3.12-bullseye
# Install Python dependencies and copy built assets
COPY --from=client-builder /app/client/dist /app/client/dist

Worker Containers (Dockerfile.reader & Dockerfile.ingestion)

Both worker services use similar lightweight Python containers:
FROM python:3.11-slim
# Install specific job dependencies
COPY jobs/base/requirements.txt /app/jobs/base/
# For the ingestion image, use jobs/ingestion/ instead of jobs/reader/
COPY jobs/reader/requirements.txt /app/jobs/reader/
RUN pip install -r /app/jobs/reader/requirements.txt

Volume Strategy

The deployment uses Docker volumes for:
  1. Persistent Data Storage:
    • experio_file_storage: Uploaded files and documents
    • local_postgres_data: Database data
    • local_redis_data: Cache data
    • local_rabbitmq_data: Message queue data
    • local_neo4j_data: Neo4j graph database data
    • local_falkordb_data: FalkorDB graph database data
  2. Certificate Management:
    • ./certs:/certs:ro: SSL certificates (read-only)

Graph databases (Neo4j & FalkorDB)

Both graph backends are deployed as internal Omnistrate services with manual scale-to-zero (same pattern as JupyterLab). They are not required for platform boot — only for chat, ingestion graph writes, and admin graph tools.
| Service | Default scaling mode | Default replicas | Purpose |
|---|---|---|---|
| neo4j | keep_alive | 1 | Active graph provider (GRAPH_PROVIDER=neo4j) |
| falkordb | keep_down | 0 | Standby / migration target |
Control via Django admin ServiceConfiguration (scale-to-zero sidecar), or Omnistrate capacity API. After changing modes, allow a few minutes for the sidecar to reconcile replica counts. See Graph backend (Neo4j & FalkorDB) for provider switching and migration runbooks.

Service Startup Order

The deployment enforces startup dependencies for platform services only. Graph databases are intentionally not in the critical path:
  1. Core infrastructure (PostgreSQL, Redis, RabbitMQ) start with health checks
  2. Database initialization (initdb) runs after PostgreSQL is healthy — needs Postgres only (seeds NEO4J_* / FALKOR_* config from secrets; does not connect to graph DBs)
  3. Application server starts after initdb completes and core infra is healthy
  4. Worker services (reader, ingestion pipeline, etc.) start after RabbitMQ and initdb; they tolerate a cold graph and reconnect when the active provider is up
  5. Graph DBs (neo4j, falkordb) scale independently via manual scale-to-zero
Chat and graph ingestion features require the active provider (GRAPH_PROVIDER) to be running. The rest of the platform (auth, admin UI, Postgres-backed settings) starts without either graph DB online.
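The ordering rules above form a small dependency graph. The sketch below uses Python's standard graphlib module to derive one valid start order; the dependency table is transcribed from the list above (with reader standing in for the worker services) and is not read from docker-compose.yaml.

```python
from graphlib import TopologicalSorter

# service -> services it waits for, following the startup rules listed above.
# Graph databases are deliberately absent: nothing waits for them.
DEPENDS_ON = {
    "postgres": set(),
    "redis": set(),
    "rabbitmq": set(),
    "initdb": {"postgres"},
    "server": {"initdb", "postgres", "redis", "rabbitmq"},
    "reader": {"initdb", "rabbitmq"},
}

# static_order() yields every service after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
```

Because neo4j and falkordb never appear as dependencies, scaling either one to zero cannot block any service in this graph from starting.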

Authentication Setup

The deployment includes automatic creation of default accounts during the initdb job:

Default Accounts

  1. Admin User
    • Email: admin@experiolabs.ai
    • Password: Value from DJANGO_SUPERUSER_PASSWORD environment variable
    • Has full superuser privileges
  2. Test User
    • Email: playwright@thinknimble.com
    • Password: Value from PLAYWRIGHT_TEST_USER_PASS environment variable
    • Used by automated tests

Omnistrate secrets

Graph credentials are injected via Omnistrate service secrets referenced in docker-compose.yaml. Add these in the Omnistrate dashboard (Secrets) before releasing a new service API version:
| Secret key | Used by | Purpose |
|---|---|---|
| neo4jPassword | neo4j, initdb | Neo4j auth and seeded NEO4J_PASSWORD |
| falkordbPassword | falkordb, initdb | FalkorDB auth and seeded FALKOR_PASSWORD |
If falkordbPassword is missing, Omnistrate release validation fails with an error referencing the initdb component. Other existing secrets (dbPassword, rabbitmqPassword, etc.) are unchanged.

Environment Configuration

The deployment uses environment variables defined in .omnistrate.env. Key configurations include:
# Database configuration
DB_NAME='experio_db'
DB_USER='experio'
DB_HOST='postgres'

# Neo4j Graph Database (when GRAPH_PROVIDER=neo4j)
NEO4J_URI='neo4j://neo4j:7687'
NEO4J_USER='neo4j'
NEO4J_DATABASE='neo4j'

# FalkorDB (when GRAPH_PROVIDER=falkordb) — cluster internal DNS
# FALKOR_URI='redis://falkordb:6379'

# AI/ML Services
USE_AZURE_OPENAI='True'
USE_GOOGLE_GENAI='False'

# Authentication
VITE_USE_AUTH0='False'
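As an illustration of how these values select the graph backend, the sketch below parses KEY='value' lines and picks the URI for the active provider. It is not how Experio actually loads its configuration (that is not described here); parse_env and graph_uri are hypothetical helpers, and defaulting to neo4j when GRAPH_PROVIDER is unset is an assumption based on Neo4j being the default active provider.

```python
def parse_env(text: str) -> dict:
    """Parse KEY='value' lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip("'\"")  # drop surrounding quotes
    return env


def graph_uri(env: dict) -> str:
    """Return the connection URI for the active graph provider."""
    provider = env.get("GRAPH_PROVIDER", "neo4j")  # assumed default
    return env["NEO4J_URI"] if provider == "neo4j" else env["FALKOR_URI"]
```

Note that a commented-out line such as # FALKOR_URI=... is skipped, so switching GRAPH_PROVIDER to falkordb without uncommenting it would raise a KeyError in this sketch.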

Deployment Checklist

Before deploying to Omnistrate:
  1. Install and authenticate: Ensure omctl is installed and you’re logged in
  2. Add secrets: Ensure falkordbPassword (and other required secrets) exist in Omnistrate
  3. Update configurations: Modify docker-compose.yaml and .omnistrate.env as needed
  4. Choose deployment type: Decide whether to rebuild Docker images or skip the build
  5. Run deployment: Trigger the CI workflow, or run the omctl build -f docker-compose.omni.yaml command from Core Deployment Commands (do not use omctl build-from-repo)
  6. Monitor deployment: Check the Omnistrate UI for deployment status and any issues
  7. Post-deploy (existing instances): Run once after upgrade:
    python manage.py migrate
    python manage.py seed_autoscaling_config
    python manage.py seed_config --force
    

Health endpoints

| Endpoint | Purpose |
|---|---|
| GET /health/ | Comprehensive check. Critical: Postgres, Redis, RabbitMQ, LLM. Non-critical: active graph DB (checks.neo4j), MCP. Returns degraded (HTTP 200) when platform is up but graph is down (scale-to-zero, warm-up, migration). Returns unhealthy (HTTP 503) only when a critical dependency fails. |
| GET /health/simple/ | Process liveness only; always 200 when Django is running |
Do not treat checks.neo4j.status: unhealthy alone as a platform outage when graph is intentionally scaled down.
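The status rules for GET /health/ can be summarized in a few lines. This is a sketch of the documented behavior, not the endpoint's implementation: only the neo4j check key appears in this document, so the other key names, the "healthy" status string, and the summarize function are assumptions.

```python
# Assumed check keys for the critical dependencies (only "neo4j" is documented).
CRITICAL = {"postgres", "redis", "rabbitmq", "llm"}


def summarize(checks: dict) -> tuple:
    """Map per-dependency results to (overall status, HTTP status code)."""
    failing = {name for name, result in checks.items()
               if result["status"] != "healthy"}
    if failing & CRITICAL:
        return "unhealthy", 503  # a critical dependency is down
    if failing:
        return "degraded", 200   # only non-critical checks (graph DB, MCP) fail
    return "healthy", 200
```

A monitor built on this rule would alert on HTTP 503 and treat a degraded response with a failing graph check as expected whenever the graph DB is scaled to zero.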

Troubleshooting

Common Deployment Issues

  1. Authentication Failed
    • Ensure you’re logged in: omctl login
    • Check your Omnistrate account credentials
  2. Invalid Docker Compose Configuration
    • Validate locally: docker compose config
    • Check for YAML syntax errors
  3. Build Failures
    • Review image build logs in the build-and-push job of the GitHub Actions run (CI builds the images, not Omnistrate)
    • Test Docker builds locally: docker build -f Dockerfile.server .
    • Missing secret references: e.g. falkordbPassword not defined in Omnistrate Secrets
  4. Service Health Check Failures
    • Check health check configurations in docker-compose.yaml
    • Verify service dependencies are correctly defined

Useful Commands

# Validate docker-compose configuration
docker compose config

# Test local deployment
./scripts/test-omni-deploy.sh  # If script is still available

# View Omnistrate deployment status
omctl get deployments

# Check deployment logs (in Omnistrate UI)
# Navigate to your service in the Omnistrate dashboard

Container Startup Failures

If containers fail to start:
  1. Check logs in Omnistrate UI: Navigate to your deployment and examine container logs
  2. Verify health checks: Ensure health check commands are appropriate for your services
  3. Check dependencies: Verify depends_on configurations match your service requirements
  4. Volume permissions: Ensure mounted volumes have appropriate permissions

Image Build Problems

If Docker image builds fail:
  1. Review build logs: Check the build-and-push job output in the GitHub Actions run (CI builds the images, not Omnistrate)
  2. Test locally: Run docker build commands locally to identify issues
  3. Check base images: Ensure base images in Dockerfiles are accessible
  4. Verify file paths: Ensure all COPY commands reference existing files

Getting Help

  • Omnistrate Documentation: Check the official Omnistrate documentation
  • GitHub Issues: Review the repository’s issue tracker for known problems
  • Omnistrate Support: Contact Omnistrate support for platform-specific issues