Omnistrate Deployment

This document outlines the process for deploying Experio to Kubernetes using the Omnistrate platform.

What is Omnistrate

Omnistrate is a platform that simplifies deploying applications to Kubernetes. It takes a docker-compose.yaml specification and deploys the services it describes as a managed service on Kubernetes clusters. Omnistrate can also build the Docker images and push them to GitHub Packages itself, but Experio builds its images in CI instead (see How It Works). Either way, you avoid managing Kubernetes configurations by hand while keeping enterprise-grade deployment capabilities.

Installation

Install Omnistrate CLI

Install the Omnistrate CLI tool using the installation script:

curl -fsSL https://raw.githubusercontent.com/omnistrate/cli/master/install-ctl.sh | sh

Authentication

Once installed, log in to your Omnistrate account:

omctl login

You’ll be prompted to enter your Omnistrate credentials.

How It Works

We do not use omctl build-from-repo. That command builds every image serially on one machine (45+ min) and ignores the compose image: tag — it only ever pushes :latest and :sha-<digest>, which makes it impossible to pin sidecars to a real version. Instead, CI builds the images itself and hands Omnistrate a compose that references already-pushed, version-pinned images:

Build images in parallel - GitHub Actions fans out a matrix job, one runner per Experio image, with per-image BuildKit layer caching (type=gha). Wall-clock ≈ the slowest single image (the server), not the sum of all of them.
Push tags to GHCR - Each image is pushed as both :${VERSION} (the semver from VERSION, a human-facing label) and :sha-${git-sha} (unique per commit).
Generate the Omnistrate compose - scripts/generate_omni_compose.py converts docker-compose.yaml → docker-compose.omni.yaml: it strips every build: block, replaces it with the pinned image: ref, and re-pins the server sidecars (scale-to-zero, flow-runner). It pins to the :sha-${git-sha} tag, not :${VERSION} — see Image Tagging for why the version tag is unsafe to pin.
Release the plan - omctl build -f docker-compose.omni.yaml registers a new service plan version that references those immutable image tags.
Deploy - the dev instance auto-upgrades; prod requires a manual omctl upgrade.

The local-dev docker-compose.yaml (with its build: blocks) is unchanged — docker compose up still builds locally. Only CI produces the derived docker-compose.omni.yaml (gitignored, never committed).

Image naming

Images are named after the Dockerfile suffix, not the compose service name. This preserves the names build-from-repo historically used, so registry paths stay stable. For example, the compose service downloader builds from Dockerfile.download, so its image is ghcr.io/experio-ai/experio-dockerfile.download.
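
The naming rule, together with the two tags CI pushes per image, can be sketched in a few lines of Python. This is an illustration only, not the code the workflow runs: the helper names image_name and image_tags are hypothetical, and the version and SHA values in the usage line are made up.

```python
REGISTRY = "ghcr.io/experio-ai"  # registry path taken from the example above


def image_name(dockerfile: str) -> str:
    """Map 'Dockerfile.<suffix>' to its GHCR repository path."""
    prefix = "Dockerfile."
    if not dockerfile.startswith(prefix) or len(dockerfile) == len(prefix):
        raise ValueError(f"expected Dockerfile.<suffix>, got {dockerfile!r}")
    suffix = dockerfile[len(prefix):]
    return f"{REGISTRY}/experio-dockerfile.{suffix}"


def image_tags(dockerfile: str, version: str, git_sha: str) -> list[str]:
    """Return the two refs pushed per image: version label and per-commit tag."""
    name = image_name(dockerfile)
    return [f"{name}:{version}", f"{name}:sha-{git_sha}"]


if __name__ == "__main__":
    # Illustrative values only; real runs use the VERSION file and the commit SHA.
    for ref in image_tags("Dockerfile.download", "1.2.3", "abc123"):
        print(ref)
```

Note that the compose service name (downloader) never appears in the result; only the Dockerfile suffix does.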

Core Deployment Commands

Standard Deployment (with Docker build)

CI normally does this. To reproduce a full release locally you would build & push each image to GHCR tagged with the commit SHA (see .github/workflows/omnistrate-build.yml for the exact matrix), then generate the compose and release:

TAG="sha-$(git rev-parse HEAD)"
# ... docker build + push each Dockerfile.<suffix> to ghcr.io/experio-ai/experio-dockerfile.<suffix>:$TAG ...
python scripts/generate_omni_compose.py "$TAG"
omctl build -f docker-compose.omni.yaml --product-name Experio-1.1 --release-as-preferred

Fast Deployment (skip Docker build)

When the images for this commit already exist in GHCR (e.g. only spec/config changed), skip the build entirely and just re-release the plan:

TAG="sha-$(git rev-parse HEAD)"
python scripts/generate_omni_compose.py "$TAG"
omctl build -f docker-compose.omni.yaml --product-name Experio-1.1 --release-as-preferred

In CI this is the “skip docker build” path (dispatch input or a build(skip-docker) commit): the matrix build job is skipped and only the release job runs.

File Dependencies

The Omnistrate deployment depends on several key files:

Core Files

docker-compose.yaml - Main deployment specification defining all services, dependencies, and configurations
.omnistrate.env - Environment variables for the deployment (contains database credentials, API keys, etc.)

Dockerfiles

CI builds and pushes one image per Dockerfile below (named experio-dockerfile.<suffix>):

Dockerfile.server - Django backend server with integrated client assets
Dockerfile.migrations - DB migration / init job (compose service initdb)
Dockerfile.reader - File reading job service
Dockerfile.structured_data - Structured-data job service
Dockerfile.enrichment - Post-processing enrichment job service
Dockerfile.coordinator - Ingestion coordinator service
Dockerfile.download - File download job service (compose service downloader)
Dockerfile.parser - Document parser job service
Dockerfile.classifier - Document classifier job service
Dockerfile.ingestion_v2 - V2 ingestion job service
Dockerfile.cleanup - Cleanup job service

Supporting Files

certs/ - SSL certificates directory (currently used for ManTech deployment compatibility)

CI/CD Integration

Automated deployments are configured using GitHub Actions in .github/workflows/omnistrate-build.yml. The workflow has three jobs:

compute-meta - Decides whether to run (workflow_dispatch, a chore(release) commit on main, or a build(full) / build(skip-docker) keyword in the commit message), reads the semver from VERSION, and computes the release description from CHANGELOG.md.
build-and-push - A matrix job (one runner per image) that builds each Dockerfile.* with docker/build-push-action, caches layers via type=gha, and pushes :${VERSION} + :sha-${git-sha} to GHCR. Skipped on the “skip docker build” path.
deploy - Runs scripts/generate_omni_compose.py "sha-${git-sha}", then omctl build -f docker-compose.omni.yaml --product-name Experio-1.1 --release-as-preferred, and finally auto-upgrades the dev instance. Prod requires a manual omctl upgrade.

Triggers: pushes to staging, main (release commits), and feature/** / fix/** / bugfix/** branches, plus manual workflow_dispatch. GHCR auth uses secrets.GH_PAT (needs write:packages); Omnistrate auth uses secrets.OMNISTRATE_API_KEY.
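
To make the compute-meta decision concrete, here is a minimal Python sketch of the run / skip-docker logic described above. It is not the workflow's actual implementation: the function decide, the Plan type, the dispatch_skip_docker parameter, and the exact keyword matching (a plain substring search, and a chore(release) prefix check) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Assumed keyword format: the literal text "build(full)" or "build(skip-docker)"
# anywhere in the commit message. The real workflow may match differently.
KEYWORD = re.compile(r"build\((full|skip-docker)\)")


@dataclass(frozen=True)
class Plan:
    run: bool          # should the workflow do anything at all?
    skip_docker: bool  # if it runs, is the matrix build job skipped?


def decide(event: str, branch: str, message: str,
           dispatch_skip_docker: bool = False) -> Plan:
    # Manual dispatch always runs; the dispatch input selects the fast path.
    if event == "workflow_dispatch":
        return Plan(run=True, skip_docker=dispatch_skip_docker)
    # An explicit keyword in the commit message wins on any triggering branch.
    match = KEYWORD.search(message)
    if match:
        return Plan(run=True, skip_docker=match.group(1) == "skip-docker")
    # Release commits on main trigger a full build.
    if branch == "main" and message.startswith("chore(release)"):
        return Plan(run=True, skip_docker=False)
    return Plan(run=False, skip_docker=False)
```

Under these assumptions, an ordinary push to staging without a keyword produces Plan(run=False, skip_docker=False), so neither the build nor the release job runs.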

Image Tagging

Floating :latest tags let instances silently drift when a node pulls a fresh image (and build-from-repo only ever produced :latest / :sha). We avoid that by building the images ourselves and pinning every Experio image — main containers and server sidecars — to the immutable per-commit tag :sha-${git-sha}.

Why pin to the commit SHA, not `VERSION`

VERSION is mutable across rebuilds: rebuilding without bumping it re-pushes the same :<version> tag in GHCR, overwriting the previous image. Kubernetes defaults to the IfNotPresent pull policy for tags other than :latest, so a node that already cached :<version> will not re-pull it. A brand new plan version can therefore keep silently running the old image (observed as “new plan, unchanged behavior / unchanged timestamp”). The :sha-${git-sha} tag is unique to every commit, so no node can hold a stale cached copy under that tag and each build is pulled as a distinct reference. VERSION remains the human-facing release label (release description, app runtime version); it is just not what the plan pins to.
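
The failure mode is easy to reproduce in a toy model. The Python sketch below simulates a single node's image cache under an IfNotPresent policy; the Node class, the tag names, and the "build-A" / "build-B" labels are all invented for illustration and do not correspond to any real Kubernetes API.

```python
class Node:
    """Toy model of one node's image cache under an IfNotPresent pull policy."""

    def __init__(self, registry: dict[str, str]):
        self.registry = registry           # tag -> image currently in the registry
        self.cache: dict[str, str] = {}    # tag -> image cached on this node

    def run(self, tag: str) -> str:
        # IfNotPresent: pull only when the tag is missing from the local cache.
        if tag not in self.cache:
            self.cache[tag] = self.registry[tag]
        return self.cache[tag]


registry = {"app:1.4.0": "build-A", "app:sha-aaa": "build-A"}
node = Node(registry)
first = node.run("app:1.4.0")    # first use of the tag: pulls build-A

# Rebuild without bumping VERSION: the version tag is overwritten in place,
# while the new commit also gets its own sha tag.
registry["app:1.4.0"] = "build-B"
registry["app:sha-bbb"] = "build-B"

stale = node.run("app:1.4.0")    # cache hit: the node keeps running build-A
fresh = node.run("app:sha-bbb")  # tag never seen before: pulls build-B

print(first, stale, fresh)
```

The version tag returns the old build even though the registry has moved on, while the per-commit tag cannot hit a stale cache entry because no node has seen it before.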

How it works

CI builds each Dockerfile.* and pushes ghcr.io/experio-ai/experio-dockerfile.<suffix> tagged with both :${VERSION} (label) and :sha-${git-sha} (the pinned, immutable tag).
scripts/generate_omni_compose.py "sha-${git-sha}" produces docker-compose.omni.yaml: it removes every build: block, sets image: to the pinned :sha-… ref, and re-pins the scale-to-zero / flow-runner sidecars from :latest to the same tag. It uses a real YAML parser, so it cannot reintroduce the duplicate-image:-key error that broke an earlier envsubst-based attempt (reverted PR #985).
omctl build records those immutable tags in a new plan version. Prod stays on its plan until an explicit upgrade.
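
As a rough illustration of the compose transformation, the sketch below operates on an already-parsed compose document (a plain dict, as a YAML parser would return). It is not scripts/generate_omni_compose.py: the function names are hypothetical, the sidecars key and the scale-to-zero image name in the sample are invented stand-ins for however the real compose declares sidecars, and postgres:16 is just an illustrative third-party image.

```python
import copy

PREFIX = "ghcr.io/experio-ai/experio-dockerfile."


def pin_compose(compose: dict, tag: str) -> dict:
    """Return a copy with build: blocks replaced by pinned image: refs."""
    out = copy.deepcopy(compose)  # never mutate the local-dev compose
    for service in out.get("services", {}).values():
        build = service.pop("build", None)
        if isinstance(build, dict) and "dockerfile" in build:
            suffix = build["dockerfile"].rsplit("Dockerfile.", 1)[-1]
            # Assigning by key means a second image: entry can never appear.
            service["image"] = f"{PREFIX}{suffix}:{tag}"
    return _repin_latest(out, tag)


def _repin_latest(node, tag: str):
    """Recursively rewrite Experio image refs ending in :latest to the tag."""
    if isinstance(node, dict):
        return {key: _repin_latest(value, tag) for key, value in node.items()}
    if isinstance(node, list):
        return [_repin_latest(value, tag) for value in node]
    if isinstance(node, str) and node.startswith(PREFIX) and node.endswith(":latest"):
        return node[: -len("latest")] + tag
    return node


compose = {
    "services": {
        "downloader": {"build": {"context": ".", "dockerfile": "Dockerfile.download"}},
        "server": {
            "build": {"context": ".", "dockerfile": "Dockerfile.server"},
            # Hypothetical sidecar layout and image name, for illustration only.
            "sidecars": [{"image": PREFIX + "scale-to-zero:latest"}],
        },
        "postgres": {"image": "postgres:16"},  # third-party image, left untouched
    }
}
pinned = pin_compose(compose, "sha-abc123")
```

Working on parsed data rather than on text is what rules out the duplicate-key problem: a dict holds at most one image entry per service, whatever the input looked like.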

Local / manual deploys

Build & push the images for the current commit to GHCR (see the workflow matrix), then pin to that commit’s SHA tag:

TAG="sha-$(git rev-parse HEAD)"
python scripts/generate_omni_compose.py "$TAG"
omctl build -f docker-compose.omni.yaml --product-name Experio-1.1 --release-as-preferred

Do not use :latest for Experio-built images. Promote to prod only via omctl upgrade <instance-id> --version=<plan-version>.

Helper Scripts (Potentially Deprecated)

omnistrate-deploy.sh

Located at scripts/omnistrate-deploy.sh, this script provides an interactive deployment process that:

Validates the docker-compose.yaml file
Offers options to skip Docker builds
Provides guided deployment with user prompts

Status: This script can probably be deleted if the simple omctl commands are sufficient for your workflow.

test-omni-deploy.sh

Located at scripts/test-omni-deploy.sh, this script:

Tests the deployment configuration locally before sending to Omnistrate
Provides options for fresh deployments (with volume deletion)
Offers core-only testing to speed up validation

Status: This script can probably be deleted if local testing is not needed before Omnistrate deployments.

Future Enhancements

Several improvements are planned to simplify the deployment process:

1. Remove .omnistrate.env Dependency

Current: Uses .omnistrate.env file for environment variables
Future: Move all environment variables to GitHub Secrets for better security and management
Benefit: Eliminates the need to manage sensitive credentials in repository files

2. Remove Certificate Dependencies

Current: Includes certs/ folder with SSL certificates for ManTech deployment compatibility
Future: Remove certificate dependencies that were added specifically for ManTech
Benefit: Simplifies deployment and removes ManTech-specific configurations

3. Clean Up Local Development Files

Current: Includes various local Docker Compose files for testing/debugging
Future: Remove docker-compose.local.yaml and docker-compose.local-web.yaml if they’re no longer needed
Benefit: Reduces confusion and maintenance overhead

Key Components

The Experio deployment consists of several containerized services:

Server: Django backend service with integrated client assets (Dockerfile.server)
Reader: Job service for reading and processing files (Dockerfile.reader)
Ingestion: Job service for downloading and processing files from external sources (Dockerfile.ingestion)
PostgreSQL: Database
Redis: Cache and session storage
RabbitMQ: Message broker for job queues
Neo4j: Graph database (default active provider; manual scale-to-zero)
FalkorDB: Alternate graph database (manual scale-to-zero; off by default in cluster)
JupyterLab: Data analysis and exploration environment

Docker Container Architecture

Container Structure

Server Container (Dockerfile.server)

The server container is built using a multi-stage approach:

Client Builder Stage: Builds frontend assets using Node.js
Python Server Stage: Creates the Django backend and copies built client assets

FROM node:22.14-alpine AS client-builder
# Build client assets
COPY ./client /app/client
RUN npm run heroku-postbuild

FROM python:3.12-bullseye
# Install Python dependencies and copy built assets
COPY --from=client-builder /app/client/dist /app/client/dist

Worker Containers (Dockerfile.reader & Dockerfile.ingestion)

Both worker services use similar lightweight Python containers:

FROM python:3.11-slim
# Install specific job dependencies
COPY jobs/base/requirements.txt /app/jobs/base/
# Paths shown for the reader image; the ingestion image uses jobs/ingestion/ instead
COPY jobs/reader/requirements.txt /app/jobs/reader/
RUN pip install -r /app/jobs/reader/requirements.txt

Volume Strategy

The deployment uses Docker volumes for:

Persistent Data Storage:
- experio_file_storage: Uploaded files and documents
- local_postgres_data: Database data
- local_redis_data: Cache data
- local_rabbitmq_data: Message queue data
- local_neo4j_data: Neo4j graph database data
- local_falkordb_data: FalkorDB graph database data
Certificate Management:
- ./certs:/certs:ro: SSL certificates (read-only)

Graph databases (Neo4j & FalkorDB)

Both graph backends are deployed as internal Omnistrate services with manual scale-to-zero (same pattern as JupyterLab). They are not required for platform boot — only for chat, ingestion graph writes, and admin graph tools.

Service	Default scaling mode	Default replicas	Purpose
`neo4j`	`keep_alive`	1	Active graph provider (`GRAPH_PROVIDER=neo4j`)
`falkordb`	`keep_down`	0	Standby / migration target

Control via Django admin ServiceConfiguration (scale-to-zero sidecar), or Omnistrate capacity API. After changing modes, allow a few minutes for the sidecar to reconcile replica counts. See Graph backend (Neo4j & FalkorDB) for provider switching and migration runbooks.

Service Startup Order

The deployment enforces startup dependencies for platform services only. Graph databases are intentionally not in the critical path:

Core infrastructure (PostgreSQL, Redis, RabbitMQ) starts with health checks
Database initialization (initdb) runs after PostgreSQL is healthy — needs Postgres only (seeds NEO4J_* / FALKOR_* config from secrets; does not connect to graph DBs)
Application server starts after initdb completes and core infra is healthy
Worker services (reader, ingestion pipeline, etc.) start after RabbitMQ and initdb; they tolerate a cold graph and reconnect when the active provider is up
Graph DBs (neo4j, falkordb) scale independently via manual scale-to-zero

Chat and graph ingestion features require the active provider (GRAPH_PROVIDER) to be running. The rest of the platform (auth, admin UI, Postgres-backed settings) starts without either graph DB online.
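
The ordering constraints above form a small dependency graph, which the following Python sketch resolves with the standard library's graphlib. The service keys are simplified stand-ins chosen for this example (reader represents all worker services); the authoritative dependencies are the depends_on entries in docker-compose.yaml.

```python
from graphlib import TopologicalSorter

# service -> services that must be ready first, taken from the list above.
# Graph databases are deliberately absent: nothing waits for them.
DEPENDS_ON = {
    "postgres": set(),
    "redis": set(),
    "rabbitmq": set(),
    "initdb": {"postgres"},
    "server": {"initdb", "postgres", "redis", "rabbitmq"},
    "reader": {"rabbitmq", "initdb"},
}

# static_order() yields every service after all of its prerequisites.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

Several valid orders exist, so the exact output may vary, but postgres always precedes initdb, and initdb always precedes both server and reader.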

Authentication Setup

The deployment includes automatic creation of default accounts during the initdb job:

Default Accounts

Admin User
- Email: admin@experiolabs.ai
- Password: Value from DJANGO_SUPERUSER_PASSWORD environment variable
- Has full superuser privileges
Test User
- Email: playwright@thinknimble.com
- Password: Value from PLAYWRIGHT_TEST_USER_PASS environment variable
- Used by automated tests

Omnistrate secrets

Graph credentials are injected via Omnistrate service secrets referenced in docker-compose.yaml. Add these in the Omnistrate dashboard (Secrets) before releasing a new service API version:

Secret key	Used by	Purpose
`neo4jPassword`	`neo4j`, `initdb`	Neo4j auth and seeded `NEO4J_PASSWORD`
`falkordbPassword`	`falkordb`, `initdb`	FalkorDB auth and seeded `FALKOR_PASSWORD`

If falkordbPassword is missing, Omnistrate release validation fails with an error referencing the initdb component. Other existing secrets (dbPassword, rabbitmqPassword, etc.) are unchanged.
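
A pre-release check along these lines can catch a missing secret before Omnistrate's validation does. This is only a sketch: missing_secrets is a hypothetical helper, the required set contains just the four keys named in this section (the real list is longer), and how you obtain the set of secrets defined in the dashboard is left open.

```python
# Partial list: only the secret keys named in this section.
REQUIRED_SECRETS = {
    "neo4jPassword",
    "falkordbPassword",
    "dbPassword",
    "rabbitmqPassword",
}


def missing_secrets(defined: set[str]) -> list[str]:
    """Return required secret keys that are not defined, sorted for stable output."""
    return sorted(REQUIRED_SECRETS - set(defined))


if __name__ == "__main__":
    # Example: falkordbPassword was never added in the dashboard.
    print(missing_secrets({"neo4jPassword", "dbPassword", "rabbitmqPassword"}))
```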

Environment Configuration

The deployment uses environment variables defined in .omnistrate.env. Key configurations include:

# Database configuration
DB_NAME='experio_db'
DB_USER='experio'
DB_HOST='postgres'

# Neo4j Graph Database (when GRAPH_PROVIDER=neo4j)
NEO4J_URI='neo4j://neo4j:7687'
NEO4J_USER='neo4j'
NEO4J_DATABASE='neo4j'

# FalkorDB (when GRAPH_PROVIDER=falkordb) — cluster internal DNS
# FALKOR_URI='redis://falkordb:6379'

# AI/ML Services
USE_AZURE_OPENAI='True'
USE_GOOGLE_GENAI='False'

# Authentication
VITE_USE_AUTH0='False'

Deployment Checklist

Before deploying to Omnistrate:

Install and authenticate: Ensure omctl is installed and you’re logged in
Add secrets: Ensure falkordbPassword (and other required secrets) exist in Omnistrate
Update configurations: Modify docker-compose.yaml and .omnistrate.env as needed
Choose deployment type: Decide whether to rebuild Docker images or skip the build
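Run deployment: Generate docker-compose.omni.yaml with scripts/generate_omni_compose.py, then run omctl build -f docker-compose.omni.yaml (do not use omctl build-from-repo)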
Monitor deployment: Check the Omnistrate UI for deployment status and any issues

Post-deploy (existing instances): Run once after upgrade:

python manage.py migrate
python manage.py seed_autoscaling_config
python manage.py seed_config --force

Health endpoints

Endpoint	Purpose
`GET /health/`	Comprehensive check. Critical: Postgres, Redis, RabbitMQ, LLM. Non-critical: active graph DB (`checks.neo4j`), MCP. Returns `degraded` (HTTP 200) when platform is up but graph is down (scale-to-zero, warm-up, migration). Returns `unhealthy` (HTTP 503) only when a critical dependency fails.
`GET /health/simple/`	Process liveness only — always 200 when Django is running

Do not treat checks.neo4j.status: unhealthy alone as a platform outage when graph is intentionally scaled down.
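
The status rules in the table can be summarized in a short Python sketch. It is not the server's implementation: overall_health is a hypothetical function, the check keys other than neo4j are assumed names, and the "healthy" label for the all-green case is an assumption since this page only names degraded and unhealthy.

```python
CRITICAL = ("postgres", "redis", "rabbitmq", "llm")
NON_CRITICAL = ("neo4j", "mcp")  # "neo4j" is the key used for the active graph DB


def overall_health(checks: dict[str, bool]) -> tuple[str, int]:
    """Map per-dependency results (True = healthy) to (status, HTTP code)."""
    # Any failed or missing critical dependency makes the platform unhealthy.
    if not all(checks.get(name, False) for name in CRITICAL):
        return "unhealthy", 503
    # Critical deps are fine; a failed non-critical dep only degrades the status.
    if not all(checks.get(name, False) for name in NON_CRITICAL):
        return "degraded", 200
    return "healthy", 200


if __name__ == "__main__":
    checks = dict.fromkeys(CRITICAL + NON_CRITICAL, True)
    checks["neo4j"] = False  # graph DB intentionally scaled to zero
    print(overall_health(checks))
```

A monitor built on this rule should alert on the 503 case only, and treat degraded as informational while the graph DB is scaled down.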

Troubleshooting

Common Deployment Issues

Authentication Failed
- Ensure you’re logged in: omctl login
- Check your Omnistrate account credentials
Invalid Docker Compose Configuration
- Validate locally: docker compose config
- Check for YAML syntax errors
Build Failures
- Review build logs in the Omnistrate UI
- Test Docker builds locally: docker build -f Dockerfile.server .
- Missing secret references: e.g. falkordbPassword not defined in Omnistrate Secrets
Service Health Check Failures
- Check health check configurations in docker-compose.yaml
- Verify service dependencies are correctly defined

Useful Commands

# Validate docker-compose configuration
docker compose config

# Test local deployment
./scripts/test-omni-deploy.sh  # If script is still available

# View Omnistrate deployment status
omctl get deployments

# Check deployment logs (in Omnistrate UI)
# Navigate to your service in the Omnistrate dashboard

Docker-Related Issues

Container Startup Failures

If containers fail to start:

Check logs in Omnistrate UI: Navigate to your deployment and examine container logs
Verify health checks: Ensure health check commands are appropriate for your services
Check dependencies: Verify depends_on configurations match your service requirements
Volume permissions: Ensure mounted volumes have appropriate permissions

Image Build Problems

If Docker image builds fail:

Review build logs: Check the build process output in Omnistrate UI
Test locally: Run docker build commands locally to identify issues
Check base images: Ensure base images in Dockerfiles are accessible
Verify file paths: Ensure all COPY commands reference existing files

Getting Help

Omnistrate Documentation: Check the official Omnistrate documentation
GitHub Issues: Review the repository’s issue tracker for known problems
Omnistrate Support: Contact Omnistrate support for platform-specific issues
