# Omnistrate Deployment

This document outlines the process for deploying Experio to Kubernetes using the Omnistrate platform.

## What is Omnistrate

Omnistrate is a platform that simplifies deploying applications to Kubernetes. It takes a Docker Compose specification and runs it as a managed service on Kubernetes clusters, so you do not have to maintain Kubernetes configurations by hand. Omnistrate can also build the images from `docker-compose.yaml` itself and push them to GitHub Packages, but Experio builds its images in CI instead (see [How It Works](#how-it-works)).

## Installation

### Install Omnistrate CLI

Install the Omnistrate CLI tool using the installation script:

```bash theme={null}
curl -fsSL https://raw.githubusercontent.com/omnistrate/cli/master/install-ctl.sh | sh
```

### Authentication

Once installed, log in to your Omnistrate account:

```bash theme={null}
omctl login
```

You'll be prompted to enter your Omnistrate credentials.

## How It Works

We **do not** use `omctl build-from-repo`. That command builds every image serially on one
machine (45+ min) and ignores the compose `image:` tag — it only ever pushes `:latest` and
`:sha-<digest>`, which makes it impossible to pin sidecars to a real version. Instead, CI
builds the images itself and hands Omnistrate a compose that references already-pushed,
version-pinned images:

1. **Build images in parallel** - GitHub Actions fans out a matrix job, one runner per
   Experio image, with per-image BuildKit layer caching (`type=gha`). Wall-clock ≈ the
   slowest single image (the server), not the sum of all of them.
2. **Push tags to GHCR** - Each image is pushed as both `:${VERSION}` (the semver from
   `VERSION`, a human-facing label) and `:sha-${git-sha}` (unique per commit).
3. **Generate the Omnistrate compose** - `scripts/generate_omni_compose.py` converts
   `docker-compose.yaml` → `docker-compose.omni.yaml`: it strips every `build:` block,
   replaces it with the pinned `image:` ref, and re-pins the server sidecars
   (`scale-to-zero`, `flow-runner`). It pins to the **`:sha-${git-sha}`** tag, not
   `:${VERSION}` — see [Image Tagging](#image-tagging) for why the version tag is unsafe to pin.
4. **Release the plan** - `omctl build -f docker-compose.omni.yaml` registers a new service
   plan version that references those immutable image tags.
5. **Deploy** - the dev instance auto-upgrades; prod requires a manual `omctl upgrade`.

The local-dev `docker-compose.yaml` (with its `build:` blocks) is **unchanged**, so
`docker compose up` still builds locally. Only CI produces the derived
`docker-compose.omni.yaml` (gitignored, never committed).

### Image naming

Images are named after the **Dockerfile suffix**, not the compose service name (preserving
the names `build-from-repo` historically used, so registry paths stay stable). For example
service `downloader` builds `Dockerfile.download` → `ghcr.io/experio-ai/experio-dockerfile.download`.
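The naming rule can be sketched in a few lines of Python. This is only an illustration of the mapping described above, not code taken from the repository: the helper names `image_ref` and `release_tags` are hypothetical, and the registry prefix is copied from the `downloader` example.

```python
from pathlib import PurePosixPath

# Registry prefix taken from the `downloader` example in this section.
REGISTRY_PREFIX = "ghcr.io/experio-ai/experio-dockerfile"


def image_ref(dockerfile: str, tag: str) -> str:
    """Map 'Dockerfile.<suffix>' plus a tag to a full GHCR image reference."""
    name = PurePosixPath(dockerfile).name
    prefix, sep, suffix = name.partition(".")
    if prefix != "Dockerfile" or not sep or not suffix:
        raise ValueError(f"expected 'Dockerfile.<suffix>', got {dockerfile!r}")
    return f"{REGISTRY_PREFIX}.{suffix}:{tag}"


def release_tags(version: str, git_sha: str) -> list[str]:
    """The two tags pushed per image: the semver label and the per-commit tag."""
    return [version, f"sha-{git_sha}"]
```

The image name depends only on the Dockerfile suffix, so renaming a compose service (for example `downloader`) would not move the image to a new registry path.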

## Core Deployment Commands

### Standard Deployment (with Docker build)

CI normally does this. To reproduce a full release locally you would build & push each image
to GHCR tagged with the commit SHA (see `.github/workflows/omnistrate-build.yml` for the
exact matrix), then generate the compose and release:

```bash theme={null}
TAG="sha-$(git rev-parse HEAD)"
# ... docker build + push each Dockerfile.* to ghcr.io/experio-ai/experio-dockerfile.* :$TAG ...
python scripts/generate_omni_compose.py "$TAG"
omctl build -f docker-compose.omni.yaml --product-name Experio-1.1 --release-as-preferred
```

### Fast Deployment (skip Docker build)

When the images for this commit already exist in GHCR (e.g. only spec/config changed), skip
the build entirely and just re-release the plan:

```bash theme={null}
TAG="sha-$(git rev-parse HEAD)"
python scripts/generate_omni_compose.py "$TAG"
omctl build -f docker-compose.omni.yaml --product-name Experio-1.1 --release-as-preferred
```

In CI this is the **"skip docker build"** path (dispatch input or a `build(skip-docker)`
commit): the matrix build job is skipped and only the release job runs.

## File Dependencies

The Omnistrate deployment depends on several key files:

### Core Files

* **`docker-compose.yaml`** - Main deployment specification defining all services, dependencies, and configurations
* **`.omnistrate.env`** - Environment variables for the deployment (contains database credentials, API keys, etc.)

### Dockerfiles

CI builds and pushes one image per Dockerfile below (named `experio-dockerfile.<suffix>`):

* **`Dockerfile.server`** - Django backend server with integrated client assets
* **`Dockerfile.migrations`** - DB migration / init job (compose service `initdb`)
* **`Dockerfile.reader`** - File reading job service
* **`Dockerfile.structured_data`** - Structured-data job service
* **`Dockerfile.enrichment`** - Post-processing enrichment job service
* **`Dockerfile.coordinator`** - Ingestion coordinator service
* **`Dockerfile.download`** - File download job service (compose service `downloader`)
* **`Dockerfile.parser`** - Document parser job service
* **`Dockerfile.classifier`** - Document classifier job service
* **`Dockerfile.ingestion_v2`** - V2 ingestion job service
* **`Dockerfile.cleanup`** - Cleanup job service

### Supporting Files

* **`certs/`** - SSL certificates directory (currently used for ManTech deployment compatibility)

## CI/CD Integration

Automated deployments are configured using GitHub Actions in `.github/workflows/omnistrate-build.yml`. The workflow has three jobs:

1. **`compute-meta`** - Decides whether to run (`workflow_dispatch`, a `chore(release)` commit
   on `main`, or a `build(full)` / `build(skip-docker)` keyword in the commit message),
   reads the semver from `VERSION`, and computes the release description from `CHANGELOG.md`.
2. **`build-and-push`** - A matrix job (one runner per image) that builds each `Dockerfile.*`
   with `docker/build-push-action`, caches layers via `type=gha`, and pushes `:${VERSION}` +
   `:sha-${git-sha}` to GHCR. Skipped on the "skip docker build" path.
3. **`deploy`** - Runs `scripts/generate_omni_compose.py "sha-${git-sha}"`, then
   `omctl build -f docker-compose.omni.yaml --product-name Experio-1.1 --release-as-preferred`,
   and finally auto-upgrades the dev instance. Prod requires a manual `omctl upgrade`.

Triggers: pushes to `staging`, `main` (release commits), and `feature/**` / `fix/**` /
`bugfix/**` branches, plus manual `workflow_dispatch`. GHCR auth uses `secrets.GH_PAT`
(needs `write:packages`); Omnistrate auth uses `secrets.OMNISTRATE_API_KEY`.
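As a rough model of the `compute-meta` decision, the sketch below maps an event, branch, and commit message to "run or not" and "build images or not". It is an assumption-laden illustration rather than the workflow's real logic: the `Decision` type, the `decide` function, the `skip_docker_input` parameter, and the choice that a `chore(release)` commit triggers a full build are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    run: bool           # should the workflow proceed at all?
    build_images: bool  # should the build-and-push matrix run?


def decide(event: str, branch: str, message: str,
           skip_docker_input: bool = False) -> Decision:
    """Hypothetical model of the compute-meta gating rules."""
    if event == "workflow_dispatch":
        # Manual runs always proceed; the dispatch input may skip the build.
        return Decision(run=True, build_images=not skip_docker_input)
    if "build(skip-docker)" in message:
        return Decision(run=True, build_images=False)
    if "build(full)" in message:
        return Decision(run=True, build_images=True)
    if branch == "main" and message.startswith("chore(release)"):
        # Assumption: release commits rebuild every image.
        return Decision(run=True, build_images=True)
    return Decision(run=False, build_images=False)
```

With these rules, a push to a `feature/**` branch without a keyword in the commit message triggers the workflow but stops after `compute-meta`.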

## Image Tagging

Floating `:latest` tags let instances silently drift when a node pulls a fresh image (and
`build-from-repo` only ever produced `:latest` / `:sha`). We avoid that by building the images
ourselves and pinning every Experio image — main containers **and** server sidecars — to the
**immutable per-commit tag `:sha-${git-sha}`**.

### Why pin to the commit SHA, not `VERSION`

`VERSION` is **mutable across rebuilds**: rebuilding without bumping it re-pushes the same
`:<version>` tag in GHCR, overwriting the previous image. For any tag other than `:latest`,
Kubernetes defaults to the `IfNotPresent` pull policy, so a node that already cached
`:<version>` will **not** re-pull. A brand new plan version can therefore keep silently running
the *old* image (observed as "new plan, unchanged behavior / unchanged timestamp"). The
`:sha-${git-sha}` tag is unique to every commit, so a new build is a reference that no node has
cached yet and is always pulled. `VERSION` remains the human-facing release label (release
description, app runtime version), just not the thing the plan pins to.
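The failure mode is easy to reproduce in a toy model. The sketch below is a simplified simulation, not Kubernetes code: `Registry`, `Node`, and `simulate_rebuild` are hypothetical names, and the digests are made-up strings. It only models "pull on cache miss" against a registry where tags can be overwritten.

```python
class Registry:
    """Tag -> digest store; pushing an existing tag overwrites it."""

    def __init__(self) -> None:
        self.tags: dict[str, str] = {}

    def push(self, tag: str, digest: str) -> None:
        self.tags[tag] = digest


class Node:
    """A node's image cache with IfNotPresent semantics."""

    def __init__(self, registry: Registry) -> None:
        self.registry = registry
        self.cache: dict[str, str] = {}

    def run(self, tag: str) -> str:
        if tag not in self.cache:  # pull only on a cache miss
            self.cache[tag] = self.registry.tags[tag]
        return self.cache[tag]


def simulate_rebuild() -> tuple[str, str]:
    """Return the digests a node runs for the version tag and the new SHA tag."""
    registry = Registry()
    node = Node(registry)
    registry.push("1.1.0", "digest-old")
    registry.push("sha-aaa", "digest-old")
    node.run("1.1.0")  # the node now has :1.1.0 cached
    # A new commit is built without bumping VERSION.
    registry.push("1.1.0", "digest-new")
    registry.push("sha-bbb", "digest-new")
    return node.run("1.1.0"), node.run("sha-bbb")
```

In this model the version tag keeps resolving to the old digest on the node, while the new SHA tag resolves to the new one.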

### How it works

1. CI builds each `Dockerfile.*` and pushes `ghcr.io/experio-ai/experio-dockerfile.<suffix>`
   tagged with both `:${VERSION}` (label) and `:sha-${git-sha}` (the pinned, immutable tag).
2. `scripts/generate_omni_compose.py "sha-${git-sha}"` produces `docker-compose.omni.yaml`: it
   removes every `build:` block, sets `image:` to the pinned `:sha-…` ref, and re-pins the
   `scale-to-zero` / `flow-runner` sidecars from `:latest` to the same tag. It uses a real YAML
   parser, so it cannot reintroduce the duplicate-`image:`-key error that broke an earlier
   `envsubst`-based attempt (reverted PR #985).
3. `omctl build` records those immutable tags in a new plan version. Prod stays on its plan
   until an explicit upgrade.
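For illustration, the transformation in step 2 can be sketched on an already-parsed compose mapping. This is not the contents of `scripts/generate_omni_compose.py`: `pin_compose` and `_repin` are hypothetical helpers, the sketch assumes every `build:` block is a mapping with a `dockerfile` key, and it assumes sidecar images appear under some nested `image` key with an Experio `:latest` reference.

```python
import copy

REGISTRY_PREFIX = "ghcr.io/experio-ai/"


def _repin(node, tag: str) -> None:
    """Recursively re-pin Experio ':latest' refs found under any 'image' key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "image" and isinstance(value, str):
                repo, _, old_tag = value.rpartition(":")
                if repo.startswith(REGISTRY_PREFIX) and old_tag == "latest":
                    node[key] = f"{repo}:{tag}"
            else:
                _repin(value, tag)
    elif isinstance(node, list):
        for item in node:
            _repin(item, tag)


def pin_compose(compose: dict, tag: str) -> dict:
    """Return a copy with build blocks replaced by pinned image refs."""
    out = copy.deepcopy(compose)
    for service in out.get("services", {}).values():
        build = service.pop("build", None)
        if build is not None:
            # Assumption: build is a mapping like {"dockerfile": "Dockerfile.x"}.
            suffix = build["dockerfile"].split(".", 1)[1]
            service["image"] = f"{REGISTRY_PREFIX}experio-dockerfile.{suffix}:{tag}"
    _repin(out, tag)
    return out
```

Because the edit happens on a parsed mapping, setting `image` overwrites any existing value instead of appending a second key, which is the property a text-substitution approach lacks. Third-party images such as a database image are left untouched because they do not match the registry prefix.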

### Local / manual deploys

Build & push the images for the current commit to GHCR (see the workflow matrix), then pin to
that commit's SHA tag:

```bash theme={null}
TAG="sha-$(git rev-parse HEAD)"
python scripts/generate_omni_compose.py "$TAG"
omctl build -f docker-compose.omni.yaml --product-name Experio-1.1 --release-as-preferred
```

Do not use `:latest` for Experio-built images. Prod promotion is `omctl upgrade <instance-id> --version=<plan-version>` only.

## Helper Scripts (Potentially Deprecated)

### omnistrate-deploy.sh

Located at `scripts/omnistrate-deploy.sh`, this script provides an interactive deployment process that:

* Validates the docker-compose.yaml file
* Offers options to skip Docker builds
* Provides guided deployment with user prompts

**Status**: This script can probably be deleted if the simple `omctl` commands are sufficient for your workflow.

### test-omni-deploy.sh

Located at `scripts/test-omni-deploy.sh`, this script:

* Tests the deployment configuration locally before sending to Omnistrate
* Provides options for fresh deployments (with volume deletion)
* Offers core-only testing to speed up validation

**Status**: This script can probably be deleted if local testing is not needed before Omnistrate deployments.

## Future Enhancements

Several improvements are planned to simplify the deployment process:

### 1. Remove .omnistrate.env Dependency

* **Current**: Uses `.omnistrate.env` file for environment variables
* **Future**: Move all environment variables to GitHub Secrets for better security and management
* **Benefit**: Eliminates the need to manage sensitive credentials in repository files

### 2. Remove Certificate Dependencies

* **Current**: Includes `certs/` folder with SSL certificates for ManTech deployment compatibility
* **Future**: Remove certificate dependencies that were added specifically for ManTech
* **Benefit**: Simplifies deployment and removes ManTech-specific configurations

### 3. Clean Up Local Development Files

* **Current**: Includes various local Docker Compose files for testing/debugging
* **Future**: Remove `docker-compose.local.yaml` and `docker-compose.local-web.yaml` if they're no longer needed
* **Benefit**: Reduces confusion and maintenance overhead

## Key Components

The Experio deployment consists of several containerized services:

* **Server**: Django backend service with integrated client assets (`Dockerfile.server`)
* **Reader**: Job service for reading and processing files (`Dockerfile.reader`)
* **Ingestion**: Job service for downloading and processing files from external sources (`Dockerfile.ingestion`)
* **PostgreSQL**: Database
* **Redis**: Cache and session storage
* **RabbitMQ**: Message broker for job queues
* **Neo4j**: Graph database (default active provider; manual scale-to-zero)
* **FalkorDB**: Alternate graph database (manual scale-to-zero; off by default in cluster)
* **JupyterLab**: Data analysis and exploration environment

## Docker Container Architecture

### Container Structure

#### Server Container (Dockerfile.server)

The server container is built using a multi-stage approach:

1. **Client Builder Stage**: Builds frontend assets using Node.js
2. **Python Server Stage**: Creates the Django backend and copies built client assets

```dockerfile theme={null}
FROM node:22.14-alpine AS client-builder
# Build client assets
COPY ./client /app/client
RUN npm run heroku-postbuild

FROM python:3.12-bullseye
# Install Python dependencies and copy built assets
COPY --from=client-builder /app/client/dist /app/client/dist
```

#### Worker Containers (Dockerfile.reader & Dockerfile.ingestion)

Both worker services use similar lightweight Python containers:

```dockerfile theme={null}
FROM python:3.11-slim
# Install specific job dependencies
COPY jobs/base/requirements.txt /app/jobs/base/
# Swap "reader" for "ingestion" in the ingestion image
COPY jobs/reader/requirements.txt /app/jobs/reader/
RUN pip install -r /app/jobs/reader/requirements.txt
```

### Volume Strategy

The deployment uses Docker volumes for:

1. **Persistent Data Storage**:

   * `experio_file_storage`: Uploaded files and documents
   * `local_postgres_data`: Database data
   * `local_redis_data`: Cache data
   * `local_rabbitmq_data`: Message queue data
   * `local_neo4j_data`: Neo4j graph database data
   * `local_falkordb_data`: FalkorDB graph database data

2. **Certificate Management**:
   * `./certs:/certs:ro`: SSL certificates (read-only)

### Graph databases (Neo4j & FalkorDB)

Both graph backends are deployed as **internal Omnistrate services** with **manual
scale-to-zero** (same pattern as JupyterLab). They are **not** required for platform
boot — only for chat, ingestion graph writes, and admin graph tools.

| Service    | Default scaling mode | Default replicas | Purpose                                        |
| ---------- | -------------------- | ---------------- | ---------------------------------------------- |
| `neo4j`    | `keep_alive`         | 1                | Active graph provider (`GRAPH_PROVIDER=neo4j`) |
| `falkordb` | `keep_down`          | 0                | Standby / migration target                     |

Control each service's mode through **ServiceConfiguration** in the Django admin (applied by
the scale-to-zero sidecar) or through the Omnistrate capacity API. After changing modes, allow
a few minutes for the sidecar to reconcile replica counts.

See [Graph backend (Neo4j & FalkorDB)](/admin-guide/graph-backend) for provider
switching and migration runbooks.

### Service Startup Order

The deployment enforces startup dependencies for **platform** services only.
Graph databases are intentionally **not** in the critical path:

1. **Core infrastructure** (PostgreSQL, Redis, RabbitMQ) start with health checks
2. **Database initialization** (`initdb`) runs after PostgreSQL is healthy — needs
   **Postgres only** (seeds `NEO4J_*` / `FALKOR_*` config from secrets; does not
   connect to graph DBs)
3. **Application server** starts after `initdb` completes and core infra is healthy
4. **Worker services** (reader, ingestion pipeline, etc.) start after RabbitMQ and
   `initdb`; they tolerate a cold graph and reconnect when the active provider is up
5. **Graph DBs** (`neo4j`, `falkordb`) scale independently via manual scale-to-zero

Chat and graph ingestion features require the **active** provider (`GRAPH_PROVIDER`)
to be running. The rest of the platform (auth, admin UI, Postgres-backed settings)
starts without either graph DB online.
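The ordering above is a dependency graph, which can be sketched with Python's standard `graphlib`. The mapping below is a hypothetical approximation of the `depends_on` relationships, not a copy of `docker-compose.yaml`: `initdb`, `neo4j`, and `falkordb` are named in this document, while the other service names are assumed for illustration.

```python
from graphlib import TopologicalSorter

# service -> services it waits for (assumed shape of the depends_on graph)
DEPENDS_ON = {
    "postgres": set(),
    "redis": set(),
    "rabbitmq": set(),
    "initdb": {"postgres"},
    "server": {"initdb", "postgres", "redis", "rabbitmq"},
    "reader": {"initdb", "rabbitmq"},
    "neo4j": set(),     # scales independently, nothing waits on it
    "falkordb": set(),  # scales independently, nothing waits on it
}


def startup_order(depends_on: dict) -> list:
    """Return one valid start order (dependencies before dependents)."""
    return list(TopologicalSorter(depends_on).static_order())


def transitive_deps(service: str, depends_on: dict) -> set:
    """Everything that must be up before `service` can start."""
    seen = set()
    stack = [service]
    while stack:
        for dep in depends_on[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen
```

In this model `transitive_deps("server", DEPENDS_ON)` contains only Postgres, Redis, RabbitMQ, and `initdb`, which mirrors the statement that neither graph DB is on the critical path.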

## Authentication Setup

The deployment includes automatic creation of default accounts during the `initdb` job:

### Default Accounts

1. **Admin User**

   * Email: `admin@experiolabs.ai`
   * Password: Value from `DJANGO_SUPERUSER_PASSWORD` environment variable
   * Has full superuser privileges

2. **Test User**
   * Email: `playwright@thinknimble.com`
   * Password: Value from `PLAYWRIGHT_TEST_USER_PASS` environment variable
   * Used by automated tests

## Omnistrate secrets

Graph credentials are injected via Omnistrate service secrets referenced in
`docker-compose.yaml`. Add these in the Omnistrate dashboard (**Secrets**) before
releasing a new service API version:

| Secret key         | Used by              | Purpose                                    |
| ------------------ | -------------------- | ------------------------------------------ |
| `neo4jPassword`    | `neo4j`, `initdb`    | Neo4j auth and seeded `NEO4J_PASSWORD`     |
| `falkordbPassword` | `falkordb`, `initdb` | FalkorDB auth and seeded `FALKOR_PASSWORD` |

If `falkordbPassword` is missing, Omnistrate release validation fails with an error
referencing the `initdb` component.

Other existing secrets (`dbPassword`, `rabbitmqPassword`, etc.) are unchanged.

## Environment Configuration

The deployment uses environment variables defined in `.omnistrate.env`. Key configurations include:

```bash theme={null}
# Database configuration
DB_NAME='experio_db'
DB_USER='experio'
DB_HOST='postgres'

# Neo4j Graph Database (when GRAPH_PROVIDER=neo4j)
NEO4J_URI='neo4j://neo4j:7687'
NEO4J_USER='neo4j'
NEO4J_DATABASE='neo4j'

# FalkorDB (when GRAPH_PROVIDER=falkordb) — cluster internal DNS
# FALKOR_URI='redis://falkordb:6379'

# AI/ML Services
USE_AZURE_OPENAI='True'
USE_GOOGLE_GENAI='False'

# Authentication
VITE_USE_AUTH0='False'
```
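To show how these variables relate, here is a minimal sketch that parses `KEY='value'` lines and picks the connection URI for the active graph provider. It is an illustration under assumptions, not the application's settings code: `parse_env` and `active_graph_uri` are hypothetical helpers, and defaulting `GRAPH_PROVIDER` to `neo4j` follows the "default active provider" note in this document.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY='value' lines, skipping blanks and comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip("'\"")
    return env


def active_graph_uri(env: dict[str, str]) -> str:
    """Return the URI variable that matches GRAPH_PROVIDER."""
    provider = env.get("GRAPH_PROVIDER", "neo4j")  # assumed default
    uri_key = {"neo4j": "NEO4J_URI", "falkordb": "FALKOR_URI"}[provider]
    return env[uri_key]
```

A commented-out line such as `# FALKOR_URI=...` is ignored by this parser, so switching `GRAPH_PROVIDER` to `falkordb` without uncommenting it raises a `KeyError` instead of silently falling back.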

## Deployment Checklist

Before deploying to Omnistrate:

1. **Install and authenticate**: Ensure `omctl` is installed and you're logged in
2. **Add secrets**: Ensure `falkordbPassword` (and other required secrets) exist in Omnistrate
3. **Update configurations**: Modify `docker-compose.yaml` and `.omnistrate.env` as needed
4. **Choose deployment type**: Decide whether to rebuild Docker images or skip the build
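5. **Run deployment**: Generate `docker-compose.omni.yaml` with `scripts/generate_omni_compose.py`, then run `omctl build -f docker-compose.omni.yaml` (see [Core Deployment Commands](#core-deployment-commands)); do not use `omctl build-from-repo`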
6. **Monitor deployment**: Check the Omnistrate UI for deployment status and any issues
7. **Post-deploy (existing instances)**: Run once after upgrade:
   ```bash theme={null}
   python manage.py migrate
   python manage.py seed_autoscaling_config
   python manage.py seed_config --force
   ```

## Health endpoints

| Endpoint              | Purpose                                                                                                                                                                                                                                                                                                    |
| --------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `GET /health/`        | Comprehensive check. **Critical**: Postgres, Redis, RabbitMQ, LLM. **Non-critical**: active graph DB (`checks.neo4j`), MCP. Returns `degraded` (HTTP 200) when platform is up but graph is down (scale-to-zero, warm-up, migration). Returns `unhealthy` (HTTP 503) only when a critical dependency fails. |
| `GET /health/simple/` | Process liveness only — always 200 when Django is running                                                                                                                                                                                                                                                  |

Do not treat `checks.neo4j.status: unhealthy` alone as a platform outage when the graph DB
is intentionally scaled down.
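The status rules in the table can be expressed as a small classifier, which may help when wiring `/health/` into monitoring. This is a sketch, not the endpoint's implementation: `overall_status` is a hypothetical function, and every check key except `neo4j` (as well as the `healthy` status string) is an assumed name.

```python
CRITICAL = ("postgres", "redis", "rabbitmq", "llm")  # key names are assumed
NON_CRITICAL = ("neo4j", "mcp")                      # `neo4j` is from the table


def overall_status(checks: dict) -> tuple[str, int]:
    """Collapse per-dependency checks into (status, HTTP code)."""

    def ok(name: str) -> bool:
        return checks.get(name, {}).get("status") == "healthy"

    if not all(ok(name) for name in CRITICAL):
        return ("unhealthy", 503)  # a critical dependency failed
    if not all(ok(name) for name in NON_CRITICAL):
        return ("degraded", 200)   # platform up, graph or MCP down
    return ("healthy", 200)
```

An alert that fires only on HTTP 503 therefore stays quiet while the graph DB is scaled to zero, since that case returns `degraded` with HTTP 200.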

## Troubleshooting

### Common Deployment Issues

1. **Authentication Failed**

   * Ensure you're logged in: `omctl login`
   * Check your Omnistrate account credentials

2. **Invalid Docker Compose Configuration**

   * Validate locally: `docker compose config`
   * Check for YAML syntax errors

3. **Build Failures**

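   * Review the `build-and-push` job logs in GitHub Actions (images are built in CI, not by Omnistrate), and the `omctl build` output for release errors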
   * Test Docker builds locally: `docker build -f Dockerfile.server .`
   * **Missing secret references**: e.g. `falkordbPassword` not defined in Omnistrate Secrets

4. **Service Health Check Failures**
   * Check health check configurations in `docker-compose.yaml`
   * Verify service dependencies are correctly defined

### Useful Commands

```bash theme={null}
# Validate docker-compose configuration
docker compose config

# Test local deployment
./scripts/test-omni-deploy.sh  # If script is still available

# List Omnistrate instances and their status
omctl instance list

# Check deployment logs (in Omnistrate UI)
# Navigate to your service in the Omnistrate dashboard
```

### Docker-Related Issues

#### Container Startup Failures

If containers fail to start:

1. **Check logs in Omnistrate UI**: Navigate to your deployment and examine container logs
2. **Verify health checks**: Ensure health check commands are appropriate for your services
3. **Check dependencies**: Verify `depends_on` configurations match your service requirements
4. **Volume permissions**: Ensure mounted volumes have appropriate permissions

#### Image Build Problems

If Docker image builds fail:

1. **Review build logs**: Check the `build-and-push` job output in GitHub Actions, where the images are built
2. **Test locally**: Run `docker build` commands locally to identify issues
3. **Check base images**: Ensure base images in Dockerfiles are accessible
4. **Verify file paths**: Ensure all COPY commands reference existing files

### Getting Help

* **Omnistrate Documentation**: Check the official Omnistrate documentation
* **GitHub Issues**: Review the repository's issue tracker for known problems
* **Omnistrate Support**: Contact Omnistrate support for platform-specific issues
