Extraction Policy - Experio Documentation

Overview

Extraction policy controls how deeply ingestion runs LLM entity extraction for each content type. Use it to reduce cost on low-value files (for example large Excel exports) while keeping full extraction on types that need rich graph data. Navigate to Admin > Data Sources > Content Types, open a type, and scroll to Ingestion extraction on the Basic Information tab.

Extraction modes

Mode	Behavior
`full`	Normal LLM entity extraction (default)
`metadata_and_snippet`	Document shell entity plus a short text preview; no chunked LLM extraction
`metadata_only`	Shell entity from filename, path, and classification only; no content LLM calls

For Excel files (.xlsx, .xlsm, .xls), the mode is resolved in this order: filter override → content-type Excel mode → default mode → `full`. Parsed spreadsheet text is still stored on the Document node for chat retrieval even when extraction is skipped.
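The resolution order above can be sketched as a first-match lookup. This is only an illustration, not the product's code: the function name `resolve_excel_mode` is hypothetical, and it assumes an unset or inherited level is represented as `None` (or any value that is not one of the three documented modes).

```python
from typing import Optional

# The three modes documented in the table above.
VALID_MODES = {"full", "metadata_and_snippet", "metadata_only"}


def resolve_excel_mode(
    filter_override: Optional[str],
    content_type_excel_mode: Optional[str],
    default_mode: Optional[str],
) -> str:
    """Return the first configured mode in precedence order, else 'full'.

    Assumption: a level that is unset (None, 'inherit', or anything that
    is not a documented mode) is skipped and the next level is consulted.
    """
    for candidate in (filter_override, content_type_excel_mode, default_mode):
        if candidate in VALID_MODES:
            return candidate
    return "full"


# A filter override wins over the content-type Excel mode.
print(resolve_excel_mode("full", "metadata_only", "full"))
# With nothing configured, the final fallback applies.
print(resolve_excel_mode(None, None, None))
```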

Content-type settings

Configure these settings in the admin UI or in the content type's JSON metadata under `extraction_policy`:

{
  "extraction_policy": {
    "default": {
      "mode": "full",
      "model_tier": "large",
      "validation_pass": true
    },
    "excel": {
      "mode": "metadata_only",
      "validation_pass": false,
      "snippet_chars": 2000
    }
  }
}
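If you edit the JSON metadata by hand, a quick sanity check before saving can catch typos in mode or tier names. The sketch below is a hypothetical helper, not part of Experio: it only checks the values this page documents, and it assumes `validation_pass` must be a boolean.

```python
import json

ALLOWED_MODES = {"full", "metadata_and_snippet", "metadata_only"}
ALLOWED_TIERS = {"large", "medium", "small"}


def validate_extraction_policy(metadata: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    policy = metadata.get("extraction_policy", {})
    for section in ("default", "excel"):
        settings = policy.get(section)
        if settings is None:
            continue  # a missing section is allowed
        mode = settings.get("mode")
        if mode is not None and mode not in ALLOWED_MODES:
            problems.append(f"{section}.mode: unknown mode {mode!r}")
        tier = settings.get("model_tier")
        if tier is not None and tier not in ALLOWED_TIERS:
            problems.append(f"{section}.model_tier: unknown tier {tier!r}")
        validation = settings.get("validation_pass")
        if validation is not None and not isinstance(validation, bool):
            problems.append(f"{section}.validation_pass: expected true/false")
    return problems


EXAMPLE = json.loads("""
{
  "extraction_policy": {
    "default": {"mode": "full", "model_tier": "large", "validation_pass": true},
    "excel": {"mode": "metadata_only", "validation_pass": false, "snippet_chars": 2000}
  }
}
""")

print(validate_extraction_policy(EXAMPLE))
```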

UI fields

Field	Purpose
Default mode	Extraction depth for most file types
Primary model tier	`large`, `medium`, or `small` for primary extraction
Run validation pass (default)	Secondary LLM pass to fill gaps; skipped when policy disables it or heuristics apply
Excel mode	Override default for spreadsheets, or Same as default
Run validation pass (Excel)	Shown when Excel mode differs from default

Model tiers

Tier	System setting	Used for
`large` (default)	`INGESTION_LARGE_MODEL_CONFIG`	Primary extraction
`medium`	`INGESTION_MEDIUM_MODEL_CONFIG`	Primary extraction when set on the content type
`small`	`INGESTION_SMALL_MODEL_CONFIG`	Primary extraction when set on the content type

Secondary steps (validation, JSON repair, relationship backfill, entity disambiguation) use `INGESTION_SMALL_MODEL_CONFIG`, falling back to the large model if it is unset. Before using the medium tier, create an Ingestion - Medium model configuration under Model Configurations and assign it in System Settings.
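The tier-to-setting mapping and the secondary-step fallback can be pictured as follows. The setting names come from the table above; the function names and the plain dictionary standing in for System Settings are assumptions made for illustration.

```python
from typing import Mapping, Optional

# Tier name -> system setting that holds its model configuration.
TIER_SETTINGS = {
    "large": "INGESTION_LARGE_MODEL_CONFIG",
    "medium": "INGESTION_MEDIUM_MODEL_CONFIG",
    "small": "INGESTION_SMALL_MODEL_CONFIG",
}


def primary_model_setting(tier: Optional[str]) -> str:
    """Setting consulted for primary extraction; 'large' is the default tier."""
    return TIER_SETTINGS[tier or "large"]


def secondary_model_config(
    system_settings: Mapping[str, Optional[str]],
) -> Optional[str]:
    """Config for validation, JSON repair, backfill, and disambiguation.

    Uses the small model config, falling back to the large one if unset.
    """
    return system_settings.get("INGESTION_SMALL_MODEL_CONFIG") or system_settings.get(
        "INGESTION_LARGE_MODEL_CONFIG"
    )


# Hypothetical settings where no small model has been assigned yet.
settings = {"INGESTION_LARGE_MODEL_CONFIG": "large-config"}
print(primary_model_setting("medium"))
print(secondary_model_config(settings))
```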

Excel sheet handling

Spreadsheets are parsed by Kreuzberg. Each sheet becomes a markdown block headed by ## SheetName. Ingestion splits on those headers and applies caps per sheet:

System setting	Default	Purpose
`MAX_EXCEL_SHEET_CHARS`	`50000`	Skip LLM extraction on sheets above this size
`MAX_EXCEL_INGESTION_CHUNKS_PER_SHEET`	`25`	Cap LLM chunks per sheet in full mode
`INGESTION_COST_GUARD_CHUNK_THRESHOLD`	`120`	Estimated chunk count above which full mode falls back to `metadata_only`

These settings are seeded in System Settings. The cost guard threshold is also editable from the dashboard.
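To make the interaction of the three caps concrete, here is a rough sketch of splitting the parsed markdown on `## SheetName` headers and planning chunks per sheet. Several details are assumptions rather than documented behavior: the chunk size `CHUNK_CHARS`, the function names, and the choice to compare the sum of capped per-sheet chunk counts against the cost guard threshold.

```python
import math

MAX_EXCEL_SHEET_CHARS = 50000
MAX_EXCEL_INGESTION_CHUNKS_PER_SHEET = 25
INGESTION_COST_GUARD_CHUNK_THRESHOLD = 120
CHUNK_CHARS = 4000  # hypothetical chunk size; not stated in this page


def split_sheets(markdown: str) -> dict:
    """Split parsed workbook markdown into {sheet_name: body} on '## ' headers."""
    sheets = {}
    name = None
    lines = []
    for line in markdown.splitlines():
        if line.startswith("## "):
            if name is not None:
                sheets[name] = "\n".join(lines).strip()
            name = line[3:].strip()
            lines = []
        elif name is not None:
            lines.append(line)
    if name is not None:
        sheets[name] = "\n".join(lines).strip()
    return sheets


def plan_sheets(markdown: str, mode: str = "full"):
    """Return (effective_mode, {sheet_name: planned_llm_chunks})."""
    plan = {}
    total = 0
    for name, body in split_sheets(markdown).items():
        if len(body) > MAX_EXCEL_SHEET_CHARS:
            plan[name] = 0  # oversized sheet: skip LLM extraction
            continue
        estimated = math.ceil(len(body) / CHUNK_CHARS)
        plan[name] = min(estimated, MAX_EXCEL_INGESTION_CHUNKS_PER_SHEET)
        total += plan[name]
    # Cost guard (assumed to apply to the summed estimate for the file).
    if mode == "full" and total > INGESTION_COST_GUARD_CHUNK_THRESHOLD:
        mode = "metadata_only"
    return mode, plan


workbook = "## Summary\n" + "x" * 9000 + "\n## RawExport\n" + "y" * 60000
print(plan_sheets(workbook))
```

In this sketch the `RawExport` sheet is skipped because it exceeds the per-sheet character cap, while `Summary` is planned for a handful of chunks.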

Filter-level Excel controls

When configuring data source filters, you can control spreadsheet ingestion per filter:

Field	Default	Purpose
`parse_excel_files`	`true`	Opt in to Excel ingestion for matching files
`excel_extraction_mode`	inherit	Override content-type Excel mode (`full`, `metadata_and_snippet`, `metadata_only`)
`excel_max_sheet_chars`	inherit	Per-filter per-sheet character cap

When Ingest Excel files is unchecked and a file matches an enabled filter, Excel ingestion is skipped with reason `excel_ingestion_disabled_by_filter`. Files that match no filter still ingest Excel (legacy behavior).

Leave Ingest Excel files unchecked on export-only filters when you want those spreadsheets excluded from the graph.
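The skip rule can be summarized in a small decision function. This is a sketch under stated assumptions: the `SourceFilter` class and its `enabled` and `matches` fields are hypothetical, `parse_excel_files` is assumed to back the Ingest Excel files checkbox, and any one matching enabled filter with the box unchecked is assumed to be enough to skip the file.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class SourceFilter:
    enabled: bool                   # hypothetical: filter is switched on
    matches: bool                   # hypothetical: filter matched this file
    parse_excel_files: bool = True  # documented default is true


def excel_ingestion_decision(
    filters: Iterable[SourceFilter],
) -> Tuple[bool, Optional[str]]:
    """Return (should_ingest, skip_reason) for one Excel file."""
    matched = [f for f in filters if f.enabled and f.matches]
    if not matched:
        # Legacy behavior: files with no matched filters still ingest Excel.
        return True, None
    if any(not f.parse_excel_files for f in matched):
        return False, "excel_ingestion_disabled_by_filter"
    return True, None


export_filter = SourceFilter(enabled=True, matches=True, parse_excel_files=False)
print(excel_ingestion_decision([export_filter]))
print(excel_ingestion_decision([]))
```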

Example configurations

Content type	Excel mode	Typical use
Requirements / exports	`metadata_only`	Client input spreadsheets, inventory dumps
Deliverable	`metadata_and_snippet`	Artifacts where a short preview is enough
Structured workbook type	`full`	Sheets where row-level entities matter

Pair export-style content types with filters that leave Ingest Excel files unchecked unless you explicitly want those files in the graph.
