

Overview

Data sources define which folders and files Experio should scan and process from your connected cloud storage providers. Each data source is linked to a connector and specifies folder paths, scanning behavior, and filtering rules. Navigate to Admin > Data Sources > Data Sources.

Creating a Data Source

Click Add New Data Source to start a multi-step configuration wizard:
1. Choose Source Type

Select the type of data source:
  • Box — Scan folders from a Box account
  • Google Drive — Scan folders from Google Drive
  • SharePoint — Scan folders from a SharePoint site
  • File Upload — Upload files directly to Experio
2. Validate Configuration

Enter the connection details and validate that Experio can access the specified location. The system verifies credentials and folder access.
3. Configure Filters

Set up folder hierarchy and filtering rules:
  • Folder paths — Specify which folders to scan
  • Recursive scanning — Include subfolders
  • Filter expressions — Include or exclude files based on patterns
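The include/exclude behavior of filter expressions can be sketched with glob-style patterns. This is a hypothetical reading for illustration; the function name and the exact pattern syntax Experio accepts are assumptions, not the product's documented semantics:

```python
from fnmatch import fnmatch

def matches_filter(path: str, include: list[str], exclude: list[str]) -> bool:
    """Hypothetical include/exclude semantics: a file passes if it
    matches no exclude pattern and matches at least one include
    pattern (an empty include list means "include everything")."""
    if any(fnmatch(path, pat) for pat in exclude):
        return False
    return not include or any(fnmatch(path, pat) for pat in include)

# Example: include PDFs anywhere, but skip anything under a drafts folder.
print(matches_filter("reports/q1.pdf", ["*.pdf"], ["drafts/*"]))  # True
print(matches_filter("drafts/q2.pdf", ["*.pdf"], ["drafts/*"]))   # False
```

Excludes winning over includes is one common convention; check the filter preview in step 5 to confirm how your patterns actually resolve.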
4. Setup Source

Configure ingestion settings for the data source:
  • Days to sync — How far back to scan for files
  • Use OCR — Enable optical character recognition for scanned documents
  • Classification max pages — Limit pages sent to the classifier
  • Ingestion type — Choose Full ingestion (default) for the complete pipeline, or Parse only to stop after parsing (useful when a downstream system handles classification and embedding)
For API data sources, these options appear in the source configuration step instead.
5. Test Filters

Preview which files match your filter configuration before saving. This ensures only the intended files will be processed.

Data Source Properties

  • Name — Display name for identifying the data source
  • Connector — The authorized connection to use
  • Folder Path — Root folder to scan
  • Recursive — Whether to scan subfolders
  • Filter Expression — Pattern to include/exclude files
  • Ingestion Type — Pipeline mode: Full ingestion (default) runs the complete pipeline (download → parse → classify → graph → embed). Parse only stops after parsing: files are downloaded and parsed, but not classified, added to the knowledge graph, or embedded. Parsed artifacts are stored in Minio for downstream consumption.
  • Status — Active, paused, or error
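The two ingestion types differ only in where the pipeline stops. A sketch of that relationship, using the stage names from the property list above (the string identifiers "full" and "parse_only" are assumed, not Experio's actual values):

```python
# Stages of the full ingestion pipeline, in order.
FULL_PIPELINE = ["download", "parse", "classify", "graph", "embed"]

def stages_for(ingestion_type: str) -> list[str]:
    """Return the pipeline stages run for a given ingestion type."""
    if ingestion_type == "parse_only":
        # Parse only: truncate the pipeline after the parse stage.
        return FULL_PIPELINE[:FULL_PIPELINE.index("parse") + 1]
    return FULL_PIPELINE

print(stages_for("full"))        # ['download', 'parse', 'classify', 'graph', 'embed']
print(stages_for("parse_only"))  # ['download', 'parse']
```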

Managing Data Sources

Editing

Click on any data source to open its configuration. Modify settings and save to apply changes. Changes take effect on the next scan cycle.

Monitoring

Each data source shows:
  • Last scan time — When the source was last scanned
  • Files found — Number of files discovered
  • Files processed — Number of files successfully ingested
  • Errors — Any files that failed processing
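These four metrics relate in a simple way: files found but neither processed nor errored are still pending. A small sketch of that bookkeeping (the record shape is hypothetical; Experio's UI may surface these fields differently):

```python
from dataclasses import dataclass

@dataclass
class ScanSummary:
    files_found: int      # files discovered in the last scan
    files_processed: int  # files successfully ingested
    errors: int           # files that failed processing

    @property
    def pending(self) -> int:
        """Files discovered but not yet ingested or failed."""
        return self.files_found - self.files_processed - self.errors

summary = ScanSummary(files_found=120, files_processed=115, errors=3)
print(summary.pending)  # 2
```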

OAuth Callbacks

For Box and SharePoint data sources, OAuth callback handling is built in. If a token expires, you’ll be prompted to re-authorize through the connector.

Parse-Only Mode

When a data source has Ingestion Type set to Parse only, the ingestion pipeline stops after downloading and parsing files. Specifically:
  • Files are downloaded from the cloud provider and parsed using the standard parser
  • Parsed artifacts are stored in Minio (under parsed/{file_id}/...) with the same retention policy as full ingestion
  • No classification, graph ingestion, or embedding occurs
  • Files reach a terminal status of parsed_only instead of ingested
This mode is useful when an external system (such as a partner pipeline) needs to consume the parsed output and handle classification and embedding independently.

Ingestion Type can only be changed when the data source has no files currently processing. If you try to switch modes while a scan is in flight, the update is rejected with a validation error. Wait for the current scan to complete (or stop it) before changing the mode. The new mode takes effect on the next scan.
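The validation described above can be sketched as a guard on the update: reject the switch while any files are processing, otherwise record the new mode for the next scan. The record shape and function name are hypothetical; the real check runs server-side in Experio:

```python
def change_ingestion_type(source: dict, new_type: str) -> dict:
    """Switch a data source's ingestion type, mirroring the
    documented rule: rejected while a scan is in flight."""
    if source["files_processing"] > 0:
        raise ValueError("Cannot change ingestion type while files are processing")
    # Takes effect on the next scan; the current record is updated immediately.
    return {**source, "ingestion_type": new_type}

idle = {"ingestion_type": "full", "files_processing": 0}
print(change_ingestion_type(idle, "parse_only")["ingestion_type"])  # parse_only
```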

File Upload

The File Upload data source type allows direct file uploads:
  • Drag and drop files onto the upload area
  • Upload progress is tracked with visual indicators
  • Files are queued for processing automatically after upload