
Why not RAG?
RAG is the default answer for “AI + documents.” It works well on diverse corpora — manuals, research papers, knowledge bases. It breaks on homogeneous collections like invoices, contracts, or receipts.
RAG on 500 invoices
Ask “How much did I invoice to Acme in September?” — RAG searches by similarity. All invoices look alike. It returns the highest-scoring chunks, not all the matching ones. It can’t filter by date or sum totals. You get a guess.
Sifter on 500 invoices
Sifter extracts client, date, total from every invoice once, and stores them as rows in a database. The same query becomes a real aggregation: filter by client and month, sum the totals. Exact and reproducible, every time.
“Total invoiced per client per month” is an aggregation query, not a retrieval query. RAG was built for retrieval. Sifter was built for this.
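To make the contrast concrete, here is a minimal sketch of what "filter by client and month, sum the totals" looks like once the fields sit in a table. It uses Python's standard sqlite3 module as a stand-in. The table name, column names, and sample rows are assumptions for illustration, not Sifter's actual storage layout or API.

```python
import sqlite3

# Hypothetical rows standing in for fields extracted once per invoice.
records = [
    ("Acme", "2024-09-03", 1200.00),
    ("Acme", "2024-09-17", 800.50),
    ("Acme", "2024-10-01", 300.00),
    ("Globex", "2024-09-09", 950.00),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (client TEXT, invoice_date TEXT, total REAL)")
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", records)


def total_invoiced(conn, client, month):
    """Sum invoice totals for one client in one month (month as 'YYYY-MM')."""
    row = conn.execute(
        "SELECT COALESCE(SUM(total), 0) FROM invoices "
        "WHERE client = ? AND strftime('%Y-%m', invoice_date) = ?",
        (client, month),
    ).fetchone()
    return row[0]


# "How much did I invoice to Acme in September?"
acme_september = total_invoiced(conn, "Acme", "2024-09")

# "Total invoiced per client per month" as a single GROUP BY.
per_client_month = conn.execute(
    "SELECT client, strftime('%Y-%m', invoice_date) AS month, SUM(total) "
    "FROM invoices GROUP BY client, month ORDER BY client, month"
).fetchall()
```

Every matching row contributes to the sum, so the answer does not depend on which chunks a similarity search happened to rank highest.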
How it works
Define a Sift
Give it a name and describe what to extract in natural language:
"Extract: client name, invoice date, total amount, VAT number". Sifter infers the JSON schema automatically from the first processed document.
Upload documents
Upload documents via the web UI, the REST API, the Python or TypeScript SDK, or the CLI. Organize them into folders to run multiple extractors automatically.
Key concepts
| Concept | Description |
|---|---|
| Sift | An extraction schema defined in natural language. One sift → one structured table. |
| Folder | A document container. Link it to multiple sifts — every upload triggers all of them. |
| Record | A single extracted result: one document processed by one sift. |
| Dashboard | A live board of KPIs and charts generated from extracted records. |
| Webhook | HTTP callback fired on extraction events. Wildcard patterns, retry on failure. |
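The Webhook row mentions wildcard patterns and retry on failure. The sketch below shows one way those two ideas can work in general, using Python's standard fnmatch module. The event names (such as record.created), the helper names, and the backoff policy are all hypothetical. This is not Sifter's implementation or its event vocabulary.

```python
import fnmatch
import time


def matches(pattern, event):
    """Return True if an event name matches a wildcard pattern (case-sensitive)."""
    return fnmatch.fnmatchcase(event, pattern)


def deliver_with_retry(send, payload, attempts=3, base_delay=0.0):
    """Call send(payload), retrying on any exception with exponential backoff.

    Returns the number of attempts used; re-raises after the last failure.
    """
    for attempt in range(attempts):
        try:
            send(payload)
            return attempt + 1
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))


# A hypothetical receiver that fails twice before accepting the payload.
calls = []


def flaky_send(payload):
    calls.append(payload)
    if len(calls) < 3:
        raise ConnectionError("receiver unavailable")


attempts_used = deliver_with_retry(flaky_send, {"event": "record.created"})
```

In a real deployment base_delay would be non-zero so that retries back off instead of hammering a failing receiver.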
Three pillars
Extract
Define what to extract in plain language. Sifter infers the schema, processes every document, and stores the results as structured records — no templates, no code.
Analyze
Extracted records are real structured data. Filter, sort, build dashboards, or ask questions in natural language. Export to CSV or pipe to your warehouse.
Build
REST API, Python SDK, TypeScript SDK, CLI, webhooks, and an MCP server for Claude, ChatGPT, Gemini, Cursor — any MCP-aware client.
Open source vs. Cloud
Sifter is MIT licensed. The OSS engine ships the complete product (chat, dashboards, webhooks, SDK, MCP stdio) and self-hosts with a single `docker compose up`. Bring your own LLM key, pay for nothing.
Sifter Cloud is the managed version at sifter.run: hosted infra, remote MCP endpoint, Google Drive + mail-to-upload ingress, Stripe billing, SSO, audit log, share links. See Pricing.
Quickstart
Get up and running in 5 minutes