
Self-hosting Unstructured the easy way
Yulei Chen
Unstructured is an open-source document processing API that extracts structured data from PDFs, images, Word docs, HTML, and dozens of other file formats. It powers Retrieval-Augmented Generation (RAG) pipelines and model fine-tuning workflows by turning messy documents into clean, chunked data. The hosted Unstructured Platform works great, but once your volume grows, self-hosting the API saves you money and keeps your data private.
Sliplane is a managed container platform that makes self-hosting painless. With one-click deployment, you can get the Unstructured API running in minutes - no server setup, no reverse proxy config, no infrastructure to maintain.
Prerequisites
Before deploying, ensure you have a Sliplane account (free trial available).
Quick start
Sliplane provides one-click deployment with presets.
- Click the deploy button above
- Select a project
- Select a server (If you just signed up you get a 48-hour free trial server)
- Click Deploy!
About the preset
The one-click deploy above uses Sliplane's Unstructured preset. The preset is configured for a production-ready setup:
- Official Unstructured API image (unstructured-io/unstructured-api)
- Specific version tag (0.1.5) for stability
- API key authentication enabled by default
- Healthcheck on /healthcheck for automatic monitoring
- No persistent storage needed (the API is stateless and processes documents on the fly)
Next steps
Once the Unstructured API is running on Sliplane, access it using the domain Sliplane provided (e.g. unstructured-xxxx.sliplane.app).
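Before sending documents, it can help to confirm the service is actually up. The following is a minimal sketch, not an official client: it assumes the /healthcheck path from the preset answers a plain GET with HTTP 200 and needs no API key. The function name `is_healthy` and the injectable `opener` parameter are made up for this example.

```python
import urllib.error
import urllib.request


def is_healthy(base_url, opener=urllib.request.urlopen, timeout=10):
    """Return True if GET <base_url>/healthcheck answers with HTTP 200.

    Assumption: the endpoint is reachable without an API key.
    """
    url = base_url.rstrip("/") + "/healthcheck"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts and HTTP errors.
        return False
```

Calling `is_healthy("https://unstructured-xxxx.sliplane.app")` with your own domain returns a simple True or False that you can use in a deploy script or a smoke test.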
Authentication
The preset generates a random UNSTRUCTURED_API_KEY for you. You need this key to authenticate API requests. Find it in your service's environment variables on Sliplane.
Include the key in your requests via the unstructured-api-key header:
curl -X POST https://unstructured-xxxx.sliplane.app/general/v0/general \
-H "unstructured-api-key: YOUR_API_KEY" \
-F "files=@mydocument.pdf"
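If you would rather make the same call from Python without installing anything, the curl request above can be approximated with the standard library. This is a sketch under assumptions: the endpoint path, the unstructured-api-key header and the files form field are taken from the curl example, while the helper names `build_partition_request` and `partition_file` are invented here, and the response is assumed to be JSON.

```python
import json
import mimetypes
import urllib.request
import uuid


def build_partition_request(base_url, api_key, filename, file_bytes):
    """Build (but do not send) a multipart POST mirroring the curl call."""
    boundary = uuid.uuid4().hex
    # Guess the content type from the file extension, roughly as curl does.
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    # A single multipart part named "files", matching curl's -F "files=@...".
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        file_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/general/v0/general",
        data=body,
        method="POST",
        headers={
            "unstructured-api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Accept": "application/json",
        },
    )


def partition_file(path, base_url, api_key, timeout=300):
    """Send one local file and return the parsed JSON response."""
    with open(path, "rb") as handle:
        request = build_partition_request(base_url, api_key, path, handle.read())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping the request construction separate from the network call makes it easy to inspect the headers and body before anything is sent.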
Using the Python SDK
You can also use the Unstructured Python SDK to interact with your self-hosted API:
from unstructured_client import UnstructuredClient

# Point the SDK at your self-hosted instance instead of the hosted platform
client = UnstructuredClient(
    api_key_auth="YOUR_API_KEY",
    server_url="https://unstructured-xxxx.sliplane.app",
)
Environment variables
Here are the key environment variables you can customize:
| Variable | Description | Default |
|---|---|---|
| UNSTRUCTURED_API_KEY | API key for authentication | (generated) |
| UNSTRUCTURED_PARALLEL_MODE_ENABLED | Enable parallel processing | false |
| UNSTRUCTURED_PARALLEL_MODE_THREADS | Number of parallel threads | 3 |
Logging
The Unstructured API logs to STDOUT by default, which integrates well with Sliplane's built-in log viewer. For general Docker log tips, check out our post on how to use Docker logs.
Troubleshooting
If the API returns errors for certain file types, make sure your server has enough RAM. Document processing (especially OCR for images and scanned PDFs) can be memory-intensive. Consider upgrading to a larger server if you process large files regularly.
Cost comparison
You can also self-host the Unstructured API with other cloud providers. Here is a pricing comparison for the most common ones:
| Provider | vCPU | RAM | Disk | Monthly Cost | Note |
|---|---|---|---|---|---|
| Sliplane | 2 | 2 GB | 40 GB | €9 (~$10.65) | Flat rate, 1 TB bandwidth, SSL included |
| Fly.io | 2 | 2 GB | 40 GB | ~$18 | Disk and bandwidth billed separately |
| Render | 1 | 2 GB | 40 GB | ~$35 | 100 GB bandwidth, disk billed separately |
| Railway | 2 | 2 GB | 40 GB | ~$67 + $20 plan | Pro plan floor, usage-based, bandwidth billed separately |
How these numbers were calculated (assuming an always-on instance running 730 hrs/month):
- Sliplane: flat €9/month for the Base server. Unlimited services on the same server, 1 TB egress and SSL included.
- Fly.io: shared-cpu-2x with 2 GB = $11.83/mo + 40 GB volume × $0.15/GB = $6 -> ~$17.83/mo. Egress billed separately ($0.02/GB in EU).
- Render: closest match is Standard ($25, 1 vCPU / 2 GB) plus 40 GB disk × $0.25/GB = $10 -> ~$35/mo. Stepping up to Pro (2 vCPU / 4 GB) costs $85/mo + disk.
- Railway (Pro plan): CPU 2 × $0.00000772/s × 2,628,000 s = $40.58; RAM 2 × $0.00000386/s × 2,628,000 s = $20.29; volume 40 × $0.00000006/s × 2,628,000 s = $6.31 -> ~$67/mo compute, plus the $20/mo Pro plan floor and $0.05/GB egress.
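The arithmetic above can be reproduced with a few lines of Python, which is handy if you want to plug in a different server size. The unit prices are the ones quoted in the list and may have changed since; the function names are only for illustration.

```python
HOURS_PER_MONTH = 730
SECONDS_PER_MONTH = HOURS_PER_MONTH * 3600  # 2,628,000 s for an always-on instance


def fly_io_monthly(instance_usd=11.83, disk_gb=40, disk_usd_per_gb=0.15):
    """Fixed instance price plus volume storage; egress not included."""
    return instance_usd + disk_gb * disk_usd_per_gb


def render_monthly(plan_usd=25.0, disk_gb=40, disk_usd_per_gb=0.25):
    """Standard plan price plus disk; bandwidth overage not included."""
    return plan_usd + disk_gb * disk_usd_per_gb


def railway_usage_monthly(vcpu=2, ram_gb=2, disk_gb=40):
    """Usage-based compute only; the $20 Pro plan floor and egress are extra."""
    cpu = vcpu * 0.00000772 * SECONDS_PER_MONTH
    ram = ram_gb * 0.00000386 * SECONDS_PER_MONTH
    volume = disk_gb * 0.00000006 * SECONDS_PER_MONTH
    return cpu + ram + volume


for name, cost in [
    ("Fly.io", fly_io_monthly()),
    ("Render", render_monthly()),
    ("Railway (usage only)", railway_usage_monthly()),
]:
    print(f"{name}: ${cost:.2f}/mo")
```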
Bandwidth costs can add up fast on usage-based providers. Use our bandwidth cost comparison tool to see what your egress would cost on each platform.
FAQ
What file types does Unstructured support?
Unstructured can process PDFs, Word documents (.docx), PowerPoint (.pptx), HTML, Markdown, plain text, images (with OCR), emails (.eml, .msg), EPUBs, and many more. Check the Unstructured documentation for the full list of supported formats.
How do I connect Unstructured to my RAG pipeline?
Point your RAG framework (like Langflow or LangChain) at your Sliplane domain as the Unstructured API endpoint. Use the API key from your environment variables for authentication. Unstructured handles the document parsing, and you can feed the output directly into your vector database like Qdrant.
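As a rough illustration of the hand-off to a vector database, the sketch below turns the API response into flat records ready for embedding. It assumes the response is a JSON list of elements carrying `type`, `element_id`, `text` and a `metadata` object with `filename` and `page_number`. The `elements_to_records` helper and the sample data are hypothetical, so check the field names against what your instance actually returns.

```python
def elements_to_records(elements, min_chars=1):
    """Flatten partitioned elements into records for embedding and upsert.

    Elements whose text is empty (or shorter than min_chars) are skipped.
    """
    records = []
    for element in elements:
        text = (element.get("text") or "").strip()
        if len(text) < min_chars:
            continue
        metadata = element.get("metadata") or {}
        records.append({
            "id": element.get("element_id"),
            "text": text,
            "type": element.get("type"),
            "source": metadata.get("filename"),
            "page": metadata.get("page_number"),
        })
    return records


# Hypothetical sample shaped like an API response, for illustration only.
sample = [
    {"type": "Title", "element_id": "a1", "text": "Quarterly report",
     "metadata": {"filename": "report.pdf", "page_number": 1}},
    {"type": "NarrativeText", "element_id": "a2", "text": "   ",
     "metadata": {"filename": "report.pdf", "page_number": 1}},
]
records = elements_to_records(sample)
```

Each record keeps the source file and page next to the text, so search hits in your vector database can point back to the original document.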
How do I update Unstructured?
Change the image tag in your service settings and redeploy. Check the Unstructured API GitHub releases for the latest stable version.
Are there alternatives to Unstructured?
Yes, popular options include Apache Tika (Java-based document extraction), Docling (IBM's document parser), and LlamaParse (cloud-based, part of the LlamaIndex ecosystem). Each has different strengths depending on your use case. If you want to set up Tika alongside Open WebUI, check out our post on how to set up Apache Tika with OpenWebUI.
Can I process large documents or batches?
Yes, but keep an eye on memory usage. For large-scale batch processing, enable parallel mode by setting UNSTRUCTURED_PARALLEL_MODE_ENABLED=true and consider upgrading to a server with more RAM and CPU. The API processes documents synchronously, so large files will take longer to return a response.
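Because each request blocks until the document is processed, one client-side way to handle a batch is to cap how many requests are in flight at once so the server's RAM is not exhausted. The sketch below shows that idea with a thread pool. `process_batch` and its `send_one` callable are placeholders for whatever function you use to upload a single file, not part of the Unstructured API.

```python
from concurrent.futures import ThreadPoolExecutor


def process_batch(paths, send_one, max_in_flight=2):
    """Run send_one(path) for every path, at most max_in_flight at a time.

    Returns a dict mapping each path to ("ok", result) or ("error", exception),
    so one failing document does not abort the rest of the batch.
    """
    def attempt(path):
        try:
            return path, ("ok", send_one(path))
        except Exception as exc:  # record the failure and keep going
            return path, ("error", exc)

    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return dict(pool.map(attempt, paths))
```

Start with a low `max_in_flight` and raise it only while memory usage on the server stays comfortable.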