Self-hosting Unstructured the easy way

Yulei Chen - Content Engineer at sliplane.io
6 min

Unstructured is an open-source document processing API that extracts structured data from PDFs, images, Word docs, HTML, and dozens of other file formats. It powers Retrieval-Augmented Generation (RAG) pipelines and model fine-tuning workflows by turning messy documents into clean, chunked data. The hosted Unstructured Platform works great, but once your volume grows, self-hosting the API can save you money and keeps your data private.

Sliplane is a managed container platform that makes self-hosting painless. With one-click deployment, you can get the Unstructured API running in minutes - no server setup, no reverse proxy config, no infrastructure to maintain.

Prerequisites

Before deploying, ensure you have a Sliplane account (free trial available).

Quick start

Sliplane provides one-click deployment with presets.

Deploy Unstructured >
  1. Click the deploy button above
  2. Select a project
  3. Select a server (if you just signed up, you get a 48-hour free trial server)
  4. Click Deploy!

About the preset

The one-click deploy above uses Sliplane's Unstructured preset. The preset is configured for a production-ready setup:

  • Official Unstructured API image (unstructured-io/unstructured-api)
  • Specific version tag (0.1.5) for stability
  • API key authentication enabled by default
  • Healthcheck on /healthcheck for automatic monitoring
  • No persistent storage needed (the API is stateless and processes documents on the fly)

Next steps

Once the Unstructured API is running on Sliplane, access it using the domain Sliplane provided (e.g. unstructured-xxxx.sliplane.app).
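To confirm the service is reachable before sending documents, you can poll the /healthcheck endpoint that the preset uses for monitoring. The following is a minimal sketch using only the Python standard library. It assumes the endpoint answers with HTTP 200 when the service is healthy, and the function names are illustrative, not part of any Unstructured or Sliplane tooling.

```python
import urllib.request


def healthcheck_url(domain: str) -> str:
    """Build the healthcheck URL for a service domain (scheme is optional)."""
    has_scheme = domain.startswith(("http://", "https://"))
    base = domain if has_scheme else f"https://{domain}"
    return base.rstrip("/") + "/healthcheck"


def is_healthy(domain: str, timeout: float = 5.0) -> bool:
    """Return True if the healthcheck endpoint answers with HTTP 200.

    Assumption: a 200 response means the service is up. Connection errors,
    timeouts, and HTTP error codes are all treated as "not healthy".
    """
    try:
        with urllib.request.urlopen(healthcheck_url(domain), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and timeouts all derive from OSError
        return False


if __name__ == "__main__":
    # Replace with the domain Sliplane assigned to your service.
    print(is_healthy("unstructured-xxxx.sliplane.app"))
```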

Authentication

The preset generates a random UNSTRUCTURED_API_KEY for you. You need this key to authenticate API requests. Find it in your service's environment variables on Sliplane.

Include the key in your requests via the unstructured-api-key header:

curl -X POST https://unstructured-xxxx.sliplane.app/general/v0/general \
  -H "unstructured-api-key: YOUR_API_KEY" \
  -F "files=@mydocument.pdf"
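
If you would rather make the same request from Python without extra dependencies, here is a rough equivalent of the curl call above using only the standard library. It reuses the endpoint path, header name, and form field from the curl example. The function names, the timeout value, and the assumption that the response body is a JSON list are illustrative choices.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_partition_request(base_url: str, api_key: str, file_path: str) -> urllib.request.Request:
    """Build a multipart/form-data POST that mirrors the curl example."""
    path = Path(file_path)
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # "files" is the form field name used by the curl example (-F "files=@...")
        f'Content-Disposition: form-data; name="files"; filename="{path.name}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        path.read_bytes(),
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/general/v0/general",
        data=body,
        method="POST",
        headers={
            "unstructured-api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Accept": "application/json",
        },
    )


def partition(base_url: str, api_key: str, file_path: str, timeout: float = 120.0) -> list:
    """Send the file and return the parsed JSON response (assumed to be a list of elements)."""
    req = build_partition_request(base_url, api_key, file_path)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    elements = partition("https://unstructured-xxxx.sliplane.app", "YOUR_API_KEY", "mydocument.pdf")
    print(len(elements), "elements returned")
```

The generous timeout is deliberate: OCR-heavy documents can take a while to come back.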

Using the Python SDK

You can also use the Unstructured Python SDK to interact with your self-hosted API:

from unstructured_client import UnstructuredClient

client = UnstructuredClient(
    api_key_auth="YOUR_API_KEY",
    server_url="https://unstructured-xxxx.sliplane.app",
)

Environment variables

Here are the key environment variables you can customize:

| Variable | Description | Default |
|---|---|---|
| UNSTRUCTURED_API_KEY | API key for authentication | (generated) |
| UNSTRUCTURED_PARALLEL_MODE_ENABLED | Enable parallel processing | false |
| UNSTRUCTURED_PARALLEL_MODE_THREADS | Number of parallel threads | 3 |

Logging

The Unstructured API logs to STDOUT by default, which integrates well with Sliplane's built-in log viewer. For general Docker log tips, check out our post on how to use Docker logs.

Troubleshooting

If the API returns errors for certain file types, make sure your server has enough RAM. Document processing (especially OCR for images and scanned PDFs) can be memory-intensive. Consider upgrading to a larger server if you process large files regularly.

Cost comparison

You can also self-host the Unstructured API with other cloud providers. Here is a pricing comparison for the most common ones:

| Provider | vCPU | RAM | Disk | Monthly Cost | Note |
|---|---|---|---|---|---|
| Sliplane | 2 | 2 GB | 40 GB | €9 (~$10.65) | Flat rate, 1 TB bandwidth, SSL included |
| Fly.io | 2 | 2 GB | 40 GB | ~$18 | Disk and bandwidth billed separately |
| Render | 1 | 2 GB | 40 GB | ~$35 | 100 GB bandwidth, disk billed separately |
| Railway | 2 | 2 GB | 40 GB | ~$67 + $20 plan | Pro plan floor, usage-based, bandwidth billed separately |
How these numbers were calculated (assuming an always-on instance running 730 hours/month):

  • Sliplane: flat €9/month for the Base server. Unlimited services on the same server, 1 TB egress and SSL included.
  • Fly.io: shared-cpu-2x 2 GB = $11.83/mo + 40 GB volume × $0.15/GB = $6 -> ~$17.83/mo. Egress billed separately ($0.02/GB in EU).
  • Render: closest match is Standard ($25, 1 vCPU / 2 GB) plus 40 GB disk × $0.25/GB = $10 -> ~$35/mo. Stepping up to Pro (2 vCPU / 4 GB) costs $85/mo + disk.
  • Railway (Pro plan): CPU 2 × $0.00000772/s × 2,628,000 s = $40.58; RAM 2 × $0.00000386/s × 2,628,000 s = $20.29; volume 40 × $0.00000006/s × 2,628,000 s = $6.31 -> ~$67/mo compute, plus the $20/mo Pro plan floor and $0.05/GB egress.
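
The arithmetic above can be reproduced with a few lines of Python, which is handy if you want to plug in a different disk size or instance shape. All rates are copied from the list above. The function names and parameters are illustrative, and provider pricing may have changed since this was written.

```python
SECONDS_PER_MONTH = 730 * 3600  # always-on instance: 2,628,000 s


def fly_io_monthly(instance: float = 11.83, disk_gb: int = 40, disk_rate: float = 0.15) -> float:
    """shared-cpu-2x 2 GB instance plus volume storage, egress excluded."""
    return instance + disk_gb * disk_rate


def render_monthly(plan: float = 25.0, disk_gb: int = 40, disk_rate: float = 0.25) -> float:
    """Standard plan (1 vCPU / 2 GB) plus disk."""
    return plan + disk_gb * disk_rate


def railway_usage_monthly(vcpu: int = 2, ram_gb: int = 2, disk_gb: int = 40) -> float:
    """Usage-based compute only; the $20 Pro plan floor and egress are excluded."""
    cpu = vcpu * 0.00000772 * SECONDS_PER_MONTH
    ram = ram_gb * 0.00000386 * SECONDS_PER_MONTH
    volume = disk_gb * 0.00000006 * SECONDS_PER_MONTH
    return cpu + ram + volume


if __name__ == "__main__":
    print(f"Fly.io:  ${fly_io_monthly():.2f}")
    print(f"Render:  ${render_monthly():.2f}")
    print(f"Railway: ${railway_usage_monthly():.2f} (usage only)")
```

With the default arguments this gives $17.83 for Fly.io, $35.00 for Render, and $67.17 of Railway usage, matching the rounded figures in the table.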

Bandwidth costs can add up fast on usage-based providers. Use our bandwidth cost comparison tool to see what your egress would cost on each platform.

FAQ

What file types does Unstructured support?

Unstructured can process PDFs, Word documents (.docx), PowerPoint (.pptx), HTML, Markdown, plain text, images (with OCR), emails (.eml, .msg), EPUBs, and many more. Check the Unstructured documentation for the full list of supported formats.

How do I connect Unstructured to my RAG pipeline?

Point your RAG framework (like Langflow or LangChain) at your Sliplane domain as the Unstructured API endpoint. Use the API key from your environment variables for authentication. Unstructured handles the document parsing, and you can feed the output directly into your vector database like Qdrant.
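
As a rough illustration of the step between parsing and your vector database, the sketch below groups returned elements into text chunks ready for embedding. It assumes each element is a dict with "type", "text", and a "metadata" dict that may contain "page_number", which is how the API's JSON output is typically shaped. The grouping rule (start a new chunk at every Title or when a size budget is exceeded) is a simple stand-in, not Unstructured's own chunking logic, and the sample data is made up.

```python
from typing import Iterable


def elements_to_chunks(elements: Iterable[dict], max_chars: int = 1000) -> list[dict]:
    """Group parsed elements into chunks, splitting at titles and at a size budget."""
    chunks: list[dict] = []
    texts: list[str] = []
    pages: set[int] = set()
    size = 0

    def flush() -> None:
        """Emit the current chunk (if any) and reset the accumulators."""
        nonlocal texts, pages, size
        if texts:
            chunks.append({"text": "\n".join(texts), "pages": sorted(pages)})
        texts, pages, size = [], set(), 0

    for el in elements:
        text = (el.get("text") or "").strip()
        if not text:
            continue  # skip empty elements
        if el.get("type") == "Title" or size + len(text) > max_chars:
            flush()
        texts.append(text)
        size += len(text)
        page = (el.get("metadata") or {}).get("page_number")
        if page is not None:
            pages.add(page)

    flush()
    return chunks


if __name__ == "__main__":
    # Illustrative sample, not real API output.
    sample = [
        {"type": "Title", "text": "Intro", "metadata": {"page_number": 1}},
        {"type": "NarrativeText", "text": "Hello world.", "metadata": {"page_number": 1}},
        {"type": "Title", "text": "Details", "metadata": {"page_number": 2}},
        {"type": "NarrativeText", "text": "More text.", "metadata": {"page_number": 2}},
    ]
    for chunk in elements_to_chunks(sample):
        print(chunk)
```

Each resulting chunk carries its page numbers, so you can store them as payload next to the embedding and cite the source page at query time.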

How do I update Unstructured?

Change the image tag in your service settings and redeploy. Check the Unstructured API GitHub releases for the latest stable version.

Are there alternatives to Unstructured?

Yes, popular options include Apache Tika (Java-based document extraction), Docling (IBM's document parser), and LlamaParse (cloud-based, part of the LlamaIndex ecosystem). Each has different strengths depending on your use case. If you want to set up Tika alongside Open WebUI, check out our post on how to set up Apache Tika with OpenWebUI.

Can I process large documents or batches?

Yes, but keep an eye on memory usage. For large-scale batch processing, enable parallel mode by setting UNSTRUCTURED_PARALLEL_MODE_ENABLED=true and consider upgrading to a server with more RAM and CPU. The API processes documents synchronously, so large files will take longer to return a response.
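
On the client side, one way to work through a batch without overwhelming a small server is to cap the number of requests in flight. The sketch below does that with a thread pool. Here `send` is a hypothetical callable you supply that uploads one file and returns the parsed elements, and the default of two workers is an arbitrary conservative starting point, not a recommendation from Unstructured.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def process_batch(
    paths: Iterable[str],
    send: Callable[[str], list],
    max_workers: int = 2,
) -> tuple[dict, dict]:
    """Upload files with bounded concurrency.

    Returns (results, errors): results maps path -> elements,
    errors maps path -> the exception raised for that file.
    """
    results: dict = {}
    errors: dict = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(send, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one failed file should not abort the batch
                errors[path] = exc
    return results, errors


if __name__ == "__main__":
    # Stand-in for a real upload function, for demonstration only.
    def fake_send(path: str) -> list:
        if path.endswith(".bad"):
            raise ValueError(f"could not process {path}")
        return [{"type": "NarrativeText", "text": f"contents of {path}"}]

    ok, failed = process_batch(["a.pdf", "b.docx", "c.bad"], fake_send)
    print(sorted(ok), sorted(failed))
```

Because each request holds a document in memory on the server while it is being processed, raising `max_workers` trades throughput against RAM, so increase it gradually while watching memory usage.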

Self-host Unstructured now - it's easy!

Sliplane gives you a one-click deploy for the Unstructured API, no server setup required.