A background worker runs on Fly.io. It handles AI embeddings for a cloud storage platform—text extraction from PDFs via Unstructured.io, chunking documents, generating vectors with OpenAI's embedding API. Event-driven architecture: Inngest Cloud dispatches jobs, the worker wakes up, processes, suspends. Pay-per-use.
The January bill showed $18 for 16 days. For a machine that should run maybe 10 minutes per day, that's wrong.
Debugging the cost
Cost Explorer showed $17.91 for the worker. Checking the Fly.io dashboard revealed two machines:
- Machine 1: suspended
- Machine 2: started (running)
┌────────────────────────────────────────┐
│ Expected behavior:                     │
│ • Upload triggers Inngest event        │
│ • Worker wakes, processes, suspends    │
│ • Cost: ~$1-3/month                    │
└────────────────────────────────────────┘
┌────────────────────────────────────────┐
│ Actual behavior:                       │
│ • Machine 2 running continuously       │
│ • Machine 1 suspended (unused)         │
│ • Cost: ~$18/month (39% uptime)        │
└────────────────────────────────────────┘
Reading fly.toml:
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = "suspend"
auto_start_machines = true
min_machines_running = 1 # ← The problem

min_machines_running = 1 means Fly always keeps one machine running. It defeats the entire purpose of auto-suspend. The worker was configured to never fully sleep.
With a performance-4x VM (8GB RAM) at ~$0.12/hour:
- 24/7 for 16 days = 384 hours
- 384 hours × $0.12 = $46.08 potential cost
- Actual charge: $18, about 39% of that ceiling (roughly 150 machine-hours). The machine was not up around the clock, but it ran far more than the expected 10 minutes per day.
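The arithmetic above can be checked with a few lines of Python. This is a minimal sketch using only the figures quoted in this post (the ~$0.12/hour rate, 16 days, and the $18 charge); the variable and function names are illustrative, and the rate is approximate rather than an official price.

```python
# Figures taken from the post; the hourly rate is the approximate one quoted above.
HOURLY_RATE_USD = 0.12
DAYS_BILLED = 16
ACTUAL_CHARGE_USD = 18.00


def always_on_cost(days: int, hourly_rate: float) -> float:
    """Cost of one machine running 24 hours a day for the given number of days."""
    return days * 24 * hourly_rate


ceiling = always_on_cost(DAYS_BILLED, HOURLY_RATE_USD)
billed_hours = ACTUAL_CHARGE_USD / HOURLY_RATE_USD  # hours implied by the charge
uptime_fraction = ACTUAL_CHARGE_USD / ceiling        # share of the 24/7 ceiling

print(f"Always-on ceiling: ${ceiling:.2f}")
print(f"Implied machine-hours: {billed_hours:.0f}")
print(f"Implied uptime: {uptime_fraction:.0%}")
print(f"Implied hours per day: {billed_hours / DAYS_BILLED:.1f}")
```

The implied uptime works out to roughly nine hours per day, which is the gap between what was billed and the ten minutes per day the workload should need.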
The fix
Changed one line:
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = "suspend"
auto_start_machines = true
- min_machines_running = 1
+ min_machines_running = 0

Deployed the update, destroyed the extra machine:
flyctl deploy --config apps/worker/fly.toml
flyctl machine destroy e827771c03ee28 --app disposal-worker --force

Verified the result:
ID              STATE    CHECKS
68306d3a560d28  stopped  1 warning
One machine. Stopped. The health check warning is expected—it can't pass when the machine is off. That's fine. Inngest will wake it when needed.
Why it happened
When we migrated from Railway, we created the fly.toml based on Fly.io's defaults. The docs recommend min_machines_running = 1 for HTTP services to avoid cold start latency. That makes sense for a user-facing API that needs instant response.
But this isn't user-facing. It's a background worker triggered by Inngest Cloud. Cold start latency doesn't matter—Inngest handles retries, the job runs async, and a 5-second wake time is irrelevant when processing takes 30+ seconds anyway.
We copied the config without questioning whether the defaults fit the use case.
What changed
The worker now:
- Suspends after 60 seconds of inactivity
- Wakes when Inngest dispatches an event
- Charges only for active processing time
Expected monthly cost: $1-3 instead of $18.
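As a rough sanity check on that estimate, compute cost can be projected from active minutes per day. This sketch assumes the same ~$0.12/hour rate quoted earlier, a 30-day month, and billing for active time only; it ignores storage and any other line items, and the function name is illustrative.

```python
HOURLY_RATE_USD = 0.12  # approximate rate quoted earlier in the post


def monthly_compute_cost(
    active_minutes_per_day: float,
    hourly_rate: float = HOURLY_RATE_USD,
    days: int = 30,
) -> float:
    """Compute-only cost for a month, given average active minutes per day."""
    active_hours = active_minutes_per_day / 60 * days
    return active_hours * hourly_rate


for minutes in (10, 17, 50):
    cost = monthly_compute_cost(minutes)
    print(f"{minutes} active min/day -> ${cost:.2f}/month")
```

At this rate the $1-3 range corresponds to roughly 17 to 50 active minutes per day, so it leaves headroom over the 10-minute figure for wake-ups and busier days.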
The machine is performance-4x with 8GB RAM. That's oversized for this workload—PDF processing is I/O-bound (waiting on Unstructured.io for text extraction, OpenAI for embeddings). We could drop to shared-cpu-2x (2GB) and cut per-minute costs by 75%. But with auto-suspend working, the total cost is low enough that optimization can wait until we see actual memory usage patterns.
Lessons
Read the defaults critically. Copy-paste from docs works until it doesn't. min_machines_running = 1 is a reasonable default for most HTTP services. Not for background workers.
Billing reveals misconfigurations. The worker "worked" in that it processed jobs successfully. The only signal something was wrong was the cost. If we hadn't checked the bill, we'd still be burning $18/month.
Infrastructure assumptions don't always transfer. Railway's auto-scaling meant machines stopped when idle by default. Fly.io's defaults assume you want availability over cost. Both are valid—just different models.
Extra machines can double the cost. Fly created a second machine during a deploy and never cleaned it up. One was suspended, one kept running. Always verify machine counts after deployments.
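The post-deploy check can be scripted. The sketch below is a hypothetical helper, not something from the original setup: it assumes `flyctl machine list --json` prints a JSON array of machines with `id` and `state` fields, and that a healthy idle worker has exactly one machine in a `stopped` or `suspended` state. Verify those assumptions against your flyctl version before relying on it.

```python
import json
import subprocess

IDLE_STATES = ("stopped", "suspended")


def list_machines(app: str) -> list[dict]:
    """Fetch machines for an app via flyctl (assumes --json output is a list)."""
    result = subprocess.run(
        ["flyctl", "machine", "list", "--app", app, "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def machine_problems(machines: list[dict], expected_count: int = 1) -> list[str]:
    """Describe anything unexpected: wrong machine count or non-idle machines."""
    problems = []
    if len(machines) != expected_count:
        problems.append(
            f"expected {expected_count} machine(s), found {len(machines)}"
        )
    for machine in machines:
        # "id" and "state" are assumed field names in the JSON output.
        state = machine.get("state", "unknown")
        if state not in IDLE_STATES:
            problems.append(f"machine {machine.get('id', '?')} is {state}")
    return problems


# Example with hand-written data shaped like the situation in this post.
sample = [
    {"id": "machine-1", "state": "suspended"},
    {"id": "machine-2", "state": "started"},
]
for problem in machine_problems(sample):
    print("CHECK:", problem)
```

Run right after a deploy, a machine may legitimately be in a started state, so a check like this is more useful a few minutes later or on a schedule.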
The platform works the same. The projected bill is about 85% lower. One line in a config file.