Horizontal Scaling

A platform can grow in two ways. Vertical scaling means buying a bigger server: more CPU, more RAM, faster disk. Horizontal scaling means adding more servers and spreading the load across them. Vertical hits a ceiling and gets expensive. Horizontal scales nearly indefinitely if the platform is designed for it.

Dashify is designed for horizontal scaling. This page explains how that design hangs together.

The two questions every horizontally-scaled platform must answer

When you have multiple instances of an API serving the same product, two questions surface immediately.

  1. Where does shared state live? Each instance handles a slice of requests; if every request needs to consult shared state, that state has to live in one place all instances can reach.
  2. How do instances coordinate when something happens on one of them that affects another? A real-time event triggered on instance A needs to reach a user connected to instance B.

Dashify answers both with Redis.

Stateless API instances

The Dashify API is stateless. An instance does not hold sessions in memory, does not cache work-in-progress, does not depend on which user has connected to it. Every piece of state that lives longer than a single request is in MongoDB or Redis, both shared by all instances.

This means the load balancer can route any request to any instance and the answer is the same. It also means restarting an instance is risk-free: there is no in-memory state to lose. New instances spin up cold and are ready to serve immediately.

Sessions in Redis solve question 1

Every API instance reads and writes sessions to the same Redis. A user logs in on instance 1; their cookie carries the session id; the next request goes to instance 3; instance 3 looks up the session in Redis and sees the same data instance 1 wrote. The user does not notice that they have been moved.

This is also why session revocation works instantly across the fleet: delete the Redis record, and every instance sees the absence on its next read.
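
To make this concrete, here is a minimal sketch of shared sessions using the node-redis client directly. The key shape, TTL, and function names are illustrative assumptions for the sketch, not Dashify's actual session schema:

```ts
import { createClient } from "redis";

// Every instance points at the same Redis, so a session written by
// instance 1 is immediately visible to instance 3.
const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

const SESSION_TTL_SECONDS = 60 * 60 * 24; // 24h, illustrative

// Instance 1 handles the login and writes the session.
async function createSession(sessionId: string, userId: string) {
  await redis.set(
    `sess:${sessionId}`,
    JSON.stringify({ userId, createdAt: Date.now() }),
    { EX: SESSION_TTL_SECONDS }
  );
}

// Instance 3 handles the next request and reads the same record.
async function getSession(sessionId: string) {
  const raw = await redis.get(`sess:${sessionId}`);
  return raw ? JSON.parse(raw) : null; // null => revoked or expired
}

// Revocation: one DEL, and every instance sees the absence on its next read.
async function revokeSession(sessionId: string) {
  await redis.del(`sess:${sessionId}`);
}
```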

Pub/Sub solves question 2

When instance 1 receives a chat message, it needs to deliver that message to the user's other open tabs and to other participants in the channel, who might be connected to instance 2 or 3. The Socket.IO Redis adapter publishes a message to a Redis pub/sub channel; every instance is subscribed; every instance forwards the message to whichever clients it currently has connected.

This is the mechanism that makes real-time features work across a horizontally-scaled fleet. We covered the message flow on the Redis page; horizontal scaling is the reason it matters.
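
A minimal sketch of wiring up the Socket.IO Redis adapter follows. The event names, room naming, and port are illustrative, not Dashify's actual ones:

```ts
import { createServer } from "node:http";
import { Server } from "socket.io";
import { createAdapter } from "@socket.io/redis-adapter";
import { createClient } from "redis";

// Each instance keeps one publish connection and one subscribe connection.
const pubClient = createClient({ url: process.env.REDIS_URL });
const subClient = pubClient.duplicate();
await Promise.all([pubClient.connect(), subClient.connect()]);

const httpServer = createServer();
const io = new Server(httpServer);
io.adapter(createAdapter(pubClient, subClient));

// Emitting to a room on this instance publishes through Redis; every
// instance forwards the event to the room members it holds locally.
io.on("connection", (socket) => {
  socket.on("chat:send", (msg) => {
    io.to(msg.channelId).emit("chat:message", msg);
  });
});

httpServer.listen(3000);
```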

The worker process scales independently

The API and the worker are separate processes. They share the codebase and the database, but they consume different resources. The API is bound by request volume; the worker is bound by job throughput. Each can be scaled independently.

If background jobs are piling up, add more worker instances. If the API is slow, add more API instances. They do not interfere.

BullMQ is designed for multiple workers to consume the same queue without stepping on each other. Each job is delivered to exactly one worker; if a worker dies mid-job, the job is re-delivered to a different worker after a timeout.
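
A sketch of such a worker process is below; the queue name, job handler, and connection details are illustrative:

```ts
import { Worker } from "bullmq";

// Any number of worker processes can consume the same queue; BullMQ
// hands each job to exactly one of them. A job whose worker dies
// mid-run stalls and is re-delivered to another worker.
const worker = new Worker(
  "dashify-jobs",
  async (job) => {
    // ...process the job and write the result to MongoDB...
    return { ok: true };
  },
  {
    connection: { host: "localhost", port: 6379 },
    concurrency: 5, // jobs processed in parallel by this one process
  }
);

worker.on("failed", (job, err) => {
  console.error(`job ${job?.id} failed: ${err.message}`);
});
```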

Database scaling

The application tier scales horizontally with ease. The database tier is harder.

MongoDB scales by adding read replicas to a replica set and, eventually, by sharding (spreading writes across multiple primaries). For most Dashify deployments a single replica set with a couple of read replicas is plenty; millions of users fit in a single cluster.

Redis scales by adding replicas (for reads), partitioning with Redis Cluster (for writes), or simply running on a generously sized instance; Redis is fast enough that a single node handles enormous load.

In practice, the data layer is the last tier to become the bottleneck. Tune the application tier first.
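
As a concrete (hypothetical) illustration, pointing the MongoDB driver at a replica set and allowing reads from secondaries is a connection-string change; the host names and database name below are made up:

```ts
import { MongoClient } from "mongodb";

// Hypothetical three-member replica set: reads may go to secondaries,
// writes always go to the primary.
const client = new MongoClient(
  "mongodb://db1.internal,db2.internal,db3.internal/dashify" +
    "?replicaSet=rs0&readPreference=secondaryPreferred"
);
await client.connect();
```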

Stateless workers, stateless connections

The worker is also stateless. Jobs are picked up from Redis, processed, and the result is written to MongoDB. The worker retains no state between jobs. A worker can crash and restart with no impact beyond the in-flight job (which is re-delivered).

Socket.IO connections are stateful from the user's perspective (the connection stays open), but the server side is essentially stateless thanks to the Redis adapter. If an instance dies, the user's WebSocket disconnects, the client transparently reconnects to a different instance, and the user notices a half-second blip at most.
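
The client side of that blip is just the built-in reconnection behavior; a sketch with illustrative URL and tuning values:

```ts
import { io } from "socket.io-client";

// Socket.IO clients reconnect automatically. When an instance dies, the
// load balancer routes the new connection to a surviving instance, and
// the Redis adapter means any instance can serve it.
const socket = io("https://app.example.com", {
  reconnection: true,
  reconnectionDelay: 500, // first retry after ~0.5s
  reconnectionDelayMax: 5000,
});

socket.on("disconnect", (reason) => {
  // Transient: the client is already retrying against the load balancer.
  console.log(`disconnected (${reason}), reconnecting...`);
});
```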

Graceful shutdowns

When an instance restarts (deploy, scale-down, OS update), it should not interrupt in-flight requests. Dashify implements graceful shutdown on every process:

  • The instance stops accepting new connections.
  • It finishes the requests it is already handling.
  • The worker stops accepting new jobs and finishes the ones already running.
  • Real-time connections are dropped politely (the client reconnects elsewhere).
  • Once everything is drained, the process exits.

This is what allows zero-downtime deploys: new instances come up before old ones go down, and no user request is lost in the handover.
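
A minimal sketch of that sequence in a Node process, assuming `server`, `io`, and `worker` are the HTTP server, Socket.IO server, and BullMQ worker created at startup (the names are hypothetical, and Dashify's actual hooks may differ):

```ts
import type { Server as HttpServer } from "node:http";
import type { Server as SocketServer } from "socket.io";
import type { Worker } from "bullmq";

// Created at startup elsewhere; declared here to keep the sketch self-contained.
declare const server: HttpServer;
declare const io: SocketServer;
declare const worker: Worker;

process.on("SIGTERM", async () => {
  // Drop real-time connections politely; clients reconnect elsewhere.
  io.disconnectSockets(true);
  // Stop accepting new connections; resolves once in-flight requests finish.
  await new Promise<void>((resolve) => server.close(() => resolve()));
  // Take no new jobs; wait for the ones already running.
  await worker.close();
  // Everything is drained.
  process.exit(0);
});
```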

What does not scale

A few things in Dashify do not scale horizontally without operator attention:

  • The on-prem AI tier (Ollama). A single Ollama instance can serve many concurrent requests, but heavy AI traffic eventually saturates it. Scaling means running Ollama on a GPU host, running multiple Ollama instances behind their own load balancer, or moving to a hosted model.
  • The vector database (Qdrant). Qdrant supports clustering for horizontal scale, but a single node is plenty up to millions of vectors per tenant.
  • The nightly indexer. A single worker runs the indexer because tenants are processed one at a time. For a deployment with thousands of tenants, the indexer would need partitioning. Not needed yet.

These are honest limitations the operator should plan for once the deployment grows enough to bump into them.

Cost shape

Horizontal scaling has a nice property: cost grows roughly linearly with load. Doubling the user base doubles the API instances and roughly doubles the bill. Vertical scaling is usually super-linear: the next-bigger instance type is more than twice as expensive, partly because you pay for headroom you may never use.

The corollary is that horizontal scaling is also easy to size down. When traffic drops, instances are removed. The bill drops too. Dashify is comfortable running on a single small instance during off-peak hours and scaling up before peak.

Key takeaways

  • Dashify's API and worker are stateless; every piece of shared state lives in Redis or MongoDB.
  • The load balancer can route any request to any instance.
  • Real-time fanout across instances goes through Redis Pub/Sub (Socket.IO adapter).
  • Workers and APIs scale independently: different bottlenecks, different sliders.
  • Graceful shutdown enables zero-downtime deploys.
  • The data tier and the AI tier scale with operator attention; the application tier scales freely.