Application Server Horizontal Scaling

What is Horizontal Scaling?

In the context of Frappe Cloud, horizontal scaling refers to adding another application server to share the traffic and workload during high usage.

Instead of upgrading a single server to a bigger one via plan changes (vertical scaling), we temporarily add a secondary server to run the same sites and benches alongside the primary one.

You can think of it this way: during low traffic, your sites continue to run on the primary server alone. During peak hours with higher traffic, the load is split across two servers with zero downtime on your sites.

Why Do We Need a Secondary Server?

A secondary server provides extra compute only when it’s needed. When CPU usage rises (for example, during heavy API traffic or heavy background jobs), the secondary server:

  • Syncs the same benches as the primary server

  • Starts running workloads alongside it, routing incoming traffic to both servers and using the workers of both the primary and secondary servers

  • Allows your sites to continue functioning without delay or downtime

Setting Up a Secondary Server

You can select a secondary server from the actions section of the server tab as shown below.

imagec7e1ca

The setup phase shown above will also prompt you to select a plan for the secondary server. This is the plan the secondary server will operate on, which makes its compute capacity configurable. Note that the secondary server plans offered will have the same or higher compute capacity than the primary server.

Once the setup is complete, the dashboard will look like this.

imagec0b601

Scaling Up

During a scale-up event, the secondary server is started, begins running the benches, and starts sharing the live workload with the primary server. This helps distribute traffic and keeps the system responsive under heavy usage.

How do we distribute traffic?

When selecting a secondary server plan, users can choose a plan that is equal to or higher than the current primary server’s compute capacity. Based on the chosen plan, traffic distribution during scaling is configured as follows:

  • If the secondary server has the same compute capacity as the primary server, 50% of the requests are routed to it.

  • If the secondary server has higher compute capacity than the primary server, it receives 3× the number of requests compared to the primary server.
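The distribution rule above can be sketched in a few lines of Python. This is only an illustration: the function name `traffic_split` and the idea of expressing compute capacity as a single comparable number (for example, vCPU count) are assumptions, not part of the Frappe Cloud API.

```python
from fractions import Fraction


def traffic_split(primary_capacity, secondary_capacity):
    """Return (primary_share, secondary_share) of incoming requests.

    Hypothetical helper: capacities are assumed to be comparable numbers,
    such as vCPU counts.
    """
    if secondary_capacity < primary_capacity:
        # Secondary plans are always of the same or higher capacity.
        raise ValueError("secondary capacity must not be below primary capacity")

    # Same capacity: equal weights (50% each).
    # Higher capacity: the secondary gets 3x the primary's requests.
    secondary_weight = 1 if secondary_capacity == primary_capacity else 3
    total = 1 + secondary_weight
    return Fraction(1, total), Fraction(secondary_weight, total)


if __name__ == "__main__":
    print(traffic_split(4, 4))
    print(traffic_split(4, 8))
```

With a 3:1 weighting, a larger secondary server ends up handling three quarters of all requests, while the primary handles the remaining quarter.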

Scaling Down

When the load reduces, traffic is routed back to the primary server and the secondary server is safely shut down to avoid unnecessary billing. All queued jobs are processed gracefully before shutdown.

Beta Note

Since this is currently a beta feature, it is only available in the Mumbai Region and automatic scaling decisions are still being monitored. For the moment:

  • You can manually scale up or scale down from the server dashboard.

  • You can also schedule scale-ups or scale-downs in advance (for example, if you expect peak traffic at a specific time) as shown in the following dialog.

image70abe4

image656a35

Zero Downtime Scaling

Auto-scaling is designed to add or remove compute capacity without interrupting your sites. Your primary server keeps serving requests while the secondary server is prepared in the background. Once it’s ready, traffic begins flowing to it automatically.

Note: First Scale-Up May Take Longer

The very first scale-up may take some time, even after the secondary server has been set up. This is because the secondary server needs to pull all required Docker images of the benches. The time taken depends on the number of benches and their image sizes. Subsequent scale-ups are significantly faster, since these images are kept on the secondary server.

Manual and Scheduled Scaling (Beta)

Since this feature is currently in beta, scaling is triggered manually or via schedules:

  • Manual scale actions must be at least 5 minutes apart (to allow the system to complete the previous scaling safely)

  • Scheduled scale actions must be at least 1 hour apart (intended for predictable load patterns, e.g., monthly or daily peaks)
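The minimum gaps between scale actions described above can be expressed as a simple check. The sketch below is illustrative only: `MIN_GAP` and `can_scale` are hypothetical names, and the actual validation performed by the dashboard may differ.

```python
from datetime import datetime, timedelta

# Minimum spacing between scale actions, as stated in the beta rules.
MIN_GAP = {
    "manual": timedelta(minutes=5),
    "scheduled": timedelta(hours=1),
}


def can_scale(kind, previous_action, requested_action):
    """Return True if the requested action is far enough from the previous one.

    Hypothetical helper: `kind` is "manual" or "scheduled".
    """
    return requested_action - previous_action >= MIN_GAP[kind]


if __name__ == "__main__":
    last = datetime(2025, 1, 1, 12, 0)
    print(can_scale("manual", last, last + timedelta(minutes=4)))
    print(can_scale("scheduled", last, last + timedelta(hours=1)))
```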

True load-based automatic scaling is coming soon. Until then, this beta phase gives customers full control while we evaluate automatic scaling decisions.

Time-Consuming Operations

Below are a few common time-consuming actions during setup or scaling.

  • Setting up autoscaling, as shown above, might take some time depending on the number and size of the benches on the server. While this is running, the server will be in the Installing state. However, all the sites will continue to run without any downtime.

  • The first scale-up will also take time, depending on the number of benches. Subsequent scale operations are likely to be significantly faster.

Pricing

Secondary server billing only applies when it is active (during scaled-up periods).

  • You get 10% off the secondary server’s base price

  • Billing is hourly, and only for hours during which the secondary server was active

  • When scaled down, no secondary server cost applies

Final cost = (hours scaled up) × (primary app server plan price – 10% discount)

The charge is billed on an hourly basis and can be viewed on your invoice.

image2131d9
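As a worked example of the pricing formula above, here is a minimal sketch. The function name `secondary_server_cost` and the per-hour price input are assumptions: this article does not state how the hourly rate is derived from a plan price, so the sketch simply takes an hourly rate as given.

```python
from decimal import Decimal


def secondary_server_cost(hours_scaled_up, hourly_price, discount=Decimal("0.10")):
    """Cost = hours scaled up x (hourly plan price minus the 10% discount).

    Hypothetical helper: `hourly_price` is an assumed per-hour plan rate,
    passed as a string or Decimal to avoid floating-point rounding.
    """
    if hours_scaled_up < 0:
        raise ValueError("hours_scaled_up must not be negative")
    discounted_rate = Decimal(hourly_price) * (Decimal(1) - discount)
    return Decimal(hours_scaled_up) * discounted_rate


if __name__ == "__main__":
    # 12 hours scaled up at an assumed rate of 0.50 per hour.
    print(secondary_server_cost(12, "0.50"))
```

When the server stays scaled down for the whole billing period, `hours_scaled_up` is 0 and the cost is zero, matching the rule that no secondary server cost applies while scaled down.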

Opting out

If you no longer want to use horizontal scaling, you can opt out anytime from the Server Actions:

image87b1a0

When you trigger a teardown, the primary server will switch to the Installing state while the secondary server is removed. Once the process completes, the primary server will return to Active status. Your sites will continue to run without any downtime during this entire process.

Once the secondary server is dropped, no scaling operations can take place on the server.

If Something Goes Wrong

If you notice a Failure status in your Auto-Scale list view, please contact Support immediately. Do not attempt manual fixes, as this may make it harder to debug the exact cause of the failure.
