Practical Startups Guide: Basics of Software Scalability

Importance of Scalable Systems for Startups

Startups face extreme uncertainty. To build a successful one, you must be as flexible as possible, stay resourceful, and adapt quickly to changing conditions. These demands on software teams make scalability even more important, and more challenging, than in slowly changing businesses. Things that take an entire year in a corporate environment may need to happen in a matter of weeks in a startup. If you are successful and lucky, you may need to scale your capacity up tenfold within weeks, only to scale back down a few months later.

What Does Software Scalability Mean?

Scalability is the ability to adjust the capacity of a system so that it meets demand cost-efficiently. It usually means being able to handle more users, clients, data, transactions, or requests without degrading the user experience. It is important to remember that scalability should let us scale down as well as up, and that scaling should be relatively cheap and quick to do.
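
As a rough illustration of adjusting capacity to demand in both directions, the sketch below estimates how many servers a given request rate would need. The function name `servers_needed`, the per-server capacity of 500 requests per second, and the 25% headroom are assumptions made up for this example, not figures from the article.

```python
import math

def servers_needed(demand_rps: float, per_server_rps: float, headroom: float = 0.25) -> int:
    """Return the smallest server count that covers demand plus spare headroom.

    All parameters are hypothetical: per_server_rps is the load one server can
    sustain, and headroom is the fraction of extra capacity kept in reserve.
    """
    if demand_rps < 0 or per_server_rps <= 0 or headroom < 0:
        raise ValueError("demand and headroom must be non-negative, capacity positive")
    required = demand_rps * (1 + headroom) / per_server_rps
    # Always keep at least one server running, even with no traffic.
    return max(1, math.ceil(required))

baseline = servers_needed(1_000, per_server_rps=500)    # normal traffic
peak = servers_needed(10_000, per_server_rps=500)       # tenfold spike
after_peak = servers_needed(1_000, per_server_rps=500)  # demand falls back
```

Under these assumed numbers, the server count grows for the spike and returns to the baseline afterwards. A system that can only follow the first half of that curve keeps paying for peak capacity long after the peak is gone.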

How is Software Scalability Measured?

The ability to scale is measured in different dimensions, as we may need to scale in different ways. Most scalability issues can be boiled down to just a few measurements:

  • Handling more data: As your business grows and becomes more popular, you will be handling more and more data. You will have to efficiently handle more user accounts, more products, more location data, and more pieces of digital content. Processing more data puts pressure on your system, as data needs to be sorted, searched through, read from disks, written to disks, and sent over the network. 
  • Handling higher concurrency levels: If you are building a web-based application, concurrency means how many users can use your application at the same time without degrading their user experience. Concurrency is difficult, as your servers have a limited number of central processing units (CPUs) and execution threads.
  • Handling higher interaction rates: The rate of interactions measures how often your clients exchange information with your servers. The main challenge related to the interaction rate is latency. As your interaction rate grows, you need to be able to serve responses more quickly, which requires faster reads/writes and often drives requirements for higher concurrency levels.
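
The link between interaction rate, latency, and concurrency can be made concrete with Little's Law, a standard queueing result that is not part of the original text: the average number of requests in flight equals the arrival rate multiplied by the average time each request spends in the system. The sketch below applies it to made-up numbers; the function name `requests_in_flight` is hypothetical.

```python
def requests_in_flight(rate_per_second: float, avg_latency_seconds: float) -> float:
    """Little's Law: average concurrency = arrival rate * average latency."""
    return rate_per_second * avg_latency_seconds

# Hypothetical workload: 200 requests per second, 0.5 seconds per request.
current = requests_in_flight(200, 0.5)
# The rate doubles while latency stays the same: concurrency doubles too.
doubled_rate = requests_in_flight(400, 0.5)
# Halving latency brings concurrency back to the original level.
faster = requests_in_flight(400, 0.25)
```

This is why a growing interaction rate leaves two options: respond faster, or be able to handle more requests at the same time.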

Don’t mix up scalability with performance, as they’re NOT the same thing. Performance measures how long it takes to process a request or to perform a certain task, whereas scalability measures how much we can grow (or shrink).


What are the Types of Scalability?

Once your application reaches the limits of your server (due to an increase in traffic, in the amount of data processed, or in concurrency levels), you must decide how to scale. There are two types of scaling:

  • Vertical Scalability: accomplished by upgrading the server's hardware and/or network throughput. It is often the simplest solution for short-term scalability, as it does not require architectural changes to your application. Vertical scalability is a great option for very small applications or if you can afford the hardware upgrades. However, it comes with some serious limitations that you can learn more about in the upcoming articles.
  • Horizontal Scalability: accomplished by adding more servers, so, unlike vertical scalability, you are not bound by the hard limit of a single machine. It is much harder to achieve and, in most cases, has to be considered before the application is built. I will describe different horizontal scalability techniques in later articles, but for now, let’s think of it as running each component on multiple servers and being able to add more servers whenever necessary.
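
To make "running each component on multiple servers and being able to add more servers whenever necessary" a little more concrete, here is a minimal sketch of a round-robin server pool. The class `ServerPool`, its methods, and the server names are hypothetical. Real load balancers also deal with health checks, session state, and uneven load, none of which is modeled here.

```python
from typing import List

class ServerPool:
    """Toy round-robin pool; server names are hypothetical placeholders."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("pool needs at least one server")
        self._servers = list(servers)
        self._next = 0

    def add_server(self, name: str) -> None:
        """Scale out: new capacity is just another entry in the pool."""
        self._servers.append(name)

    def remove_server(self, name: str) -> None:
        """Scale in: drop a server, but never the last one."""
        if len(self._servers) == 1:
            raise ValueError("cannot remove the last server")
        self._servers.remove(name)
        # Keep the cursor inside the bounds of the shorter list.
        self._next %= len(self._servers)

    def route(self) -> str:
        """Return the server that should handle the next request."""
        server = self._servers[self._next]
        self._next = (self._next + 1) % len(self._servers)
        return server

pool = ServerPool(["web-1", "web-2"])
first_round = [pool.route() for _ in range(4)]
pool.add_server("web-3")
second_round = [pool.route() for _ in range(3)]
```

Requests alternate between the two original servers, and once `web-3` is added it starts receiving its share without any change to the calling code. That only works if any server can handle any request, which is one reason horizontal scalability usually has to be designed in from the start.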

In the next articles, we’re going to discuss:

  • Simple scalability solutions (cache, CDNs, service isolation)
  • Understanding vertical scalability
  • Understanding horizontal scalability


Web Scalability for Startup Engineers, by Artur Ejsmont.