HA, DR, GSLB, LB: The What’s What and Who’s Who of Uptime
As a SoftLayer sales engineer, I get the opportunity to talk to a wide range of customers on a daily basis about almost everything under the sun. This is one of my favorite parts of working at SoftLayer: every day is unique and the topics range from a standalone LAMP server to thousands of servers in a big data cluster—and everything in between. It can be challenging at times, due to the infinite number of solutions that SoftLayer can run, but it also gives me the chance to learn and teach others. In this blog post, I’ll discuss high availability (HA), disaster recovery (DR), global server load balancing (GSLB), and load balancing (LB), as I occasionally hear customers mix up the terms, and I think a little clarity on the topics could help.
Before we dive into the differences, let’s define each in alphabetical order (I did take a stab at stating this in my own words, but Wikipedia does such a good job that I paraphrased from its descriptions and added in a little more context).
- High availability (HA): HA is a characteristic of a system, which aims to ensure an agreed level of operational performance for a higher than normal period. There are three principles of system design in high availability engineering: the elimination of single points of failure (SPOF), reliable failover, and failure detection.
- Disaster recovery (DR): DR involves a set of policies and procedures to enable the recovery or continuation of systems following a natural or human-induced disaster. Disaster recovery focuses on keeping all essential aspects of a business functioning despite significant disruptive events.
- Global server load balancing (GSLB): GSLB is a method of splitting traffic across multiple servers using DNS and geographical locations as the means to determine where request traffic will be sent.
- Load balancing (LB): LB is a way to distribute processing and communications evenly across multiple servers within a data center so that a single device does not carry an entire load. LB is essential in situations where it is difficult to predict the number of requests issued to a server, and it can distribute requests that would have been made to a single server to ease the load and minimize latency and other issues.
Now that we’ve defined each of these topics, let’s quickly check off the main points of each topic:
- No single points of failure (SPOF)
- Each component of a system has as at least one failover node
- If a server is part of an HA pair, it is recommended to run the OS on at least a RAID 1 group and DATA partitions on a RAID 1, 5, 6,10, or higher group
- If the system is part of a cluster, it is always recommended to run the OS on at least a RAID 1 and DATA partitions can be optimized for storage capacity
- Redundant power