Things to Consider When Building HA in the Public Cloud

We continue to provide further education about the public cloud because it still seems to be considered the “wild west”. Those of us who have settled into a relatively stable place, and who have gained significant insight into its benefits while acknowledging its challenges, should share that knowledge to help everyone gain ground. Today we want to focus on what building high availability (HA) in the public cloud looks like and what to consider when making this journey.

Here are 3 things to consider:

  • Understand the Basics – The first step is understanding the difference between zones and regions in the public cloud. This may sound simple, but the February 2017 S3 outage in AWS's US-EAST-1 region, which affected so many AWS customers, including large ones and even AWS's own status page, caught many by surprise. Where was my redundancy? The problem was that many of them had built all of their redundancy across multiple zones within the same region. If a service shared by every zone in a region, such as storage, goes down, a redundancy plan confined to that region goes down with it, as the S3 outage showed. Customers who had properly set up the same redundancy across multiple regions would not have been down the way they were. The easiest way to understand a zone versus a region is to map them to traditional data center concepts. A zone (an Availability Zone, in AWS terms) is roughly equivalent to one or more data centers with their own power, cooling, and networking, while a region is a geographic area that groups several of these zones close together. Because the zones in a region rely on shared regional services, a regional failure can take all of them down at once. To avoid outages like the one in February, you need to consider using multiple regions for true HA environments within the public cloud.


  • Figure out the Right Tools for Your HA Environment – The tools available in the public cloud are like Lego pieces scattered across the floor, waiting for you to fit them together properly. The challenge is finding the right pieces for what you want to build. There are tools such as automatic server recovery, auto-scaling, multi-regional load balancing, active-active and active-passive configurations across regions, DDoS scrubbing services, and more. What does your environment need to make it HA? If you and your team aren't sure, avoid the pitfall of trying to figure it all out on your own. When you don't fully understand what you are getting into, you can miss a seemingly small configuration detail that will still take you down in an event like the S3 outage mentioned above. Find a partner with the right public cloud experience to help you through it. If HA is truly critical to your site or application, this investment will be worth every dime.


  • Beware of the Price Tag – What surprises most people who haven't dug into public cloud pricing models is that the public cloud isn't always the cheapest option. This is especially true when it comes to creating HA environments. If you need dedicated resources, data transfer across multiple regions, and several HA tools to make your environment truly HA, you may well end up spending more. Look at everything Netflix has had to do to make AWS truly HA for its service. It built its own tools, such as Chaos Monkey, that deliberately take down random instances in its environments to verify that services recover and shift load appropriately without downtime. Netflix at one point wanted to move away from AWS to avoid the cost and effort of keeping its environments highly available, but the move was simply too big an endeavor, and it was essentially stuck. The main point is to evaluate a CSP's (Cloud Service Provider's) pricing and compare it with AWS's or Azure's. The results will probably surprise you, not to mention the effort you will save your team, since most CSPs will build the HA environment for you through their standard processes, avoiding the kinds of gaps mentioned above.
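
The zone-versus-region distinction from the first point can be made concrete with a small model. What follows is a minimal Python sketch, not any cloud provider's API: the Placement type and the survives() and pick_active() functions are hypothetical names chosen for illustration, and the region and zone names only follow AWS's naming style. It assumes an outage takes down either a single zone or every zone in a region, and it models active-passive failover across regions as routing traffic to the first healthy region in a priority list.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Optional, Sequence


@dataclass(frozen=True)
class Placement:
    """Where one copy of the application runs (hypothetical model, not a cloud API)."""
    region: str
    zone: str


def survives(
    placements: Iterable[Placement],
    failed_region: Optional[str] = None,
    failed_zone: Optional[str] = None,
) -> bool:
    """Return True if at least one placement is outside the failed zone or region."""
    for p in placements:
        if failed_region is not None and p.region == failed_region:
            continue  # a regional outage takes down every zone in that region
        if failed_zone is not None and p.zone == failed_zone:
            continue  # a zonal outage takes down only that one zone
        return True
    return False


def pick_active(priority: Sequence[str], healthy: Mapping[str, bool]) -> Optional[str]:
    """Active-passive failover: route to the first healthy region in priority order."""
    for region in priority:
        if healthy.get(region, False):
            return region
    return None  # no healthy region left


# Redundancy spread across three zones of ONE region (sample names).
multi_zone = [Placement("us-east-1", z) for z in ("us-east-1a", "us-east-1b", "us-east-1c")]
# Redundancy spread across TWO regions (sample names).
multi_region = [Placement("us-east-1", "us-east-1a"), Placement("us-west-2", "us-west-2a")]

print(survives(multi_zone, failed_zone="us-east-1a"))     # True: other zones still up
print(survives(multi_zone, failed_region="us-east-1"))    # False: the whole region is down
print(survives(multi_region, failed_region="us-east-1"))  # True: us-west-2 is still up
print(pick_active(["us-east-1", "us-west-2"], {"us-east-1": False, "us-west-2": True}))  # us-west-2
```

In this model the multi-zone deployment survives a zone failure but not a regional one, which mirrors the S3 outage described in the first point; only the multi-region placement survives both, and the failover function then decides which region takes the traffic.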


More and more IT organizations find themselves supporting applications or sites that must be available 24×7. Many of them also risk losing significant revenue, or even customers, if they are down for more than 30 minutes. This is not only extremely stressful for those teams; it is also more complicated than most people realize. Hardware fails; it's a part of life, and a lot of hardware goes into today's infrastructure, especially in larger environments. Creating redundancy for every component in the environment is quite the endeavor. Again, this gets even more complicated, and possibly more costly, in the public cloud. It can be done, and the available tools are cutting-edge, but unless you have a staff like Netflix's or a partner to make it happen, it can be tricky. Just don't be surprised if that partner has options within its own data centers that could save your team time and money.