Nowadays, most of our projects' infrastructure is on Amazon Web Services (AWS). AWS is an on-demand cloud platform offering compute power, database storage, content delivery, and other functionality.

When your entire infrastructure is in AWS, you have to be sure that your servers don't go down at any point in time, whether due to a sudden load, an increase in CPU utilization, etc.

I came across one such requirement recently: make sure our servers are capable of handling a sudden load, and verify that the setup is reliable. So we came up with a plan to test the fail-over of our servers.

Consider the AWS setup below; we need to perform fail-over testing on it and make sure the application servers are reliable.

Setup

How does the request travel?

  1. When you hit your application URL from a browser, the request hits the ELB (Elastic Load Balancer) of the web servers.
  2. The ELB performs a quick health check of the two web servers, i.e., Nginx-01 & Nginx-02, and based on availability, the request is sent through either Nginx-01 or Nginx-02.
  3. Once the request passes through Nginx (01/02), it hits the ELB of the application servers.
  4. Similar to #2, the ELB of the application servers performs a quick health check and sends the request to an application server, which in turn processes the request and sends the response.
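You can trace this path end to end with a quick curl from your machine. A minimal sketch, assuming a placeholder URL for your application; the Server response header typically reveals the Nginx hop:

    # Placeholder URL; replace with your application's address.
    curl -sI https://my-app.example.com/
    # A healthy round trip typically returns headers like:
    #   HTTP/1.1 200 OK
    #   Server: nginx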

We didn't use JMeter to send thousands of requests to the servers; instead, we used other procedures to test them. Below are a couple of cases we covered as part of the fail-over testing.

1. Increase CPU utilization:

To recall, I mentioned earlier that before sending a request to the servers, the ELB quickly does a health check, right? Now, what actually is a health check?

Consider that the ELB pings Server-01 for a health check. The response will be either 'InService' or 'OutOfService'. When the response is 'InService', the respective application server is capable of processing our request, and the ELB will forward the request to that server, i.e., Server-01 in our case.

If the ELB gets the response 'OutOfService', then all subsequent requests will be routed to Server-02; before sending a request to Server-02, a quick health check on Server-02 is done as well. So the rule of thumb of the ELB is: always send requests to healthy instances.
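If you want to see the state the ELB reports for each instance yourself, the AWS CLI exposes it. A minimal sketch, assuming a Classic Load Balancer with the placeholder name 'my-app-elb':

    # Ask the ELB for the health state of every registered instance.
    aws elb describe-instance-health --load-balancer-name my-app-elb
    # Each instance in the output carries a State of "InService" or "OutOfService".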

1.1 Increase CPU utilization of Server-01.

  • Log in to Server-01 and increase CPU usage by running an infinite 'for' loop (see the sketch after this list).
  • Send request(s) to your application.
  • The ELB performs a quick health check and routes the request to Server-02, as Server-01 is busy.
  • Confirm that you're getting a proper response for the request sent.
  • Post confirmation, make sure to kill the job, i.e., the infinite 'for' loop executed earlier on Server-01.
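Putting step 1.1 together, here is a minimal sketch; the application URL is a placeholder, and the loop count assumes a four-core machine:

    # On Server-01: spawn four infinite busy loops in the background (one per core).
    for i in 1 2 3 4; do while : ; do : ; done & done

    # From your machine: send a request and confirm a proper response
    # (expect HTTP 200, served via Server-02).
    curl -s -o /dev/null -w "%{http_code}\n" https://my-app.example.com/

    # Back on Server-01, in the same shell session: kill the busy loops.
    kill -9 $(jobs -p)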

A sample screenshot of the two servers' (01 & 02) CPU usage is below.

[Screenshot: CPU utilization of both Server-01 & Server-02.]

In the above screenshot, Server-01's CPU row shows 0.0%id & Server-02's CPU row shows 95.7%id, where 'id' stands for CPU idle time.

Server-01's CPU usage is 59.8%us & its CPU idle is 0.0%id.

Server-02's CPU usage is 2.1%us & its CPU idle is 95.7%id.

So in the above scenario, when the request hits the ELB, the ELB routes the request to Server-02, as it is idle compared to Server-01.

1.2 Increase CPU utilization of Server-02.

  • Log in to Server-02 and increase CPU usage by running the same infinite 'for' loop.
  • Send request(s) to your application.
  • The ELB performs a quick health check and routes the request to Server-01.
  • Confirm that you're getting a proper response for the request sent.
  • Post confirmation, make sure to kill the job, i.e., the infinite 'for' loop executed earlier on Server-02.

Notes:

  • The health check from the ELB is a configuration.
  • How many times the ELB has to perform a health check on a server, and within a span of how many seconds, is also a configuration, e.g., 3 health checks with a time gap of 10 seconds (see the sketch after this list).
  • A configuration can be set to have primary and secondary instances, i.e., you can set Server-01 as primary and Server-02 as secondary, or vice versa. Based on the configuration, every request will hit the primary. If the primary is either busy or unhealthy, the secondary server comes into the picture.
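As a sketch, the health-check behaviour in the first two notes maps to a Classic ELB setting like the one below; the load balancer name and health-check target are placeholders:

    # Probe each instance every 10 seconds; 3 consecutive failures mark it
    # OutOfService, and 3 consecutive successes mark it InService again.
    aws elb configure-health-check --load-balancer-name my-app-elb \
        --health-check Target=HTTP:80/health,Interval=10,Timeout=5,UnhealthyThreshold=3,HealthyThreshold=3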

2. Bring the servers down:

Here, instead of increasing the CPU usage, we will directly bring down the server or the container (Docker).

 1. Bring Server-01 down while Server-02 is up and running.

  • Bring Server-01 down.
  • Send requests to the application from a browser.
  • The ELB performs a health check and routes the request to Server-02, as Server-01 is down.
  • Confirm that you're getting a proper response for the request sent.
  • Post confirmation, make sure to bring Server-01 back up (see the sketch after this list).
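If the application runs in a Docker container, this case can be scripted roughly as below; the container name and URL are placeholders:

    # On Server-01: stop the application container to take the server out of rotation.
    sudo docker stop app-container-01

    # From your machine: the ELB should now route requests to Server-02.
    curl -s -o /dev/null -w "%{http_code}\n" https://my-app.example.com/

    # Post confirmation: bring Server-01's container back up.
    sudo docker start app-container-01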

2. Bring Server-02 down while Server-01 is up and running.

  • Bring Server-02 down.
  • Send requests to the application from a browser.
  • The ELB performs a health check and routes the request to Server-01.
  • Confirm that you're getting a proper response for the request sent.
  • Post confirmation, make sure to bring Server-02 back up.

3. Bring both Server-01 and Server-02 down.

  • Bring Server-01 and Server-02 down.
  • As expected, your requests fail to process.
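With no healthy instance left, the ELB has nowhere to route the request; a Classic ELB typically answers with an HTTP 503 in this situation. A quick check (placeholder URL):

    # Expect an error status (typically 503 from the ELB) while both servers are down.
    curl -s -o /dev/null -w "%{http_code}\n" https://my-app.example.com/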

So, what did we achieve by performing these cases?

  1. We verified that our servers are capable of handling requests when a CPU is busy.
  2. We verified that our servers are capable of handling requests when one of the instances goes down.
  3. We avoided blindly sending thousands of requests to our servers to achieve the fail-over.
  4. Last but not least, we saved testing time.

That’s it. We are done with the fail-over testing.

Now, we will discuss one more case a little bit: what configurations does AWS provide for when requests overload both Server-01 & Server-02, and how do we handle this scenario?

Consider, for example, that each application server can handle 1,000 requests; beyond that, processing of requests will be delayed. As per our example, when we send 1,000 requests per server we should not face any delay, but when we send 5,000 requests, those extra 3,000 (5,000 - 1,000 × 2 servers) requests will see some delay.

How to avoid this delay?

AWS provides a configuration called 'Auto Scaling'. What is auto-scaling?

Auto-scaling is a configuration provided by AWS to maximize the benefits of the AWS cloud, such as:

a) Better fault tolerance. Auto Scaling can detect when an instance is unhealthy, terminate it, and launch an instance to replace it. You can also configure Auto Scaling to use multiple Availability Zones. If one Availability Zone becomes unavailable, Auto Scaling can launch instances in another one to compensate.
b) Better availability. Auto Scaling can help you ensure that your application always has the right amount of capacity to handle the current traffic demands.
c) Better cost management. Auto Scaling can dynamically increase and decrease capacity as needed. Because you pay for the EC2 instances you use, you save money by launching instances when they are actually needed and terminating them when they aren’t needed.

We will concentrate more on point (c), i.e., "dynamically increase and decrease capacity".

As per our example, to avoid the delay of those 3,000 requests, we need to enable Auto Scaling in our AWS architecture. Once auto-scaling is enabled, when requests overflow, AWS will spin up a new instance (Server-03) and those additional requests will be processed.

Once the load decreases, AWS will automatically remove that extra server, i.e., Server-03, and go back to the original setup.
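As a rough sketch, enabling this with the AWS CLI could look like the following; the group, launch configuration, and ELB names, the sizes, and the Availability Zones are all placeholders, and the policy shown is a simple one-step scale-out:

    # Keep 2 instances normally, and allow growth to 3 (Server-03) under load.
    aws autoscaling create-auto-scaling-group \
        --auto-scaling-group-name my-app-asg \
        --launch-configuration-name my-app-lc \
        --min-size 2 --max-size 3 \
        --load-balancer-names my-app-elb \
        --availability-zones us-east-1a us-east-1b

    # Add one instance when the scaling policy is triggered.
    aws autoscaling put-scaling-policy \
        --auto-scaling-group-name my-app-asg \
        --policy-name scale-out-by-one \
        --scaling-adjustment 1 --adjustment-type ChangeInCapacity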

The reason I mentioned auto-scaling in this blog is that it is a good-to-have configuration in your project architecture: if requests unexpectedly increase, instead of you spinning up a new instance manually, AWS will do this job for you once the configuration is enabled.

Before I close this topic, a couple of key points.

  • Understand the infrastructure setup properly.
  • Understand the configurations that are set, as ignoring this can increase your project cost drastically.
  • Make sure you know what type of load balancer is used: is it a Classic or an Application ELB?
  • Make sure you know what type of instances are used: T2, M3, or M4.

Some of the useful commands I came across:

  1. To increase the CPU utilization (spawns infinite busy loops in the background, one per core):

         for i in 1 2 3 4; do while : ; do : ; done & done

  2. To kill the jobs/decrease the CPU utilization (run in the same shell session that started the loops):

         kill -9 $(jobs -p)

  3. To check the status of the CPU:

         top

  4. To start the Docker container:

         sudo docker start <containerID>  OR  sudo docker start <containerName>

  5. To stop the Docker container:

         sudo docker stop <containerID>  OR  sudo docker stop <containerName>

  6. To check the Docker status:

         sudo docker ps -a

Reference:

https://aws.amazon.com/documentation/