1] What is fail-over testing? 

Fail-over is a backup process in which a secondary system component takes over the functions of the primary system component when the primary becomes unavailable, either because of a critical failure or because the system reaches a performance threshold. Fail-over testing checks how the system behaves when this happens.

2] Details of the Open Distro cluster

  • Number of masters :- 3 (m01, m02, m03)
  • Number of data nodes :- 3 (s01, s02, s03)

3] Functionalities/Operations performed

  • Read – GET request – A request that fetches data from Elasticsearch.
  • Write – POST request – A request that updates data in Elasticsearch.
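
The two operations can be sketched with Python's standard library. This is only an illustration: the node address `https://m01:9200`, the index name `failover-test`, and the choice of the `_search` and `_doc` endpoints are assumptions, not details recorded in the test setup.

```python
import json
import urllib.request

BASE_URL = "https://m01:9200"  # assumed node address, not taken from the test setup
INDEX = "failover-test"        # hypothetical index name


def build_read_request(base_url: str = BASE_URL, index: str = INDEX) -> urllib.request.Request:
    """Build the GET (read) request that fetches documents from the index."""
    return urllib.request.Request(f"{base_url}/{index}/_search", method="GET")


def build_write_request(document: dict, base_url: str = BASE_URL,
                        index: str = INDEX) -> urllib.request.Request:
    """Build the POST (write) request that indexes one JSON document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_doc",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending either request with `urllib.request.urlopen(request)` would additionally need the credentials and TLS settings of the cluster under test, which are left out here.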

4] Test scenarios and their combinations

Legend:

  • ✓ – Node up and running.
  • X – Node stopped.
  • Passed – Request passed successfully.
  • Failed – Request failed.
  • M – Master node.
  • S – Data node.
For each case, the table gives the number of nodes marked X (stopped) among M-01, M-02, M-03, S-01, S-02 and S-03, and the result of each request.

Case #   Nodes marked X   GET REQUEST   POST REQUEST
1.       0                Passed        Passed
2.       1                Passed        Passed
3.       1                Passed        Passed
4.       1                Passed        Passed
5.       2                Passed        Failed
6.       2                Passed        Failed
7.       2                Passed        Failed
8.       1                Passed        Passed
9.       1                Passed        Passed
10.      1                Passed        Passed
11.      2                Passed        Passed
12.      2                Passed        Passed
13.      2                Passed        Passed
14.      1                Passed        Passed
15.      1                Passed        Passed
16.      1                Passed        Passed
17.      2                Passed        Passed
18.      2                Passed        Passed
19.      2                Passed        Passed
20.      3                Passed        Passed
21.      3                Passed        Passed
22.      3                Passed        Passed
23.      2                Passed        Failed
24.      2                Passed        Failed
26.      2                Passed        Failed
27.      3                Passed        Failed
28.      3                Passed        Failed
29.      3                Passed        Failed
30.      4                Failed        Failed
31.      4                Failed        Failed
32.      4                Failed        Failed

 

5] Specific case where the cluster fails to sync data

Steps:

  1. Make sure all three masters are up and running, i.e., M-01, M-02 and M-03.
  2. Stop two of the data nodes, i.e., S-02 and S-03, and leave S-01 up and running.

     The cluster should now consist of M-01, M-02, M-03 and S-01 only.

  3. Perform a POST request that updates Elasticsearch.

     S-01, the only available data node, now holds the updated data.

  4. Stop the S-01 data node.

     The cluster should now consist of M-01, M-02 and M-03 only. No data nodes are up and running.

  5. Bring up S-02.

     Result: Subsequent GET/POST requests fail. The logs say “Failed to execute phase [query], all shards failed”.

  6. Now bring up S-03.

     Result: Subsequent GET/POST requests fail. The logs say “Failed to execute phase [query], all shards failed”.

  7. Now bring up S-01, the only node that holds the updated data.

     Result: Subsequent GET/POST requests pass. Data is synced across all the other data nodes.

Conclusion of this scenario: When two of the data nodes are down, the one remaining data node receives an update, and that node then goes down as well, the data nodes cannot sync until that specific node is up again. Data sync fails in this scenario.
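
One way to observe the cluster state between the steps of this scenario is the cluster health API (`GET _cluster/health`), whose response includes the fields `status`, `number_of_data_nodes` and `unassigned_shards`. The sketch below only parses such a response. The sample body in the usage lines is made up for illustration and was not captured from the tested cluster. A `red` status means at least one primary shard is unassigned, which would be consistent with the "all shards failed" error, but that connection is an assumption and was not recorded during the test.

```python
import json


def summarize_health(response_body: str) -> dict:
    """Pick the fields relevant to this scenario out of a _cluster/health response."""
    health = json.loads(response_body)
    status = health.get("status")
    return {
        "status": status,
        "data_nodes": health.get("number_of_data_nodes"),
        "unassigned_shards": health.get("unassigned_shards"),
        # "red" means at least one primary shard is not allocated.
        "all_primaries_assigned": status in ("green", "yellow"),
    }


# Illustrative response body (hypothetical values, not real test output).
sample = '{"status": "red", "number_of_data_nodes": 1, "unassigned_shards": 10}'
summary = summarize_health(sample)
print(summary["status"], summary["all_primaries_assigned"])
```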

6] Specific case where data sync is successful

Steps:

  1. Make sure all the masters are up and running, i.e., M-01, M-02 and M-03.
  2. Make sure only one data node is up and running, i.e., S-01.

     The cluster should now consist of M-01, M-02, M-03 and S-01.

  3. Perform a POST request.
  4. Bring up S-02.
  5. Bring down S-01.

     The cluster should now consist of M-01, M-02, M-03 and S-02.

Result: A GET request shows that the data updated in step 3 was successfully synced from S-01 to S-02.
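
The difference between scenarios 5 and 6 can be illustrated with a toy model. It is a deliberate simplification and not Elasticsearch's actual allocation logic: it ignores masters and shards and assumes only that a node coming back up can catch up if, and only if, a node holding the latest data is running at that moment. The class and method names are hypothetical.

```python
class ToyCluster:
    """Toy model of data nodes holding one replicated data set."""

    def __init__(self, nodes):
        self.up = set(nodes)        # nodes currently running
        self.in_sync = set(nodes)   # nodes holding the latest data

    def stop(self, node):
        self.up.discard(node)

    def start(self, node):
        self.up.add(node)
        # A returning node catches up only from a running in-sync node.
        if self.up & self.in_sync:
            self.in_sync |= self.up

    def write(self):
        """Return True if the update is accepted by a running in-sync node."""
        if not (self.up & self.in_sync):
            return False
        self.in_sync = set(self.up)  # stopped nodes now hold stale data
        return True

    def read(self):
        """Return True if a running node holds the latest data."""
        return bool(self.up & self.in_sync)


# Scenario 5: S-01 takes the write alone, then stops before the others return.
c5 = ToyCluster(["S-01", "S-02", "S-03"])
c5.stop("S-02"); c5.stop("S-03")
c5.write()
c5.stop("S-01")
c5.start("S-02")
read_after_s02 = c5.read()   # stale node only
c5.start("S-03")
read_after_s03 = c5.read()   # still only stale nodes
c5.start("S-01")
read_after_s01 = c5.read()   # node with the latest data is back

# Scenario 6: S-02 returns while S-01 is still running, then S-01 stops.
c6 = ToyCluster(["S-01", "S-02", "S-03"])
c6.stop("S-02"); c6.stop("S-03")
c6.write()
c6.start("S-02")
c6.stop("S-01")
read_scenario_6 = c6.read()
```

In this model the reads in scenario 5 fail until S-01 returns, while the read in scenario 6 succeeds because S-02 caught up before S-01 was stopped.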

Overall Conclusion of fail-over testing:

At any given point in time, at least two masters and two data nodes must be up and running for data to sync successfully between nodes. Data sync within the cluster fails completely if fewer than two masters or fewer than two data nodes are up and running.
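
The rule stated in this conclusion can be written as a small check. This is a sketch of the conclusion as worded above, not a check performed by Open Distro itself; the function and constant names are hypothetical.

```python
MIN_MASTERS_UP = 2      # minimum from the conclusion above
MIN_DATA_NODES_UP = 2   # minimum from the conclusion above


def meets_minimum(masters_up: int, data_nodes_up: int) -> bool:
    """Return True if enough masters and data nodes are running for sync."""
    return masters_up >= MIN_MASTERS_UP and data_nodes_up >= MIN_DATA_NODES_UP


print(meets_minimum(3, 3))  # all six nodes up
print(meets_minimum(3, 1))  # only one data node up
```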

Useful Links:

Open Distro for Elasticsearch documentation