For most of us starting with Apache Spark on AWS, one of the first services we stumble upon is EMR.

With power comes responsibility!

With AWS EMR, a brand-new cluster of any size is only a few clicks away. With that in mind, let's focus on how to monitor the cluster and cut costs as much as possible.

Typically, after a few days, people forget to shut down their clusters, leaving the budget bleeding heavily on the AWS bill.

We are going to explore how to build a bot that alerts people!

While searching for ways to monitor and control EMR, we found the following links:

Both authors give excellent ideas and steps for monitoring the resources. However, we were looking for a simpler solution with fewer moving parts.

We wanted to go with AWS Boto3 + Airflow, with Slack and Gmail alerts.

Of course, there is no escaping CloudWatch. We use CloudWatch to monitor the EMR metrics and the Boto3 EMR APIs to take any action. Check this link for the list of available metrics: https://docs.aws.amazon.com/emr/latest/ManagementGuide/UsingEMR_ViewingMetrics.html
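As a rough sketch of what that metric lookup can look like with Boto3 (the CloudWatch client is passed in so it can be stubbed in tests; the helper name and defaults here are my own, not from the original snippets):

```python
from datetime import datetime, timedelta

EMR_NAMESPACE = "AWS/ElasticMapReduce"


def metric_average(cloudwatch, cluster_id, metric_name, minutes=30):
    """Average of an EMR CloudWatch metric over the last `minutes`.

    `cloudwatch` is a boto3 CloudWatch client (e.g. boto3.client("cloudwatch")).
    EMR publishes its metrics under the JobFlowId dimension.
    """
    now = datetime.utcnow()
    resp = cloudwatch.get_metric_statistics(
        Namespace=EMR_NAMESPACE,
        MetricName=metric_name,
        Dimensions=[{"Name": "JobFlowId", "Value": cluster_id}],
        StartTime=now - timedelta(minutes=minutes),
        EndTime=now,
        Period=300,                # 5-minute buckets, matching EMR's reporting interval
        Statistics=["Average"],
    )
    points = resp.get("Datapoints", [])
    if not points:
        return None                # no data yet, e.g. a cluster that just started
    return sum(p["Average"] for p in points) / len(points)
```

Passing the client in (rather than creating it inside) also makes it easy to point the same code at different regions or credentials.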

1. Alert: Slack

We used the Airflow Slack operator for Slack alerts; see the Airflow documentation for more details.
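A minimal sketch of how the alert might be wired up, assuming Airflow 1.x's `SlackAPIPostOperator` (the import path moved to `airflow.providers.slack.operators.slack` in Airflow 2.x); the task id, channel, and message wording are all illustrative:

```python
def idle_alert_text(cluster_id, cluster_name, idle_minutes):
    """Build the Slack message body for an idle-cluster alert."""
    return (
        f":warning: EMR cluster `{cluster_name}` ({cluster_id}) "
        f"looks idle for ~{idle_minutes} min. Consider shutting it down."
    )


def make_slack_alert_task(dag, text):
    """Wrap the message in Airflow's Slack operator.

    Imported lazily so idle_alert_text stays usable without Airflow installed.
    """
    from airflow.operators.slack_operator import SlackAPIPostOperator

    return SlackAPIPostOperator(
        task_id="emr_idle_slack_alert",
        token="your-slack-api-token",  # placeholder; keep the real token in a secret store
        channel="#emr-alerts",         # illustrative channel name
        text=text,
        dag=dag,
    )
```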

2. Alert: Gmail

We used Yagmail for a Pythonic way of triggering the mails. (Note: the From address used in Yagmail needs special permission for third-party apps to log in and send emails on its behalf; follow the errors in your setup for more help, as I have lost my reference for that setup at the moment.)

Each variable is self-explanatory.

Note: “YagMail.email_notifier_ini” is a wrapper class that abstracts reading the mail credentials using configparser.

3. AWS Boto3: EMR + CloudWatch

Check this repository, https://github.com/Scout24/emr-autoscaling, for complete EMR scaling with Lambda and the Boto3 APIs; I used it as a reference.

Below is a class that abstracts most of the EMR and CloudWatch interaction behind easy-to-use APIs for our needs.

Here we list all the active clusters and filter out the idle ones based on YARN memory and the number of core and task nodes. (This is just an example; you can have more sophisticated business requirements for monitoring and applying an EMR policy, but as a naive approach this is a good start.)

Note: This code was tested directly on EMR, which has all the permissions needed to use Boto3. If you want to test on a non-EMR/EC2 machine, you need to set up the EMR/CloudWatch clients with proper AWS keys.
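The shape of that check can be sketched as below. This is not the original class: the function names, the 95% YARN-free threshold, and the lookback window are my illustrative assumptions, and the clients are injected so the policy can be unit-tested without AWS credentials:

```python
from datetime import datetime, timedelta

# Cluster states Boto3 considers "active" for list_clusters filtering.
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"]


def find_idle_clusters(emr, cloudwatch, min_yarn_free_pct=95.0, lookback_minutes=60):
    """Return (id, name) pairs of active clusters that look idle.

    Naive policy: a cluster is 'idle' when YARNMemoryAvailablePercentage
    never dropped below min_yarn_free_pct over the lookback window.
    """
    idle = []
    now = datetime.utcnow()
    for cluster in emr.list_clusters(ClusterStates=ACTIVE_STATES)["Clusters"]:
        resp = cloudwatch.get_metric_statistics(
            Namespace="AWS/ElasticMapReduce",
            MetricName="YARNMemoryAvailablePercentage",
            Dimensions=[{"Name": "JobFlowId", "Value": cluster["Id"]}],
            StartTime=now - timedelta(minutes=lookback_minutes),
            EndTime=now,
            Period=300,
            Statistics=["Average"],
        )
        points = resp.get("Datapoints", [])
        if points and min(p["Average"] for p in points) >= min_yarn_free_pct:
            idle.append((cluster["Id"], cluster["Name"]))
    return idle


def terminate(emr, cluster_ids):
    """Shut clusters down; EMR's terminate call still uses the old job-flow naming."""
    emr.terminate_job_flows(JobFlowIds=list(cluster_ids))
```

In a real deployment you would likely also check the number of core/task nodes and the `IsIdle` metric before terminating anything.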

4. Airflow: Custom Operator

Now we have the means to trigger alerts and to monitor and control EMR. How do we schedule our bot into action? With a custom Airflow operator. Why not the PythonOperator? Well, it's a matter of choice and of the control we wanted over our operator.
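One way to structure such an operator is shown below: keep the bot logic in a plain function and have the operator delegate to it, so the logic is testable without an Airflow installation. The operator and function names are illustrative, and the `BaseOperator` import path shown is the Airflow 1.x one (`airflow.models.baseoperator` in 2.x):

```python
def run_emr_check(find_idle, notify):
    """Core of the bot: find idle clusters and alert if any. Returns the count."""
    idle = find_idle()
    if idle:
        notify(idle)
    return len(idle)


try:
    from airflow.models import BaseOperator

    class EmrMonitorOperator(BaseOperator):
        """Custom operator wrapping the check so it fits into a DAG."""

        def __init__(self, find_idle, notify, **kwargs):
            super().__init__(**kwargs)
            self.find_idle = find_idle
            self.notify = notify

        def execute(self, context):
            return run_emr_check(self.find_idle, self.notify)

except ImportError:
    pass  # Airflow not installed; run_emr_check is still usable standalone
```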

5. Airflow DAG

And finally, let's build our Airflow DAG…
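A minimal sketch of the DAG file, assuming an hourly schedule (the DAG id, task id, start date, and defaults are all illustrative; in the real bot the task would be the custom operator rather than this placeholder callable, and the import paths shown are the Airflow 1.x ones):

```python
from datetime import datetime, timedelta


def default_args(owner="emr-bot"):
    """Illustrative DAG-level defaults; tune for your environment."""
    return {
        "owner": owner,
        "depends_on_past": False,
        "start_date": datetime(2020, 1, 1),
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
    }


def monitor_emr():
    """Placeholder callable: run the idle check and fire Slack/Gmail alerts here."""
    print("checking EMR clusters...")


def build_dag():
    # Lazy imports so this module can be linted/tested without Airflow installed.
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    dag = DAG(
        dag_id="emr_idle_monitor",
        default_args=default_args(),
        schedule_interval="0 * * * *",  # hourly; adjust to taste
        catchup=False,
    )
    PythonOperator(task_id="monitor_emr", python_callable=monitor_emr, dag=dag)
    return dag
```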

Ta-da! Our bot is ready…

Our Bot: Configparser + Yagmail + Airflow Slack Operator + Boto3 (EMR & CloudWatch) + Airflow Custom Operator

All the code snippets shared here are bits and pieces, just to give readers an idea of how to build an in-house Python AWS EMR monitoring bot! The code shared needs a good amount of cleanup and thorough testing!