Simple Elasticsearch monitor with AWS Lambda and Amazon QuickSight

Recently I needed a simple dashboard to monitor, visualize, and aggregate some Elasticsearch metrics, and I had heard about Amazon QuickSight.
Amazon QuickSight is a fast, cloud-powered business analytics service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data.
You can upload CSV or Excel files; ingest data from AWS data sources such as Amazon Redshift, Amazon RDS, Amazon Aurora, Amazon Athena, Amazon S3, and Amazon EMR (Presto and Apache Spark); and connect to databases like SQL Server, MySQL, and PostgreSQL.

The Elasticsearch metrics I want to monitor are the following:

  • Indices Health (by color)
  • Indices Status
  • Number of documents
  • Storage size
  • Indices by size and health

Because the dashboard had to be really simple, I did not want to manage any complex infrastructure (or any at all), so I opted for a serverless architecture.
If the word serverless sounds completely new to you, it is worth reading an introduction to it before going further.

The components of my architecture are the following:

  • AWS Lambda (Python 3.6)
  • Amazon S3
  • Amazon QuickSight
  • Elasticsearch 5.6.3 – Lucene 6.6.1 (my ES cluster is deployed on Elastic Cloud; I assume you have your own cluster deployed somewhere)

I assume you know about the components listed above; if not, please read about them before going further.

The goal of the AWS Lambda function is to fetch the Elasticsearch metrics from the cluster and store two CSV files (one for indices metrics and one for cluster metrics) in Amazon S3. The Lambda execution is scheduled daily. Once the CSVs have been uploaded to S3, the QuickSight dashboard fetches them and displays the metrics we need.

To deploy the Lambda function, I used the Serverless framework (version 1.23.0 – npm 5.4.2 – node v6.11.4), a toolkit for deploying and operating serverless architectures. I assume you know the Serverless framework (how to write a serverless configuration file and deploy/invoke a function) and have it installed and configured.

Let’s start by defining the Serverless YAML configuration file.
We define a new function, get_es_stats, scheduled to run every 24 hours, and a set of environment variables holding the ES cluster details and the S3 bucket.
Note that we need an iamRoleStatements section to allow the Lambda function to write to the S3 bucket.
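On the Python side, the function can read those environment variables once at start-up. The following is only a minimal sketch: the variable names ES_HOST, ES_USER, ES_PASSWORD and S3_BUCKET are hypothetical and must match whatever names you choose in your own configuration file.

```python
import os
from typing import Mapping, NamedTuple


class Settings(NamedTuple):
    """Configuration the Lambda function needs at runtime."""
    es_host: str
    es_user: str
    es_password: str
    s3_bucket: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the settings from environment variables.

    The variable names below are assumptions made for this sketch,
    not names taken from the original configuration.
    """
    try:
        return Settings(
            es_host=env["ES_HOST"].rstrip("/"),  # drop a trailing slash
            es_user=env["ES_USER"],
            es_password=env["ES_PASSWORD"],
            s3_bucket=env["S3_BUCKET"],
        )
    except KeyError as missing:
        # Fail early with a readable message instead of a bare KeyError.
        raise RuntimeError(f"Missing environment variable: {missing}") from None
```

Passing the mapping as a parameter keeps the function easy to try locally with a plain dictionary instead of real environment variables.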

I am using the serverless-python-requirements plugin to install the Python requirements (note the plugins and custom elements).

Once we have defined the serverless configuration, let’s create the Python function that fetches the Elasticsearch metrics and posts them to S3.
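The upload half of such a function could look roughly like the sketch below. It assumes boto3 (which is available in the Lambda Python runtime) and uses its put_object call; the helper names upload_reports and make_s3_client are made up for illustration and are not the original code.

```python
def make_s3_client():
    """Create an S3 client; imported lazily so the module loads without boto3."""
    import boto3
    return boto3.client("s3")


def upload_reports(s3_client, bucket, reports):
    """Upload each {object_key: csv_text} pair to the bucket.

    Returns the sorted list of keys that were written.
    """
    for key, csv_text in reports.items():
        s3_client.put_object(
            Bucket=bucket,
            Key=key,
            Body=csv_text.encode("utf-8"),
            ContentType="text/csv",
        )
    return sorted(reports)
```

Inside the Lambda handler you would call something like upload_reports(make_s3_client(), bucket, {"indices_stats.csv": ..., "cluster_stats.csv": ...}), where the bucket name comes from your own configuration.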

I ran some performance tests and decided not to use the Python Elasticsearch library but to call the REST API of the ES cluster directly.
To fetch the indices stats, I used the Elasticsearch cat indices endpoint (_cat/indices).
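As an illustration, here is one way to call the cat indices endpoint and turn its answer into CSV text. This is a sketch under assumptions, not the original function: it uses only the standard library (urllib) rather than any particular HTTP client, assumes the cluster accepts HTTP basic authentication, and the selected columns are my own choice to match the metrics listed earlier.

```python
import base64
import csv
import io
import json
import urllib.request

# Standard _cat/indices column names; this selection is an assumption.
COLUMNS = ["health", "status", "index", "docs.count", "store.size"]


def fetch_json(url, user, password):
    """GET a URL with HTTP basic auth and decode the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def indices_to_csv(rows):
    """Turn the list of dicts returned by _cat/indices into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def fetch_indices_csv(es_host, user, password):
    """Fetch one row per index and return it as CSV text."""
    columns = ",".join(COLUMNS)
    # format=json returns a JSON array; bytes=b reports sizes as plain byte counts.
    url = f"{es_host}/_cat/indices?format=json&bytes=b&h={columns}"
    return indices_to_csv(fetch_json(url, user, password))
```

Asking for sizes in bytes keeps the storage column numeric, which makes it easier to aggregate in QuickSight than values such as "1.2gb".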

To fetch the cluster health, I used the Elasticsearch Cluster Health endpoint (_cluster/health).
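The cluster health response is a single JSON object, so it maps naturally to a one-row CSV. The sketch below shows the idea; the selection of fields is an assumption (they are standard keys of the cluster health response), and the headers argument is a hypothetical hook for passing whatever authentication your cluster requires.

```python
import csv
import io
import json
import urllib.request

# Keys of the _cluster/health response kept in the CSV (an assumed selection).
FIELDS = [
    "cluster_name",
    "status",
    "number_of_nodes",
    "number_of_data_nodes",
    "active_primary_shards",
    "active_shards",
    "unassigned_shards",
]


def health_to_csv(health):
    """Turn the _cluster/health JSON object into a header plus one CSV row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerow(health)
    return buffer.getvalue()


def fetch_cluster_health_csv(es_host, headers=None):
    """Call _cluster/health and return the selected fields as CSV text."""
    request = urllib.request.Request(f"{es_host}/_cluster/health", headers=headers or {})
    with urllib.request.urlopen(request, timeout=30) as response:
        return health_to_csv(json.loads(response.read().decode("utf-8")))
```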

We are now ready to deploy the function. Create a requirements.txt file with the following lines:

and then run

and this to manually invoke your function

Once you have run your function, you will find two CSV files in your S3 bucket: indices_stats.csv and cluster_stats.csv.

Now you can create a QuickSight dashboard.

From the QuickSight page, create a new Data Set from S3 (when you create a new QuickSight account, be sure to grant it permission to read from the S3 bucket).

Upload two Amazon S3 manifest files, one for the indices_stats.csv file and one for the cluster_stats.csv file. You use JSON manifest files to specify files in Amazon S3 to import into Amazon QuickSight.
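For reference, a manifest for a single CSV could be generated as in the sketch below. It follows the QuickSight manifest layout (fileLocations and globalUploadSettings) as I understand it; the bucket name is a placeholder, and the helper build_manifest is hypothetical, so check the result against the QuickSight documentation before relying on it.

```python
import json


def build_manifest(bucket, key):
    """Build a QuickSight S3 manifest describing one CSV file with a header row."""
    return {
        "fileLocations": [{"URIs": [f"s3://{bucket}/{key}"]}],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": ",",
            "containsHeader": "true",
        },
    }


def manifest_json(bucket, key):
    """Serialize the manifest so it can be saved to a .json file and uploaded."""
    return json.dumps(build_manifest(bucket, key), indent=2)
```

Calling manifest_json("my-bucket", "indices_stats.csv") and manifest_json("my-bucket", "cluster_stats.csv") gives the two documents to save and upload, with "my-bucket" replaced by your own bucket name.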

Once you have created the two datasets, you will find them among the available datasets.

You can now create a new QuickSight analysis and show the collected metrics; a few example visualizations follow. You can schedule a refresh for the two datasets so that, when the Lambda function updates the two CSVs on S3, QuickSight refreshes the sources and the dashboard is updated.

Block charts showing the indices by status and health.

Pie charts showing the indices by size and the total storage used.

Indices by number of documents and health and number of nodes in the cluster and active shards.

 

The goal of this post is to present a simple serverless architecture that shows a few Elasticsearch metrics in a simple dashboard. You can extend and improve this architecture by monitoring more metrics and creating a better QuickSight dashboard.

You can use this type of architecture when you do not have Kibana (and an X-Pack subscription) or when you want a simple analytics system inside the AWS world.
