Simple Elasticsearch monitor with AWS Lambda and Amazon QuickSight

Recently I needed a simple dashboard to monitor, visualize, and aggregate a few Elasticsearch metrics, and I had heard about Amazon QuickSight.
Amazon QuickSight is a fast, cloud-powered business analytics service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data.
You can upload CSV or Excel files; ingest data from AWS data sources such as Amazon Redshift, Amazon RDS, Amazon Aurora, Amazon Athena, Amazon S3, and Amazon EMR (Presto and Apache Spark); and connect to databases like SQL Server, MySQL, and PostgreSQL.

The Elasticsearch metrics I want to monitor are the following:

  • Indices Health (by color)
  • Indices Status
  • Number of documents
  • Storage size
  • Indices by size and health

Because the dashboard had to be really simple, I did not want to manage any infrastructure at all, so I went for a serverless architecture.
If the word serverless sounds completely new to you or you want to read more, here you can find some useful information:

The components of my architecture are the following:

  • AWS Lambda (Python 3.6)
  • Amazon S3
  • Amazon QuickSight
  • Elasticsearch 5.6.3 – Lucene 6.6.1 (my ES Cluster is deployed on Elastic Cloud, I assume you have your own cluster deployed somewhere)

I assume you are familiar with the components listed above; if not, please read about them before going further.

The goal of the AWS Lambda function is to fetch the Elasticsearch metrics from the cluster and store two CSV files (one for indices metrics and one for cluster metrics) in Amazon S3 (the Lambda execution is scheduled daily). Once the CSVs have been uploaded to S3, the QuickSight dashboard fetches them and displays the metrics we need.

To deploy the Lambda function, I used the Serverless framework (version 1.23.0 – npm 5.4.2 – node v6.11.4). Serverless is a toolkit for deploying and operating serverless architectures. I assume you know the Serverless framework (how to write a serverless configuration file and deploy/invoke a function) and have installed and configured it.

Let’s start by defining the Serverless YAML configuration file.
We define a new function, get_es_stats, scheduled to run every 24 hours, and create a set of environment variables (for the ES cluster details and the S3 bucket).
Note that we need to define an iamRoleStatements element to allow the Lambda function to write to the S3 bucket.

I am using the serverless-python-requirements plugin to install the Python requirements (note the plugins and custom elements).

Once the serverless configuration is defined, let’s create the Python function that fetches the Elasticsearch metrics and posts them to S3.
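As a rough illustration of the S3 half of such a function, here is a minimal sketch. It assumes boto3 (available in the AWS Lambda Python runtime) and reads settings from environment variables; the variable names ES_URL, ES_USER, ES_PASSWORD and S3_BUCKET are hypothetical placeholders, not the ones from my configuration.

```python
import os


def load_settings(environ=os.environ):
    """Read the cluster and bucket details from environment variables.

    The variable names are hypothetical; use the ones from your own config.
    """
    return {
        "es_url": environ["ES_URL"],
        "es_user": environ.get("ES_USER", ""),
        "es_password": environ.get("ES_PASSWORD", ""),
        "bucket": environ["S3_BUCKET"],
    }


def upload_csv(bucket, key, csv_text, s3_client=None):
    """Write one CSV document to s3://bucket/key and return the key."""
    if s3_client is None:
        # Imported lazily so the helper can be tried locally with a stand-in
        # client that only implements put_object.
        import boto3
        s3_client = boto3.client("s3")
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=csv_text.encode("utf-8"),
        ContentType="text/csv",
    )
    return key
```

The Lambda handler would then call upload_csv once per file, for example with the keys indices_stats.csv and cluster_stats.csv.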

I ran some performance tests and decided not to use the Python Elasticsearch library but to call the REST API of the ES cluster directly.
To fetch the indices stats, I used the following endpoint: Elasticsearch cat indices
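A minimal sketch of this step, using only the standard library rather than any particular HTTP package. It assumes the cat indices API is called with format=json (and bytes=b so sizes come back as plain byte counts); the selected columns and the helper names are my own choice for illustration.

```python
import base64
import csv
import io
import json
import urllib.request

# Columns kept in the CSV; these are field names returned by _cat/indices.
INDEX_COLUMNS = ["index", "health", "status", "docs.count", "store.size"]


def fetch_json(url, user=None, password=None, timeout=10):
    """GET a URL and decode the JSON body, with optional HTTP basic auth."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    if user:
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def indices_to_csv(rows, columns=INDEX_COLUMNS):
    """Turn the list returned by _cat/indices?format=json into CSV text."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops the fields we did not ask for (uuid, pri, ...).
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()
```

Usage would look like indices_to_csv(fetch_json(es_url + "/_cat/indices?format=json&bytes=b", user, password)), where es_url, user and password come from your own settings.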

To fetch the cluster health, I used the following endpoint: Elasticsearch Cluster Health
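The cluster health response is a single JSON object, so it can be flattened into a one-row CSV. This is a sketch under the assumption that only a handful of its fields are needed; the column selection is illustrative.

```python
import csv
import io

# A subset of the fields returned by GET /_cluster/health.
HEALTH_COLUMNS = [
    "cluster_name",
    "status",
    "number_of_nodes",
    "number_of_data_nodes",
    "active_primary_shards",
    "active_shards",
    "unassigned_shards",
]


def cluster_health_to_csv(health, columns=HEALTH_COLUMNS):
    """Flatten the _cluster/health JSON object into a header plus one row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(columns)
    # Missing fields become empty cells instead of raising KeyError.
    writer.writerow([health.get(name, "") for name in columns])
    return buffer.getvalue()
```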

We are now ready to deploy the function. Create a requirements.txt file with the following lines:

and then run

and this to manually invoke your function

Once you have run your function you will find two CSV files in your S3 bucket, indices_stats.csv


and cluster_stats.csv

Now you can create a QuickSight dashboard.

From the QuickSight page, create a new Data Set from S3 (when you create a new QuickSight account, make sure you have set the right permissions to read from the S3 bucket).

Upload two Amazon S3 manifest files, one for the indices_stats.csv file and one for the cluster_stats.csv file. You use JSON manifest files to specify files in Amazon S3 to import into Amazon QuickSight.
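For reference, a manifest can be generated with a few lines of Python. This is a sketch assuming a comma-delimited CSV with a header row; the bucket name you pass in is your own, and build_manifest is a hypothetical helper name.

```python
import json


def build_manifest(bucket, key):
    """Return the text of a QuickSight S3 manifest pointing at one CSV file."""
    manifest = {
        "fileLocations": [{"URIs": [f"s3://{bucket}/{key}"]}],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": ",",
            "containsHeader": "true",
        },
    }
    return json.dumps(manifest, indent=2)
```

Calling it once with indices_stats.csv and once with cluster_stats.csv gives the two manifest files to upload.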

Once you have created the two datasets, you will find them among the available datasets.

You can now create a new QuickSight analysis and show the collected metrics; here are a few examples of visualizations. You can schedule a refresh for the two datasets so that when the Lambda function updates the two CSVs on S3, QuickSight refreshes the sources and the dashboard is updated.

Block charts showing the indices by status and health.

Pie charts that show the indices by their size and the total storage used.

Indices by number of documents and health and number of nodes in the cluster and active shards.

 

The goal of this post is to present a simple serverless architecture to show a few Elasticsearch metrics in a simple dashboard. You can extend and improve this architecture by monitoring more metrics and building a richer QuickSight dashboard.

You can use this type of architecture when you do not have Kibana (and an X-Pack subscription) or you want a simple analytics system inside the AWS world.

Elasticsearch Machine Learning: U.S. / U.K. Foreign Exchange Rate

At the beginning of May 2017 Elastic announced the first release of machine learning features for the Elastic Stack, available via X-Pack.

The machine learning features of X-Pack (Platinum/Enterprise subscription) are focused on providing Time Series Anomaly Detection capabilities using unsupervised machine learning.

In this post we are going to see an example of time series anomaly detection using the machine learning features of Elasticsearch.

To use these features you need at least version 5.4.0 of Elasticsearch, Kibana and X-Pack.
In this post I am not going to show how to install the stack components. I used the following:

  • Elasticsearch 5.4.1
  • Kibana 5.4.1
  • X-Pack 5.4.1 (installed both in ES and Kibana)

Here you can find the installation steps:

The machine learning feature is enabled by default on each node. Here you can find more details about further configuration: Machine Learning Settings

We are going to use the following dataset: U.S. / U.K. Foreign Exchange Rate.
It represents the daily foreign exchange rate between the U.S. dollar and the U.K. pound from April 1971 to the beginning of June 2017.

This is a sample of the data:

We will index the documents (around 16k) in a time-based index called foreignexchangerate-YYYY (where YYYY represents the year of the document).
A time-based index is necessary to use the machine learning feature: the configured time field of the index is used by the feature for time aggregation.
As far as I know, there is no way to use a non-time-based index and select a date field while creating a machine learning job.
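To make the index naming concrete, here is a small sketch that derives the yearly index name from a document date and produces Bulk API lines. The field name date, the document type rate and the helper names are hypothetical; only exchange_rate and the foreignexchangerate-YYYY pattern come from this post.

```python
import json
from datetime import datetime


def index_name(day, prefix="foreignexchangerate"):
    """Name of the yearly index for a document dated `day` (YYYY-MM-DD)."""
    year = datetime.strptime(day, "%Y-%m-%d").year
    return f"{prefix}-{year:04d}"


def bulk_lines(observations, doc_type="rate"):
    """Yield Bulk API lines (action, then source) for (date, rate) pairs."""
    for day, rate in observations:
        # Elasticsearch 5.x still expects a _type in the action metadata.
        yield json.dumps({"index": {"_index": index_name(day), "_type": doc_type}})
        yield json.dumps({"date": day, "exchange_rate": rate})
```

Joining the yielded lines with newlines (plus a trailing newline) gives a body that can be sent to the _bulk endpoint.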

This is what each time-based index looks like:

Once we indexed our documents, and once we added the index pattern to Kibana, we can create our first machine learning job.


To create a new Job, select the Machine Learning section from the left menu of Kibana (if you do not see it, maybe you have the wrong Kibana version or you did not install X-Pack into Kibana).

You can now choose between a Single Metric and a Multi Metric job; we will choose a Single Metric job (for the foreignexchangerate-* index pattern).

We will use the whole time series and a 3-day average of exchange_rate. The idea is to aggregate the series into 3-day buckets, compute the average exchange rate in each bucket, and spot anomalies.
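To illustrate only the aggregation step (not the anomaly model, which Elasticsearch builds internally), here is a sketch that averages a daily series in fixed 3-day buckets. Aligning buckets to the first date in the input is a simplification of mine and may not match how the ML job aligns its buckets.

```python
from collections import OrderedDict
from datetime import datetime, timedelta


def bucket_means(observations, span_days=3):
    """Average (YYYY-MM-DD, value) pairs in fixed buckets of `span_days` days.

    Returns an ordered mapping of bucket start date -> mean value; buckets
    with no observations are skipped.
    """
    parsed = sorted(
        (datetime.strptime(day, "%Y-%m-%d").date(), value)
        for day, value in observations
    )
    if not parsed:
        return OrderedDict()
    origin = parsed[0][0]
    sums = OrderedDict()
    for day, value in parsed:
        offset = (day - origin).days // span_days
        start = (origin + timedelta(days=offset * span_days)).isoformat()
        total, count = sums.get(start, (0.0, 0))
        sums[start] = (total + value, count + 1)
    return OrderedDict(
        (start, total / count) for start, (total, count) in sums.items()
    )
```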


Once the job is configured, we can create it. The machine learning model will be built using our time series and the aggregation/metric we specified.


We can now inspect the anomalies detected using the Anomaly Explorer or the Single Metric View, both from the ML Jobs dashboard.


I checked some of the anomalies automatically identified and almost all of them make sense (I found drops in the exchange rate due to events like Brexit or the EU crisis).

So far we have done all the analysis inside Kibana, but the machine learning feature also comes with a set of APIs, so you can integrate the time-series anomaly detection with your application.
Here you can find the details about the APIs: ML APIs.
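As an example of such an integration, the sketch below reads the anomaly records of a job over HTTP. It assumes the X-Pack 5.x records endpoint under _xpack/ml/anomaly_detectors and a cluster reachable without authentication; with X-Pack security enabled you would need to add credentials. The helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote


def records_url(es_url, job_id):
    """URL of the anomaly records endpoint for one machine learning job."""
    base = es_url.rstrip("/")
    job = quote(job_id, safe="")
    return f"{base}/_xpack/ml/anomaly_detectors/{job}/results/records"


def fetch_records(es_url, job_id, timeout=10):
    """Return the list of anomaly records reported for the job."""
    with urllib.request.urlopen(records_url(es_url, job_id), timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload.get("records", [])
```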

In this post we saw a simple example of how to create and run a machine learning job inside Elasticsearch. There are a lot of other aspects like the multi-metric and advanced-metric that I think are important.

The machine learning features are pretty new and I think (and hope!) that Elastic will invest a lot of resources to improve and extend them.

I am going to run some other tests on the ML features, and I would like to run some anomaly detection algorithms (statistical and ML-based) on the same dataset to benchmark and compare the Elasticsearch results. If you want to collaborate and help me (or if you have some knowledge/background in time series anomaly detection), drop me a line 🙂.

Kibana Tag Cloud

Kibana 5.1.1 added a new type of visualization: the Tag Cloud chart.
A tag cloud visualization is a visual representation of text data, typically used to visualize free form text. Tags are usually single words, and the importance of each tag is shown with font size or color.
In this post we are going to see how to use this new type of visualization. I assume you already have installed and configured Kibana and Elasticsearch.

First of all, create a new index and index some documents. I indexed a JSON file containing the entire works of Shakespeare.

Each document has the following format.

You can download it here (notice it is around 24 MB): shakespeare.json.

Create a new index.

And index the documents using the Bulk Index API.
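If you prefer to send the file from Python, a sketch like the following splits it into smaller bulk requests. It assumes the file is already in Bulk API format (alternating action and source lines); the chunk size and helper names are arbitrary choices of mine.

```python
import json
import urllib.request


def bulk_chunks(lines, pairs_per_chunk=1000):
    """Group bulk lines into request bodies without splitting a pair.

    Assumes the input alternates action line / source line. Each body ends
    with the trailing newline that the Bulk API requires.
    """
    batch = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        batch.append(line)
        if len(batch) == 2 * pairs_per_chunk:
            yield "\n".join(batch) + "\n"
            batch = []
    if batch:
        yield "\n".join(batch) + "\n"


def post_bulk(es_url, body, timeout=60):
    """POST one bulk body to the cluster and return the decoded response."""
    request = urllib.request.Request(
        es_url.rstrip("/") + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the file open as f, each body from bulk_chunks(f) can be passed to post_bulk together with your cluster URL.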

Now, from the Kibana dashboard, select the tag cloud visualization chart.

You only need to specify the field used to build the tag cloud. Notice that the tag cloud only supports the terms aggregation.

In this example I selected the speaker field, so the tag cloud will depict the main speakers (those with the highest document count) within the Shakespeare works.
You can set a number of other options, like the font size range and the orientation of the tags.
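Behind the scenes this is a plain terms aggregation whose bucket counts drive the font size. The sketch below shows the equivalent query body and a linear count-to-size mapping; the pixel range and the linear scale are illustrative assumptions, and the field must be mapped so that it can be aggregated (for example as keyword).

```python
def speaker_terms_query(field="speaker", size=20):
    """Search body that returns the top `size` terms of `field`, no hits."""
    return {
        "size": 0,
        "aggs": {"tags": {"terms": {"field": field, "size": size}}},
    }


def font_sizes(buckets, min_px=18, max_px=72):
    """Map each bucket's doc_count linearly onto a font size in pixels."""
    if not buckets:
        return {}
    counts = [bucket["doc_count"] for bucket in buckets]
    low, high = min(counts), max(counts)
    spread = (high - low) or 1  # avoid dividing by zero when all counts match
    return {
        bucket["key"]: round(min_px + (bucket["doc_count"] - low) * (max_px - min_px) / spread)
        for bucket in buckets
    }
```

The buckets argument is the list found under aggregations.tags.buckets in the search response.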

The main speakers within the works of Shakespeare are Gloucester and Hamlet.

You can save this visualization and add it to your dashboard.

The tag cloud visualization is a useful visual representation of text data that can be used to depict keyword metadata (tags) of documents in an Elasticsearch index.

Elastic Stack 5.0.0 Released

It is release day at Elastic!
Today the Elastic team released the new version of the Elastic Stack: 5.0.0.
This means that we have new versions of:

  • Elasticsearch
  • Kibana
  • Beats
  • Logstash
  • ES-Hadoop
  • X-Pack

The new version of Elasticsearch is based on Lucene 6.2.0 and it comes with a boatload of enhancements and new features:

  • Indexing performance
  • Ingest node
  • Painless scripting
  • New data structures
  • Search and Aggregations
  • User friendly
  • Resiliency
  • Java REST client
  • Migration Helper

The main enhancements and new features for Kibana are the following:

  • A brand new design
  • Introduction of Timelion, a visualization tool for query DSL over time
  • Dev tools: Sense is now integrated in the Kibana console

New Kibana 5.0.0 design:

kibana50dashboard

I ran some tests and I noticed that the Elasticsearch indexing performance has improved. The new Kibana looks great and it is more stable and complete than the previous one.
I am really excited about this new version of the Elastic Stack. In the next few days I will run some further tests on the search capabilities. Great job, Elastic!

Here you can find the official posts from the Elastic blog:
ELK 5.0.0 release
Elasticsearch 5.0
Kibana 5.0.0

and the official Github repository:
Github Elastic

Elastic and Prelert

Have you ever wondered if you can run unsupervised machine learning tasks within Elasticsearch? As of today, you can!
Elastic today announced that Prelert and Elastic are joining forces.

Prelert is the leading provider of behavioral analytics for IT security, IT operations, and business operations teams. With Prelert you can find anomalies within transactions and operational metrics, detect uncharacteristic user behavior, find a population of attack IP addresses, and much more, and as of today you can do it directly from the Kibana UI.

Prelert – Kibana UI integration preview:

prelert_kibana

The integration has just been presented at the Elastic{ON} Tour. I will try the new product as soon as it is released and write a post about it.

You can find the post from the official Elastic blog here: Welcome Prelert to the Elastic Team