Elasticsearch Machine Learning: U.S. / U.K. Foreign Exchange Rate

At the beginning of May 2017 Elastic announced the first release of machine learning features for the Elastic Stack, available via X-Pack.

The machine learning features of X-Pack (Platinum/Enterprise subscription) are focused on providing Time Series Anomaly Detection capabilities using unsupervised machine learning.

In this post we are going to see an example of time series anomaly detection using the machine learning features of Elasticsearch.

To use these features you need at least version 5.4.0 of Elasticsearch, Kibana and X-Pack.
In this post I am not going to show how to install the stack components. I used the following:

  • Elasticsearch 5.4.1
  • Kibana 5.4.1
  • X-Pack 5.4.1 (installed both in ES and Kibana)

Here you can find the installation steps:

The machine learning feature is enabled by default on each node; you can find more details about further configuration here: Machine Learning Settings

We are going to use the following dataset: U.S. / U.K. Foreign Exchange Rate.
It contains the daily foreign exchange rate between the U.S. Dollar and the U.K. Pound, from April 1971 to the beginning of June 2017.

This is a sample of the data:

We will index the documents (around 16k) in time-based indices called foreignexchangerate-YYYY (where YYYY represents the year of the document).
A time-based index is necessary to use the machine learning feature: the configured time field of the index pattern is used by the feature for the time-based aggregation.
As far as I know, there is no way to use a non-time-based index and select a date field while creating a machine learning job.
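
For example, assuming each document carries a date field and the exchange_rate field used later in the job (the field and type names below are my assumptions), each yearly file in bulk (ndjson) format can be indexed with the Bulk API:

  # Each line pair in the file is an action plus a document, e.g.:
  #   {"index":{"_index":"foreignexchangerate-2017","_type":"exchange_rate"}}
  #   {"date":"2017-06-01","exchange_rate":1.28}
  curl -XPOST 'localhost:9200/_bulk?pretty' \
       -H 'Content-Type: application/x-ndjson' \
       --data-binary @foreignexchangerate-2017.json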

This is what each time-based index looks like:

Once we have indexed our documents and added the index pattern to Kibana, we can create our first machine learning job.

[Screenshot: the exchange rate index in Kibana]

To create a new job, select the Machine Learning section from the left menu of Kibana (if you do not see it, you may have the wrong Kibana version or X-Pack may not be installed in Kibana).

You can now choose between a Single metric and a Multi metric job; we will create a Single metric job for the foreignexchangerate-* index pattern.

We will use the whole time series and a 3-day rolling average of exchange_rate. The idea is to aggregate the series into 3-day buckets, compute the average exchange rate for each bucket, and spot anomalies.

[Screenshot: configuring the Single metric job in Kibana]

Once we have configured the job, we can create it. The machine learning model will be built using our time series and the aggregation/metric we specified.
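
The same kind of job can also be created through the ML APIs instead of the Kibana wizard. Here is a minimal sketch (the job id and field names are just examples, and the exact request body may vary slightly between X-Pack versions):

  # Create an anomaly detection job: mean(exchange_rate) over 3-day buckets
  curl -XPUT 'localhost:9200/_xpack/ml/anomaly_detectors/exchange_rate_avg?pretty' \
   -H 'Content-Type: application/json' -d '{
    "analysis_config": {
      "bucket_span": "3d",
      "detectors": [ { "function": "mean", "field_name": "exchange_rate" } ]
    },
    "data_description": { "time_field": "date" }
  }'

  # Attach a datafeed so the job reads from the time-based indices
  # (the document type name is an assumption)
  curl -XPUT 'localhost:9200/_xpack/ml/datafeeds/datafeed-exchange_rate_avg?pretty' \
   -H 'Content-Type: application/json' -d '{
    "job_id": "exchange_rate_avg",
    "indexes": [ "foreignexchangerate-*" ],
    "types": [ "exchange_rate" ]
  }'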

[Screenshot: the machine learning job after creation]

We can now inspect the anomalies detected using the Anomaly Explorer or the Single Metric View, both from the ML Jobs dashboard.

[Screenshot: anomalies detected on the exchange rate series]

I checked some of the automatically identified anomalies and almost all of them make sense (I found drops in the exchange rate due to events like Brexit or the EU crisis).

So far we have seen the whole analysis inside Kibana, but the machine learning feature also comes with a set of APIs, so you can integrate time series anomaly detection into your application.
Here you can find the details about the APIs: ML APIs.
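
For example, the anomaly records produced by the job above (reusing the illustrative job id exchange_rate_avg) can be retrieved with the get records API:

  curl -XGET 'localhost:9200/_xpack/ml/anomaly_detectors/exchange_rate_avg/results/records?pretty'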

In this post we saw a simple example of how to create and run a machine learning job inside Elasticsearch. There are a lot of other aspects, like the Multi metric and Advanced jobs, that I think are important.

The machine learning features are pretty new and I think (and hope!) that Elastic will invest a lot of resources to improve and extend them.

I am going to run some other tests on the ML features, and I would like to run other anomaly detection algorithms (statistical and ML based) on the same dataset to benchmark and compare the Elasticsearch results. If you want to collaborate and help me (or if you have some knowledge/background in time series anomaly detection), drop me a line 🙂.

Elastic Stack in A Day – Milano 2017

Elastic Stack In A Day 2017 is the third edition of the Italian event dedicated to Elastic technologies.
The event is organized by Seacom in collaboration with Elastic and will take place on June 6, 2017 in Milano, at Hotel Michelangelo.

[Image: Elastic Stack in A Day 2017 cover]

During the event the latest news about Elasticsearch and X-Pack will be presented, and in the afternoon there will be technical speeches held by developers and engineers (some of them from Elastic).

I will be speaking about Machine learning with TensorFlow and Elasticsearch: Image Recognition.

[Image: announcement of my speech]

Here is the agenda of my speech:

  • What is image recognition? (a few examples, focus on machine learning techniques)
  • TensorFlow (what is it? How do you use it for image recognition? Alternatives?)
  • Case history:
    • Description
    • Architecture: which components have been used
    • Implementation (TensorFlow + Elasticsearch)
  • Demo

Here you can find the agenda of the event: Seacom – Elastic Stack in a day 2017, and here you can find the (free) ticket: Eventbrite – Elastic Stack in a day 2017

Hope to see you there 🙂

Here you can find the slides of the speech: Tensorflow and Elasticsearch

Skedler: PDF, XLS Report Scheduler Solution for Kibana

With Kibana you can create intuitive charts and dashboards. Since August 2016 you can export your dashboards as PDFs thanks to Reporting. With Elastic version 5, Reporting has been integrated into X-Pack for the Gold and Platinum subscriptions.
Recently I tried Skedler, an easy to use report scheduling and distribution application for Kibana that allows you to centrally schedule and distribute Kibana Dashboards and Saved Searches as hourly/daily/weekly/monthly PDF, XLS or PNG reports to various stakeholders.
Skedler is a standalone app that provides its own dashboard where you can manage Kibana reporting tasks (schedules, dashboards and saved searches). Right now there are four different price plans (from a free to a premium edition).
Here you can find some useful resources about Skedler:

In this post I am going to show you how to install Skedler (on Ubuntu) and how to export/schedule a Kibana dashboard.
First, download the installer from the official website and untar the archive.
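
Something along these lines (the archive and folder names are placeholders; use the ones from your download):

  tar -xzf skedler.tar.gz
  cd skedler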

Edit the install.sh file and set the value of the JAVA_HOME variable (check your current variable using echo $JAVA_HOME):

[Screenshot: the JAVA_HOME variable in install.sh]
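
For example (the Java path below is only an example; use the output of echo $JAVA_HOME on your machine):

  # check the current value
  echo $JAVA_HOME
  # then set it inside install.sh accordingly, e.g.:
  JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64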

Install Skedler:
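
Assuming the install.sh script edited above:

  sudo ./install.sh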

Once Skedler is installed, edit the config/reporting.yml file. You have to set the Elasticsearch and Kibana URLs and, if needed, authentication and proxy details.
You can now run Skedler as a service or start it manually.
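
A sketch of both options (the exact service name and start script depend on the Skedler version, so check the official installation docs):

  # as a service (assuming the installer registered a "skedler" service)
  sudo service skedler start

  # or manually, from the Skedler installation directory
  # (the start script name below is an assumption)
  sudo ./bin/skedler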

Skedler is now running on port 3000:

[Screenshot: Skedler running on port 3000]
If you want to read more about how to install Skedler:

From the Skedler dashboard we can now schedule a report.
The steps to schedule a new report are the following:

Report Details

Create a new schedule, select the Kibana dashboard or saved search and the output format. In the example I selected a dashboard called “My dashboard” (that I previously created in Kibana) and the PDF format.

[Screenshot: report details in Skedler]

Layout Details

Select the font-family, page size and company logo.

[Screenshot: layout details in Skedler]

 

Schedule Details

Define a schedule frequency and a time window for the data.

[Screenshot: schedule details in Skedler]

 

Once you have finished the configuration, you will find the new schedule in the Skedler dashboard. You can set a list of email addresses to which the report will be sent.

[Screenshot: the Skedler dashboard with the new schedule]

If you want to see what your exported dashboard will look like, you can preview it. This is what my dashboard looks like (note that it is a PDF file).

[Screenshot: the exported PDF dashboard]

In this post I demonstrated how to install and configure Skedler and how to create a simple schedule for a Kibana dashboard. My overall impression of Skedler is that it is a powerful application to use side by side with Kibana that helps you deliver your content directly to your stakeholders.

These are the main benefits that Skedler offers:

  • It’s easy to install
  • Linux and Windows support (it runs on a Node.js server)
  • Reports are generated locally (your data is not sent to the cloud or to Skedler servers)
  • Competitive price plans
  • Support for the Kibana 4 and 5 releases
  • Automatically discovers your existing Kibana Dashboards and Saved Searches (so you can easily use Skedler in any environment, no new stack installation needed)
  • It lets you centrally schedule and manage who gets which reports and when they get them
  • Allows for hourly, weekly, monthly, and yearly schedules
  • Generates XLS and PNG reports besides PDF, as opposed to Elastic Reporting, which only supports PDF

I strongly recommend that you try Skedler (there is a free plan and a 21-day trial) because it can help you automatically deliver reports to your stakeholders and it integrates with your ELK environment without any modification to your stack.

Here you can find some more useful resources from the official website:

Elasticsearch X-Pack Graph

There are many potential relationships among the documents stored in your Elasticsearch indices. Thanks to Graph, you can explore them and perform graph analysis on your documents.
Graph was released in mid-2016 as a Kibana plugin, and since Elastic version 5.0 it is included in the X-Pack extension (Platinum subscription).
Graph is an API- and UI-driven tool, so you can integrate the Graph capabilities into your applications or explore the data using the UI.

Examples of useful information deduced by a graph analysis are the following:

  • Discover which vendor is responsible for a group of compromised credit cards by exploring the shops where purchases were made.
  • Suggest the next best song for a listener who digs Mozart based on their preferences, and keep them engaged and happy
  • Identify potential bad actors and other unexpected associates by looking at external IPs that machines on your network are talking to

You can install the X-Pack into Elasticsearch using the command:
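
(run from the Elasticsearch home directory; this is the standard plugin command for the 5.x stack)

  bin/elasticsearch-plugin install x-pack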

and into Kibana using the command:
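
(and from the Kibana home directory)

  bin/kibana-plugin install x-pack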

Here you can find useful resources about Graph capabilities and subscriptions:

In this post I am going to show you an example of how Graph works. We will perform a set of analyses on a dataset that lists all current City of Chicago employees, complete with full names, departments, positions, and annual salaries.
You can read more about the dataset and download it here: City of Chicago employees
These are the fields of the dataset:

  • Name: name of the employee
  • Surname: surname of the employee
  • Department: the department where the employee works
  • Position: the job position
  • Annual Salary: annual salary in dollars
  • Income class*: the income class based on the annual salary
  • Sex*: male or female

I computed the fields marked with *; you will not find them in the original dataset.
This is what the CSV file looks like:

[Image: a sample of the CSV dataset]

I converted the CSV file to a JSON file (using a Python script) to easily index the JSON documents into Elasticsearch using the bulk index API.
The JSON file I produced looks like this (the name of my index is chicagoemployees and the type is employee):

[Image: a sample of the JSON bulk file]

Once you have the JSON file you can index your documents in bulk:
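
A sketch (the file name is a placeholder and the document values are illustrative):

  # chicagoemployees.json must be in bulk (ndjson) format, i.e. an action line
  # followed by the document itself, e.g.:
  #   {"index":{"_index":"chicagoemployees","_type":"employee"}}
  #   {"Name":"JOHN","Surname":"DOE","Department":"POLICE","Position":"POLICE OFFICER"}
  curl -XPOST 'localhost:9200/_bulk?pretty' \
       -H 'Content-Type: application/x-ndjson' \
       --data-binary @chicagoemployees.json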

Now that we have installed X-Pack and indexed our documents, we can start exploring them using Graph.

[Screenshot: the Graph UI in Kibana]

Here are a few examples of analyses performed on the data:
Which departments have the lowest annual income?

[Screenshot: departments with the lowest annual income]

And which have the highest?

[Screenshot: departments with the highest annual income]

 

We can see that those who work in the Law, Health or Fire departments have a higher annual salary than those who work in the Public Library or City Clerk departments.
Thicker edges represent stronger relationships (more related documents).

We now want to highlight the relationships between departments and positions for the female employees who work in the Police department.

[Screenshot: department/position relationships for female Police employees]
We can see that the main relationship is between the Police department and the Police Officer position, but also that the Clerk III position is shared among a lot of departments.
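
The same exploration can also be run through the Graph explore API instead of the UI. A rough sketch (the endpoint shown is the X-Pack 5.x one, and the field names simply follow the dataset metadata above, so they may need adjusting to your actual mapping):

  curl -XPOST 'localhost:9200/chicagoemployees/_xpack/graph/_explore?pretty' \
   -H 'Content-Type: application/json' -d '{
    "query": {
      "bool": {
        "filter": [
          { "term": { "Sex": "Female" } },
          { "term": { "Department": "POLICE" } }
        ]
      }
    },
    "vertices":    [ { "field": "Department" } ],
    "connections": { "vertices": [ { "field": "Position" } ] }
  }'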

The last example shows the relationships between gender and department, showing that some departments are common to male and female employees while others are not.

[Screenshot: gender/department relationships]

In this post we saw how to import some documents into Elasticsearch and use the Graph tool to discover relationships among them. Graph is a really powerful tool because it helps you find out what is relevant in your data (which is not an easy task, because popular is not always the same as relevant).

I suggest you try Graph: it runs easily in your Kibana instance and analyzes your existing indices (you do not need any pre-processing of your documents).

Here you can download all the resources used in the demo:

Kibana Tag Cloud

In Kibana 5.1.1, a new type of visualization has been added: the tag cloud chart.
A tag cloud visualization is a visual representation of text data, typically used to visualize free form text. Tags are usually single words, and the importance of each tag is shown with font size or color.
In this post we are going to see how to use this new type of visualization. I assume you have already installed and configured Kibana and Elasticsearch.

First of all, create a new index and index some documents. I indexed a JSON file containing the entire works of Shakespeare.

Each document contains fields such as play_name, speaker, and text_entry (a sample is shown in the indexing snippet below).

You can download it here (notice it is around 24 MB): shakespeare.json.

Create a new index.

And index the documents using the Bulk Index API.
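
A sketch based on the classic Shakespeare sample dataset used in the Kibana tutorials (the mapping below simply makes sure that speaker and play_name are indexed as keywords, so they can be used in terms aggregations):

  # create the index with a minimal mapping (5.x syntax)
  curl -XPUT 'localhost:9200/shakespeare?pretty' \
   -H 'Content-Type: application/json' -d '{
    "mappings": {
      "_default_": {
        "properties": {
          "speaker":       { "type": "keyword" },
          "play_name":     { "type": "keyword" },
          "line_id":       { "type": "integer" },
          "speech_number": { "type": "integer" }
        }
      }
    }
  }'

  # bulk index the documents; each document in shakespeare.json has fields like
  # line_id, play_name, speech_number, line_number, speaker and text_entry
  curl -XPOST 'localhost:9200/shakespeare/_bulk?pretty' \
       -H 'Content-Type: application/x-ndjson' \
       --data-binary @shakespeare.json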

Now, from Kibana, create a new visualization and select the tag cloud chart.
[Screenshot: selecting the tag cloud visualization]

[Screenshot: configuring the new visualization]

You only need to specify the field to use to build the tag cloud. Notice that the tag cloud only supports the terms aggregation.
[Screenshot: selecting the field for the terms aggregation]

In this example I selected the speaker field, so the tag cloud will depict the main (highest count) speakers within Shakespeare's works.
You can select a bunch of other options, like the tag font size and orientation.
[Screenshot: tag cloud options]

The main speakers within the works of Shakespeare are Gloucester and Hamlet.
[Screenshot: the resulting tag cloud]

You can save this visualization and add it to your dashboard.

The tag cloud visualization is a useful visual representation of text data that can be used to depict keyword metadata (tags) of the documents in an Elasticsearch index.