Combine Amazon Translate with Elasticsearch and Skedler to build a cost-efficient multi-lingual omnichannel customer care – Part 1 and 2 – Skedler Blog

I’ve just published a new blog post on the Skedler Blog.

In this two-part blog post, we are going to present a system architecture to translate customer inquiries in different languages with AWS Translate, index this information in Elasticsearch 6.2.3 for fast search, visualize the data with Kibana 6.2.3, and automate reporting and alerting using Skedler.

The components that we are going to use are the following:

  • AWS API Gateway
  • AWS Lambda
  • AWS Translate
  • Elasticsearch 6.2.3
  • Kibana 6.2.3
  • Skedler Reports and Alerts

System architecture:

You can read the full post – Part 1 – here: Combine Amazon Translate with Elasticsearch and Skedler to build a cost-efficient multi-lingual omnichannel customer care – Part 1.

Part 2 – here: Combine Amazon Translate with Elasticsearch and Skedler to build a cost-efficient multi-lingual omnichannel customer care – Part 2 of 2.

Please share the post and let me know your feedbacks.

Real-time Tweets geolocation visualization with Elasticsearch and Kibana region map

In Kibana version 5.5 a new type of chart has been added: Region Map.
Region maps are thematic maps in which boundary vector shapes are colored using a gradient: higher intensity colors indicate larger values, and lower intensity colors indicate smaller values. These are also known as choropleth maps.

In this post we are going to see how to use the Region Map to visualize the geolocation detail of a stream of Tweets (consumed using the Twitter streaming API). Basically we will show the location (by country) of a stream of Tweets on the map (higher intensity colors indicate larger volume of Tweets).

Here you can read more about the Region Map:

I am using Elasticsearch and Kibana version 5.5 on Ubuntu 14.04 and Python 3.4.

We are going to use the Twitter streaming API to consume the public data stream flowing through Twitter (set some hashtags/keywords to filter the tweets). Given the latitude and longitude (GEOJson format) of each tweet (when available) we are going to use the Google Maps API (Geocoding) to get the country name (or code) from the latitude and longitude.
Once we identified the country (given the latitude and longitude), we are going to index the Tweet to Elasticsearch and then visualize its location using the Kibana Region Map.
For each Tweet we are interested to the country (that represents the geographic location of the Tweet as reported by the user or client application), the text (for further query) and the creation date (to filter our result).

First of all, define a new Elasticsearch mapping called tweet, within the index tweetrepository:

Notice that the country field is a keyword field type (A field to index structured content such as email addresses, hostnames, status codes, zip codes or tags). It will be used as join (between the map and the term aggregation) field for the Region Map visualization.

Using Python tweepy we are going to read the public stream of Tweets.

For each tweet we are going to use the Google API to identify the country from the GEOJson details. Once we identified the country we index the document to Elasticsearch.

This is how an indexed document looks like.


We are going now to create a new region map visualization.


In the option section of the visualization, select the Vector Map. This is the map layer that will be used. This list includes the maps that are hosted by the Elastic Maps Service as well as your self-hosted layers that are configured in the config/kibana.yml file. To learn more about how to configure Kibana to make self-hosted layers available, see the region map settings documentation.

We will use the World Country vector map. The join field is the property from the selected vector map that will be used to join on the terms in your terms-aggregation. In this example the join field is the country name (so we can match the regions of the map with our documents).

In the style section you can choose the color schema (red to green, shades of blue/green, heatmap) that will be used.


In the buckets section select the country field (field of our mapping). The values of this field will be used as lookup (join) on the vector map.



This is how our region map looks like. The darker countries are the one with a higher number of Tweets.

I really like this new type of visualization, it easy to use and allows you to add nice visualization map (even with self-hosted layers that are configured in the config/kibana.yml file) to your Kibana dashboards.

If you use Kibana to visualize logs and if you use Logstash take a look at this plugin: GeoIP Filter. The GeoIP filter adds information about the geographical location of IP addresses, based on data from the Maxmind GeoLite2 databases (so you can use the geographical location in your region map).

Elasticsearch Machine Learning: U.S. / U.K. Foreign Exchange Rate

At the beginning of May 2017 Elastic announced the first release of machine learning features for the Elastic Stack, available via X-Pack.

The machine learning features of X-Pack (Platinum/Enterprise subscription) are focused on providing Time Series Anomaly Detection capabilities using unsupervised machine learning.

In this post we are going to see an example of time series anomaly detection using the machine learning features of Elasticsearch.

To use this features you need at least the version 5.4.0 of Elasticsearch, Kibana and X-Pack.
In this post I am not going to show how to install the stack components. I used the following:

  • Elasticsearch 5.4.1
  • Kibana 5.4.1
  • X-Pack 5.4.1 (installed both in ES and Kibana)

Here you can find the installation steps:

The machine learning feature is enabled by default on each node, here you can find more details about further configurations: Machine Learning Settings

We are going to use the following dataset: U.S. / U.K. Foreign Exchange Rate.
It represents the daily foreign exchange rate between U.S. Dollar and U.K. Pound between April 1971 and beginning of June 2017.

This is a sample of the data:

We will index the documents (around 16k) in a time-based index called foreignexchangerate-YYYY (where YYYY represents the year of the document).
The time-based index is necessary to use the machine learning feature. The Configured time field of the index will be used as time-aggregation by the feature.
I did not find a way (AFAIK) to use a not time-based index and select a date field while creating a machine learning job.

This is how each time-based index looks like:

Once we indexed our documents, and once we added the index pattern to Kibana, we can create our first machine learning job.


To create a new Job, select the Machine Learning section from the left menu of Kibana (if you do not see it, maybe you have the wrong Kibana version or you did not install X-Pack into Kibana).

You can now choose between Single Metric or Multi metric job, we will choose Single Metric job (for the foreignexchangerate-* index pattern).

We will use the whole time series and a 3 days rolling exchange_rate average. The idea is to aggregate the series by 3 days, compute the average of the exchange rate and spot anomalies.


One we configure the job, we can create it. The machine learning model will be build using our time series and the aggregation/metric we specified.


We can now inspect the anomalies detected using the Anomaly Explorer or the Single Metric View, both from the ML Jobs dashboard.


I checked some of the anomalies automatically identified and almost all of them make sense (I found drop in the exchange rate due to events like Brexit or EU Crisis).

So far we see all the analysis inside Kibana but the machine learning feature comes also with a set of APIs, so you can integrate the time-series anomaly detection with your application.
Here you can find the details about the APIs: ML APis.

In this post we saw a simple example of how to create and run a machine learning job inside Elasticsearch. There are a lot of other aspects like the multi-metric and advanced-metric that I think are important.

The machine learning features are pretty new and I think (and hope!) that Elastic will invest a lot of resources to improve and extend it.

I am going to run some other tests on the ML features and I would like to run some anomaly detection algorithms (statistical and ML based) on the same dataset to benchmark and compare the Elasticsearch results, if you want to collaborate and help me (or if you have some knowledge/background about time series anomaly detection) drop me a line 🙂 .

Kibana Tag Cloud

In the Kibana 5.1.1 version, a new type of visualization has been added: the Tag Cloud chart.
A tag cloud visualization is a visual representation of text data, typically used to visualize free form text. Tags are usually single words, and the importance of each tag is shown with font size or color.
In this post we are going to see how to use this new type of visualization. I assume you already have installed and configured Kibana and Elasticsearch.

First of all, create a new index and index some documents. I indexed a JSON file containing the entire works of Shakespeare.

Each document has the following format.

You can download it here (notice it is around 24 MB): shakespeare.json.

Create a new index.

And index the documents using the Bulk Index API.

Now, from the Kibana dashboard, select the tag cloud visualization chart.



You only need to specify the field to use to build the tag cloud. Notice that the tag cloud only supports the term aggregation.

In this example I selected the speaker field. So the tag cloud will depict the main (higher count) speakers within the Shakespeare works.
You can select a bunch of other options like the tags font size and orientations.

The main speakers within the the works of Shakespeare are Gloucester and Hamlet.

You can save this visualization and add it to your dashboard.

The tag cloud visualization is a useful visual representation of text data, that can be used to depict keyword metadata (tags) of documents in a Elasticsearch index.

Elasticsearch and Kibana with Docker

Last weekend, in occasion of the Docker Global Mentor Week, I attended the Docker meetup in Milan. I improved my knowledge about the containers world so I decide to use Docker and Docker-Compose to ship Elasticsearch and Kibana. I already wrote some posts about Docker, you can find them here: Docker and Docker Compose and Docker Compose and Django.

I suppose you already have a basic knowledge about the main Docker commands (run, pull, etc.).

I have been using Docker version 1.12.3 and Docker-compose 1.8.1 (be sure you docker-compose version supports the version 2 of docker-compose file)
We can directly pull the images for Elasticseach and Kibana (I am using the latest version 5.0.1):

The Elasticsearch image is based on the openjdk:8-jre image, you can find the Dockerfile here: Elasticseatch 5.0.1 Dockerfile.
The Kibana image is based on the debian:jessie image, you can find the Dockerfile here: Kibana 5.0.1 Dockerfile

I defined a docker-compose.yml file to ship two containers with the previously pulled images, I exposed the default ports, 9200 for Elasticsearch and 5601 for Kibana. The environment variable defined within the Kibana service, represents the Elastichsearch url (within Docker you just need to specify the service name, it will automatically resolve it to an IP address).

With the docker-compose version 2 you do not have to specify the linking between the services, but they will be automatically placed within the same network (beside you specify a custom one).

The last version of Elasticsearch is more strict about the bootstrap checks so be sure to correctly set the vm.max_map_count and the file descriptors number (Wiki: file descriptor)

You can read more about these bootstrap checks here: Bootstrap Checks

We can now ship the two containers using docker-compose up command.

The two containers have been shipped and are running, we can reach Kibana at http://localhost:5601 and Elasticsearch at http://localhost:9200.


So with Docker and docker-compose we can easily run Elasticseach and Kibana, focusing more on the application development instead of the environment installation.