Real-time Tweet geolocation visualization with Elasticsearch and the Kibana Region Map

In Kibana version 5.5 a new type of chart has been added: Region Map.
Region maps are thematic maps in which boundary vector shapes are colored using a gradient: higher intensity colors indicate larger values, and lower intensity colors indicate smaller values. These are also known as choropleth maps.

In this post we are going to see how to use the Region Map to visualize the geolocation details of a stream of Tweets (consumed using the Twitter streaming API). Basically, we will show the location (by country) of a stream of Tweets on the map (higher intensity colors indicate a larger volume of Tweets).

Here you can read more about the Region Map:

I am using Elasticsearch and Kibana version 5.5 on Ubuntu 14.04 and Python 3.4.

We are going to use the Twitter streaming API to consume the public data stream flowing through Twitter (set some hashtags/keywords to filter the tweets). Given the latitude and longitude (in GeoJSON format) of each tweet (when available), we are going to use the Google Maps Geocoding API to get the country name (or code) from those coordinates.
Once we have identified the country, we will index the Tweet into Elasticsearch and then visualize its location using the Kibana Region Map.
For each Tweet we are interested in the country (which represents the geographic location of the Tweet as reported by the user or client application), the text (for further queries) and the creation date (to filter our results).

First of all, define a new Elasticsearch mapping called tweet, within the index tweetrepository:
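A minimal version of this mapping, covering just the three fields we need, could look like the following (run from the Kibana Dev Tools console; the field names other than country are my own choice, adjust them to your needs):

    PUT tweetrepository
    {
      "mappings": {
        "tweet": {
          "properties": {
            "country":    { "type": "keyword" },
            "text":       { "type": "text" },
            "created_at": { "type": "date" }
          }
        }
      }
    }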

Notice that the country field is of the keyword field type (a field used to index structured content such as email addresses, hostnames, status codes, zip codes or tags). It will be used as the join field (between the map and the terms aggregation) for the Region Map visualization.

Using Python and tweepy, we are going to read the public stream of Tweets.

For each tweet we are going to use the Google Geocoding API to identify the country from the GeoJSON coordinates. Once the country is identified, we index the document into Elasticsearch.
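A simplified sketch of the streaming script could look like this (the API keys and the tracked keywords are placeholders; it uses the tweepy, googlemaps and elasticsearch Python packages):

    # Stream tweets, reverse-geocode their coordinates and index them into Elasticsearch.
    import tweepy
    import googlemaps
    from elasticsearch import Elasticsearch

    es = Elasticsearch()  # local Elasticsearch instance
    gmaps = googlemaps.Client(key="YOUR_GOOGLE_API_KEY")

    def country_from_coordinates(lat, lon):
        """Return the country name for a (lat, lon) pair using the Geocoding API."""
        for result in gmaps.reverse_geocode((lat, lon)):
            for component in result["address_components"]:
                if "country" in component["types"]:
                    return component["long_name"]
        return None

    class TweetListener(tweepy.StreamListener):
        def on_status(self, status):
            if status.coordinates:  # GeoJSON point: [longitude, latitude]
                lon, lat = status.coordinates["coordinates"]
                country = country_from_coordinates(lat, lon)
                if country:
                    es.index(index="tweetrepository", doc_type="tweet", body={
                        "country": country,
                        "text": status.text,
                        "created_at": status.created_at,
                    })

    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
    stream = tweepy.Stream(auth, TweetListener())
    stream.filter(track=["elasticsearch", "kibana"])  # hashtags/keywords to follow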

This is what an indexed document looks like.

tweet_region_map_document
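Stripped of the Elasticsearch metadata, the _source of such a document is roughly the following (the values are illustrative):

    {
      "country": "Italy",
      "text": "Testing the #elasticsearch Region Map!",
      "created_at": "2017-07-20T14:32:10"
    }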

We are now going to create a new Region Map visualization.

new_region_map

In the Options section of the visualization, select the Vector Map. This is the map layer that will be used. The list includes the maps hosted by the Elastic Maps Service as well as the self-hosted layers configured in the config/kibana.yml file. To learn more about how to configure Kibana to make self-hosted layers available, see the region map settings documentation.
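For reference, a self-hosted layer is declared in config/kibana.yml along these lines (the URL and field names are illustrative, and the exact setting keys should be checked against the region map settings documentation for your Kibana version):

    regionmap:
      layers:
        - name: "My custom regions"
          url: "https://my.cors.enabled.server.org/regions.geojson"
          attribution: "My attribution"
          fields:
            - name: "region_name"
              description: "Region name"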

We will use the World Country vector map. The join field is the property from the selected vector map that will be used to join on the terms in your terms-aggregation. In this example the join field is the country name (so we can match the regions of the map with our documents).

In the Style section you can choose the color scheme (red to green, shades of blue/green, heatmap) that will be used.
region_map_configuration

 

In the buckets section select the country field (field of our mapping). The values of this field will be used as lookup (join) on the vector map.

region_map_configuration1

 

This is what our region map looks like. The darker countries are the ones with a higher number of Tweets.

region_map
I really like this new type of visualization: it is easy to use and allows you to add nice map visualizations (even with self-hosted layers configured in the config/kibana.yml file) to your Kibana dashboards.

If you use Kibana to visualize logs and you use Logstash, take a look at this plugin: GeoIP Filter. The GeoIP filter adds information about the geographical location of IP addresses, based on data from the MaxMind GeoLite2 databases (so you can use the geographical location in your region map).
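A minimal Logstash filter block using it looks like this (assuming the field holding the IP address is called clientip):

    filter {
      geoip {
        source => "clientip"
      }
    }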

Elastic Stack in A Day – Milano 2017

Elastic Stack In A Day 2017 is the third edition of the Italian event dedicated to Elastic technologies.
The event is organized by Seacom in collaboration with Elastic and it will take place on June 6, 2017 in Milano at the Hotel Michelangelo.

esinaday_cover

During the event the latest news about Elasticsearch and X-Pack will be presented, and in the afternoon there will be technical talks held by developers and engineers (some of them from Elastic).

I will be speaking about Machine learning with TensorFlow and Elasticsearch: Image Recognition.

speech_esinaday

Here is the agenda of my talk:

  • What is Image recognition? (a few examples, with a focus on machine learning techniques)
  • Tensorflow (what is it? How to use it for image recognition? Alternatives?)
  • Case history:
    • Description
    • Architecture: which components have been used
    • Implementation (Tensorflow + Elasticsearch)
  • Demo

Here you can find the agenda of the event: Seacom – Elastic Stack in a day 2017, and here you can find the (free) ticket: Eventbrite – Elastic Stack in a day 2017

Hope to see you there 🙂

Here you can find the slides of the speech: Tensorflow and Elasticsearch

Skedler: PDF, XLS Report Scheduler Solution for Kibana

With Kibana you can create intuitive charts and dashboards. Since August 2016 you can export your dashboards in PDF format thanks to Reporting. With Elastic version 5, Reporting has been integrated into X-Pack for the Gold and Platinum subscriptions.
Recently I tried Skedler, an easy to use report scheduling and distribution application for Kibana that allows you to centrally schedule and distribute Kibana Dashboards and Saved Searches as hourly/daily/weekly/monthly PDF, XLS or PNG reports to various stakeholders.
Skedler is a standalone app that provides a new dashboard where you can manage Kibana reporting tasks (schedules, dashboards and saved searches). Right now there are four different price plans (from a free to a premium edition).
Here you can find some useful resources about Skedler:

In this post I am going to show you how to install Skedler (on Ubuntu) and how to export/schedule a Kibana dashboard.
First, download the installer from the official website and untar the archive.

Edit the install.sh file and set the value of the JAVA_HOME variable (check your current value with echo $JAVA_HOME):

java_home_skedler
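The line to change is just the JAVA_HOME assignment, for example (the path below is illustrative; use the value printed by echo $JAVA_HOME):

    JAVA_HOME=/usr/lib/jvm/java-8-oracle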

Install Skedler:
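On Ubuntu this boils down to running the installer script you just edited (root privileges may be required):

    sudo ./install.sh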

Once Skedler is installed, edit the config/reporting.yml file. You have to set the Elasticsearch and Kibana URLs and, if needed, authentication and proxy details.
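The relevant entries look roughly like the following (the key names are an assumption based on my setup, so double-check them against the sample reporting.yml shipped with Skedler):

    elasticsearch_url: "http://localhost:9200"
    kibana_url: "http://localhost:5601"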
You can now run Skedler as a service:

or manually with

Now Skedler is running on port 3000:

skedler_running
If you want to read more about how to install Skedler:

From the Skedler dashboard we can now schedule a report.
The steps to schedule a new report are the following:

Report Details

Create a new schedule, select the Kibana dashboard or saved search and the output format. In the example I selected a dashboard called “My dashboard” (that I previously created in Kibana) and the PDF format.

skedler_schedule_report

Layout Details

Select the font-family, page size and company logo.

skedler_layout_details

 

Schedule Details

Define a schedule frequency and a time window for the data.

skedule_schedule_details

 

Once you have finished the configuration, you will find the new schedule in the Skedler dashboard. You can set a list of email addresses to which the report will be sent.

skedler_dashboard

If you want to see what your exported dashboard will look like, you can preview it. This is what my dashboard looks like (note that it is a PDF file).

skedler_out_pdf

In this post I demonstrated how to install and configure Skedler and how to create a simple schedule for our Kibana dashboard. My overall impression of Skedler is that it is a powerful application to use side-by-side with Kibana that helps you deliver your content directly to your stakeholders.

These are the main benefits that Skedler offers:

  • It’s easy to install
  • Linux and Windows support (it runs on a Node.js server)
  • Reports are generated locally (your data is not sent to the cloud or to Skedler servers)
  • Competitive price plans
  • Support for the Kibana 4 and 5 releases
  • It automatically discovers your existing Kibana Dashboards and Saved Searches (so you can easily use Skedler in any environment, no new stack installation needed)
  • It lets you centrally schedule and manage who gets which reports and when they get them
  • It allows for hourly, weekly, monthly, and yearly schedules
  • It generates XLS and PNG reports besides PDF, as opposed to Elastic Reporting, which only supports PDF

I strongly recommend that you try Skedler (there is a free plan and a 21-day trial) because it can help you to automatically deliver reports to your stakeholders and it integrates with your ELK environment without any modification to your stack.

Here you can find some more useful resources from the official website:

Elasticsearch X-Pack Graph

There are many potential relationships living among the documents stored in your Elasticsearch indices. Thanks to Graph you can explore them and perform graph analysis on your documents.
Graph is a product released in the middle of 2016 as a Kibana plugin and, since Elastic version 5.0, it is included in the X-Pack extension (Platinum subscription).
Graph is an API- and UI-driven tool, so you can integrate the Graph capabilities into your applications or explore the data using the UI.
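For example, a graph exploration can be started with a REST request along these lines (the index and field names are placeholders, and the exact endpoint and options should be checked in the Graph API documentation for your version):

    POST my_index/_xpack/graph/_explore
    {
      "query": { "match_all": {} },
      "vertices": [ { "field": "user" } ],
      "connections": {
        "vertices": [ { "field": "product" } ]
      }
    }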

Examples of useful information deduced by a graph analysis are the following:

  • Discover which vendor is responsible for a group of compromised credit cards by exploring the shops where purchases were made.
  • Suggest the next best song for a listener who digs Mozart, based on their preferences, to keep them engaged and happy
  • Identify potential bad actors and other unexpected associates by looking at external IPs that machines on your network are talking to

You can install the X-Pack into Elasticsearch using the command:
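For Elasticsearch 5.x this is:

    bin/elasticsearch-plugin install x-pack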

and into Kibana using the command:
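For Kibana 5.x this is:

    bin/kibana-plugin install x-pack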

Here you can find useful resources about Graph capabilities and subscriptions:

In this post I am going to show you an example of how Graph works. We will perform a set of analyses on a dataset that lists all current City of Chicago employees, complete with full names, departments, positions, and annual salaries.
You can read more about the dataset and download it here: City of Chicago employees
These are the fields of the dataset:

  • Name: name of the employee
  • Surname: surname of the employee
  • Department: the department where the employee works
  • Position: the job position
  • Annual Salary: annual salary in dollars
  • Income class*: the income class based on the annual salary
  • Sex*: male or female

I computed the fields marked with *, so you will not find them in the original dataset.
This is what the CSV file looks like:

dataset_sample

I converted the CSV file to a JSON file (using a Python script) to easily index the JSON documents into Elasticsearch using the bulk index API.
The JSON file I produced looks like this (the name of my index is chicagoemployees and the type is employee):

dataset_json_sample
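A minimal sketch of such a conversion script might look like the following (the file names and CSV column labels are assumptions based on the fields listed above, and the computed fields are omitted for brevity):

    # Convert the employees CSV into Elasticsearch bulk (NDJSON) format.
    import csv
    import json

    with open("chicago_employees.csv") as src, open("chicago_employees_bulk.json", "w") as dst:
        for row in csv.DictReader(src):
            # action line followed by the document itself, one JSON object per line
            dst.write(json.dumps({"index": {"_index": "chicagoemployees", "_type": "employee"}}) + "\n")
            dst.write(json.dumps({
                "name": row["Name"],
                "surname": row["Surname"],
                "department": row["Department"],
                "position": row["Position"],
                "annual_salary": row["Annual Salary"],
            }) + "\n")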

Once you have the JSON file you can index your documents in bulk:
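With the file in bulk format, the indexing itself is a single request (the file name matches the one produced by the sketch above):

    curl -s -XPOST "http://localhost:9200/_bulk" -H "Content-Type: application/x-ndjson" --data-binary "@chicago_employees_bulk.json"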

Now that we have installed X-Pack and indexed our documents, we can start to explore them using Graph.

kibana_graph

Here are a few examples of analyses performed on the data:
Which departments have the lowest annual income?

department_income_example

And which have the highest?

department_income_example_2

 

We can see that those working in the Law, Health or Fire departments have a higher annual salary than those working in the Public Library or City Clerk departments.
Thicker edges represent stronger relationships (more related documents).

We now want to highlight the relationships between departments and positions for the female employees who work in the Police department.

department_position
We can see that the main relationship is between the Police department and the Police Officer position, but also that the Clerk III position is shared among many departments.

The last example shows the relationships between gender and department, showing that some departments are common to male and female employees while others are not.

sex_department

In this post we saw how to import some documents into Elasticsearch and exploit the Graph tool to discover relationships among our documents. Graph is a really powerful tool because it helps you find out what is relevant in your data (not an easy task, because popular is not always the same as relevant).

I suggest you try Graph: it runs easily in your Kibana instance and analyzes your existing indices (you do not need any pre-processing of your documents).

Here you can download all the resources used in the demo:

Elasticsearch and Kibana with Docker

Last weekend, on the occasion of the Docker Global Mentor Week, I attended the Docker meetup in Milan. I improved my knowledge of the container world, so I decided to use Docker and Docker Compose to ship Elasticsearch and Kibana. I already wrote some posts about Docker, you can find them here: Docker and Docker Compose and Docker Compose and Django.

I suppose you already have a basic knowledge about the main Docker commands (run, pull, etc.).

I have been using Docker version 1.12.3 and Docker Compose 1.8.1 (be sure your docker-compose version supports version 2 of the docker-compose file format).
We can directly pull the images for Elasticsearch and Kibana (I am using the latest version, 5.0.1):
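With the official images this is simply:

    docker pull elasticsearch:5.0.1
    docker pull kibana:5.0.1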

The Elasticsearch image is based on the openjdk:8-jre image; you can find the Dockerfile here: Elasticsearch 5.0.1 Dockerfile.
The Kibana image is based on the debian:jessie image; you can find the Dockerfile here: Kibana 5.0.1 Dockerfile.

I defined a docker-compose.yml file to ship two containers with the previously pulled images, exposing the default ports: 9200 for Elasticsearch and 5601 for Kibana. The environment variable defined within the Kibana service represents the Elasticsearch URL (within Docker you just need to specify the service name; it will automatically be resolved to an IP address).

With version 2 of the docker-compose file you do not have to specify links between the services: they will automatically be placed on the same network (unless you specify a custom one).
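A sketch of the docker-compose.yml described above (the nofile ulimit anticipates the bootstrap checks discussed below):

    version: '2'
    services:
      elasticsearch:
        image: elasticsearch:5.0.1
        ports:
          - "9200:9200"
        ulimits:
          nofile:
            soft: 65536
            hard: 65536
      kibana:
        image: kibana:5.0.1
        ports:
          - "5601:5601"
        environment:
          - ELASTICSEARCH_URL=http://elasticsearch:9200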

The latest version of Elasticsearch is stricter about the bootstrap checks, so be sure to correctly set vm.max_map_count and the number of file descriptors (Wiki: file descriptor).

You can read more about these bootstrap checks here: Bootstrap Checks
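On the Docker host, the vm.max_map_count check can be satisfied with the following command (262144 is the minimum value recommended by the Elasticsearch documentation):

    sudo sysctl -w vm.max_map_count=262144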

We can now ship the two containers using the docker-compose up command.

The two containers have been shipped and are running; we can reach Kibana at http://localhost:5601 and Elasticsearch at http://localhost:9200.

es_kibana_containers

So with Docker and docker-compose we can easily run Elasticsearch and Kibana, focusing more on application development than on environment installation.