Extract business insights from audio using AWS Transcribe, AWS Comprehend and Elasticsearch – Parts 1 and 2 – Skedler Blog

I’ve just published a new two-part blog post on the Skedler Blog.
In this two-part post, we present a system architecture to convert audio and voice into written text with AWS Transcribe, extract useful information for a quick understanding of the content with AWS Comprehend, index this information in Elasticsearch 6.2 for fast search, and visualize the data with Kibana 6.2. In Part I, you can learn about the key components, architecture, and common use cases. In Part II, you can learn how to implement this architecture.

The components that we are going to use are the following:

  • AWS S3 bucket
  • AWS Transcribe
  • AWS Comprehend
  • Elasticsearch 6.2
  • Kibana 6.2
  • Skedler Reports and Alerts
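
To make the flow concrete, here is a rough, hedged boto3 sketch of the pipeline; the bucket, job and file names are placeholders, and the full implementation (including the indexing into Elasticsearch) is covered in Part II:

```python
# Rough sketch only: names below are placeholders, not the actual implementation.
import boto3

transcribe = boto3.client("transcribe")
comprehend = boto3.client("comprehend")

# 1. Transcribe an audio file stored in an S3 bucket.
transcribe.start_transcription_job(
    TranscriptionJobName="my-audio-job",                      # placeholder name
    Media={"MediaFileUri": "s3://my-bucket/audio/call.mp3"},  # placeholder S3 object
    MediaFormat="mp3",
    LanguageCode="en-US",
)

# 2. Once the job completes, run the transcript text through Comprehend.
text = "transcript text fetched from the Transcribe output"
entities = comprehend.detect_entities(Text=text, LanguageCode="en")
key_phrases = comprehend.detect_key_phrases(Text=text, LanguageCode="en")
sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")

# 3. Index the extracted information into Elasticsearch and explore it in Kibana.
```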

System architecture:

You can read the full post – Part 1 – here: Extract business insights from audio using AWS Transcribe, AWS Comprehend and Elasticsearch – Part 1.

Part 2 – here: Extract business insights from audio using AWS Transcribe, AWS Comprehend and Elasticsearch – Part 2.

Please share the posts and let me know your feedback.

Application Performance Monitoring (APM) with Elasticsearch 6.1.1

In June 2017 Elastic joined forces with Opbeat, an application performance monitoring (APM) company. Read the official blog post here: Welcome Opbeat to the Elastic Family.

Adding APM (Application Performance Monitoring) to the Elastic Stack is a natural next step in providing our users with end-to-end monitoring, from logging, to server-level metrics, to application-level metrics, all the way to the end-user experience in the browser or client.

Elastic APM consists of three components:

  • Agents: libraries that run inside your application process and automatically measure the duration of requests to your service, as well as things like database queries, cache calls, external HTTP requests and errors
  • The APM server (written in Go), which processes data from the agents and stores it in Elasticsearch
  • Kibana UI: dashboards that give you an instant overview of application response times, requests per minute, error occurrences and more

The APM server and the agents (right now available only for Python and NodeJS) are open source:

Read more about it here: Starting Down the Path of APM for the Elastic Stack

In this post we are not going to cover how to install the APM server; you can find the instructions here: Open Source Application Performance Monitoring.

Once the APM server is installed and started, we can monitor the performance of our application. In this example we will monitor a Python Flask application.

Install the Python APM library:

Initialize the client:
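
A minimal sketch of these two steps, assuming the agent is installed with pip install elastic-apm[flask] and an APM server listening on localhost:8200; exact configuration key names can differ slightly between agent versions:

```python
from flask import Flask
from elasticapm.contrib.flask import ElasticAPM

app = Flask(__name__)

# Hypothetical service name and the default local APM server address.
app.config['ELASTIC_APM'] = {
    'SERVICE_NAME': 'my-flask-app',
    'SERVER_URL': 'http://localhost:8200',
}

# The extension instruments the Flask app and ships data to the APM server.
apm = ElasticAPM(app)
```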

Within the Flask route you can log some messages:

or exceptions:
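
For example, a hedged sketch that continues the setup above (the route and risky_operation are placeholders):

```python
@app.route('/')
def index():
    # Attach a custom log message to the current transaction.
    apm.capture_message('processing the index route')

    try:
        risky_operation()  # hypothetical function that may raise
    except Exception:
        # Capture the exception (with stack trace) and send it to the APM server.
        apm.capture_exception()

    return 'ok'
```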

Here is how the monitoring looks in Kibana:

You can see the details of each request by clicking on it:


I really like that the APM feature is fully integrated with the Elastic Stack. I will integrate it into my Flask/Django applications.
If you want to read more about the new APM feature:

If you want to read more about this topic: Application Performance Monitoring with Elasticsearch 6.1, Kibana and Skedler Alerts.

Real-time Tweets geolocation visualization with Elasticsearch and Kibana region map

In Kibana version 5.5 a new type of chart was added: the Region Map.
Region maps are thematic maps in which boundary vector shapes are colored using a gradient: higher intensity colors indicate larger values, and lower intensity colors indicate smaller values. These are also known as choropleth maps.

In this post we are going to see how to use the Region Map to visualize the geolocation details of a stream of Tweets (consumed using the Twitter streaming API). Basically, we will show the location (by country) of a stream of Tweets on the map (higher intensity colors indicate a larger volume of Tweets).

Here you can read more about the Region Map:

I am using Elasticsearch and Kibana version 5.5 on Ubuntu 14.04 and Python 3.4.

We are going to use the Twitter streaming API to consume the public data stream flowing through Twitter (setting some hashtags/keywords to filter the tweets). Given the latitude and longitude (GeoJSON format) of each tweet (when available), we are going to use the Google Maps API (Geocoding) to get the country name (or code) from the latitude and longitude.
Once we have identified the country (given the latitude and longitude), we index the Tweet into Elasticsearch and then visualize its location using the Kibana Region Map.
For each Tweet we are interested in the country (which represents the geographic location of the Tweet as reported by the user or client application), the text (for further querying) and the creation date (to filter our results).

First of all, define a new Elasticsearch mapping called tweet, within the index tweetrepository:
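
A minimal sketch of the mapping creation with the elasticsearch-py client, assuming a local Elasticsearch 5.5; the field names country, text and created match the fields described above:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()  # defaults to localhost:9200

es.indices.create(
    index="tweetrepository",
    body={
        "mappings": {
            "tweet": {
                "properties": {
                    "country": {"type": "keyword"},  # used as the join field in the Region Map
                    "text": {"type": "text"},
                    "created": {"type": "date"},
                }
            }
        }
    },
)
```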

Notice that the country field is of the keyword field type (a field used to index structured content such as email addresses, hostnames, status codes, zip codes or tags). It will be used as the join field (between the map and the term aggregation) for the Region Map visualization.

Using Python tweepy we are going to read the public stream of Tweets.

For each tweet we are going to use the Google Maps API to identify the country from the GeoJSON coordinates. Once we have identified the country, we index the document into Elasticsearch.
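
A simplified, hedged sketch of the consumer, assuming tweepy 3.x, the googlemaps client and elasticsearch-py; the credentials, tracked keywords and index/field names are placeholders:

```python
import tweepy
import googlemaps
from elasticsearch import Elasticsearch

es = Elasticsearch()  # defaults to localhost:9200
gmaps = googlemaps.Client(key="YOUR_GOOGLE_API_KEY")


def country_from_coordinates(lat, lon):
    """Reverse-geocode a coordinate pair and return the country name, if any."""
    for result in gmaps.reverse_geocode((lat, lon)):
        for component in result["address_components"]:
            if "country" in component["types"]:
                return component["long_name"]
    return None


class TweetListener(tweepy.StreamListener):
    def on_status(self, status):
        if status.coordinates:  # GeoJSON point: [longitude, latitude]
            lon, lat = status.coordinates["coordinates"]
            country = country_from_coordinates(lat, lon)
            if country:
                es.index(index="tweetrepository", doc_type="tweet", body={
                    "country": country,
                    "text": status.text,
                    "created": status.created_at,
                })
        return True


auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
stream = tweepy.Stream(auth, TweetListener())
stream.filter(track=["elasticsearch", "kibana"])  # hashtags/keywords to filter
```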

This is what an indexed document looks like:

[Screenshot: an indexed tweet document]

We are now going to create a new region map visualization.

[Screenshot: creating a new region map visualization]

In the option section of the visualization, select the Vector Map. This is the map layer that will be used. This list includes the maps that are hosted by the Elastic Maps Service as well as your self-hosted layers that are configured in the config/kibana.yml file. To learn more about how to configure Kibana to make self-hosted layers available, see the region map settings documentation.

We will use the World Country vector map. The join field is the property from the selected vector map that will be used to join on the terms in your terms-aggregation. In this example the join field is the country name (so we can match the regions of the map with our documents).

In the style section you can choose the color schema (red to green, shades of blue/green, heatmap) that will be used.
[Screenshot: region map options and style configuration]


In the buckets section select the country field (field of our mapping). The values of this field will be used as lookup (join) on the vector map.

[Screenshot: region map bucket configuration]


This is what our region map looks like. The darker countries are the ones with a higher number of Tweets.

[Screenshot: region map of Tweet volume by country]
I really like this new type of visualization: it is easy to use and lets you add a nice map visualization (even with self-hosted layers configured in the config/kibana.yml file) to your Kibana dashboards.

If you use Kibana to visualize logs and you use Logstash, take a look at this plugin: GeoIP Filter. The GeoIP filter adds information about the geographical location of IP addresses, based on data from the Maxmind GeoLite2 databases (so you can use the geographical location in your region map).

Elasticsearch Machine Learning: U.S. / U.K. Foreign Exchange Rate

At the beginning of May 2017 Elastic announced the first release of machine learning features for the Elastic Stack, available via X-Pack.

The machine learning features of X-Pack (Platinum/Enterprise subscription) are focused on providing Time Series Anomaly Detection capabilities using unsupervised machine learning.

In this post we are going to see an example of time series anomaly detection using the machine learning features of Elasticsearch.

To use these features you need at least version 5.4.0 of Elasticsearch, Kibana and X-Pack.
In this post I am not going to show how to install the stack components. I used the following:

  • Elasticsearch 5.4.1
  • Kibana 5.4.1
  • X-Pack 5.4.1 (installed both in ES and Kibana)

Here you can find the installation steps:

The machine learning feature is enabled by default on each node; here you can find more details about further configuration: Machine Learning Settings

We are going to use the following dataset: U.S. / U.K. Foreign Exchange Rate.
It represents the daily foreign exchange rate between the U.S. Dollar and the U.K. Pound from April 1971 to the beginning of June 2017.

This is a sample of the data:

We will index the documents (around 16k) in a time-based index called foreignexchangerate-YYYY (where YYYY represents the year of the document).
A time-based index is necessary to use the machine learning feature. The configured time field of the index will be used for the time aggregation by the feature.
I did not find a way (AFAIK) to use a non-time-based index and select a date field while creating a machine learning job.
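
A hedged sketch of this indexing step, assuming the dataset is a CSV with date and exchange_rate columns (the file name and column names are placeholders):

```python
import csv
from elasticsearch import Elasticsearch

es = Elasticsearch()  # defaults to localhost:9200

with open("us_uk_exchange_rate.csv") as f:  # hypothetical file name
    for row in csv.DictReader(f):
        year = row["date"][:4]  # e.g. "1971-04-01" -> "1971"
        es.index(
            index="foreignexchangerate-{}".format(year),
            doc_type="exchange_rate",  # hypothetical type name
            body={
                "date": row["date"],
                "exchange_rate": float(row["exchange_rate"]),
            },
        )
```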

This is what each time-based index looks like:

Once we have indexed our documents and added the index pattern to Kibana, we can create our first machine learning job.

[Screenshot: the exchange rate index pattern in Kibana]

To create a new Job, select the Machine Learning section from the left menu of Kibana (if you do not see it, maybe you have the wrong Kibana version or you did not install X-Pack into Kibana).

You can now choose between a Single Metric and a Multi Metric job; we will choose a Single Metric job (for the foreignexchangerate-* index pattern).

We will use the whole time series and a 3-day rolling average of exchange_rate. The idea is to aggregate the series into 3-day buckets, compute the average of the exchange rate, and spot anomalies.

[Screenshot: machine learning job configuration in Kibana]

Once we have configured the job, we can create it. The machine learning model will be built using our time series and the aggregation/metric we specified.

[Screenshot: the created machine learning job]

We can now inspect the anomalies detected using the Anomaly Explorer or the Single Metric View, both from the ML Jobs dashboard.

[Screenshot: anomalies detected in the Anomaly Explorer]

I checked some of the automatically identified anomalies and almost all of them make sense (I found drops in the exchange rate due to events like Brexit or the EU crisis).

So far we have seen all the analysis inside Kibana, but the machine learning feature also comes with a set of APIs, so you can integrate the time-series anomaly detection with your application.
Here you can find the details about the APIs: ML APIs.
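
As a hedged sketch, here is how you could query the anomaly records of a job over the REST API; the job id exchange-rate-job and the credentials are placeholders, and the result field names may vary slightly between versions:

```python
import requests

resp = requests.get(
    "http://localhost:9200/_xpack/ml/anomaly_detectors/exchange-rate-job/results/records",
    auth=("elastic", "changeme"),  # placeholder X-Pack credentials
)
resp.raise_for_status()

for record in resp.json().get("records", []):
    # Print the bucket timestamp and the anomaly score of each record.
    print(record.get("timestamp"), record.get("record_score"))
```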

In this post we saw a simple example of how to create and run a machine learning job inside Elasticsearch. There are a lot of other aspects, like multi-metric and advanced jobs, that I think are important.

The machine learning features are pretty new and I think (and hope!) that Elastic will invest a lot of resources to improve and extend them.

I am going to run some other tests on the ML features, and I would like to run some anomaly detection algorithms (statistical and ML-based) on the same dataset to benchmark and compare the Elasticsearch results. If you want to collaborate and help me (or if you have some knowledge/background in time series anomaly detection), drop me a line 🙂.

Skedler: PDF, XLS Report Scheduler Solution for Kibana

With Kibana you can create intuitive charts and dashboards. Since August 2016 you can export your dashboards in PDF format thanks to Reporting. With Elastic version 5, Reporting has been integrated into X-Pack for the Gold and Platinum subscriptions.
Recently I tried Skedler, an easy-to-use report scheduling and distribution application for Kibana that allows you to centrally schedule and distribute Kibana Dashboards and Saved Searches as hourly/daily/weekly/monthly PDF, XLS or PNG reports to various stakeholders.
Skedler is a standalone app that provides a dashboard where you can manage Kibana reporting tasks (schedules, dashboards and saved searches). Right now there are four different price plans (from a free to a premium edition).
Here you can find some useful resources about Skedler:

In this post I am going to show you how to install Skedler (on Ubuntu) and how to export/schedule a Kibana dashboard.
First, download the installer from the official website and untar the archive.

Edit the install.sh file and set the value of the JAVA_HOME variable (check your current variable using echo $JAVA_HOME):

[Screenshot: setting JAVA_HOME in install.sh]

Install Skedler:

Once Skedler is installed, edit the config/reporting.yml file. You have to set the Elasticsearch and Kibana URLs and, if needed, authentication and proxy details.
You can now run Skedler as a service:

or manually with

Now Skedler is running on port 3000:

[Screenshot: Skedler running on port 3000]
If you want to read more about how to install Skedler:

From the Skedler dashboard we can now schedule a report.
The steps to schedule a new report are the following:

Report Details

Create a new schedule, then select the Kibana dashboard or saved search and the output format. In this example I selected a dashboard called “My dashboard” (that I previously created in Kibana) and the PDF format.

[Screenshot: Skedler report details]

Layout Details

Select the font-family, page size and company logo.

[Screenshot: Skedler layout details]


Schedule Details

Define a schedule frequency and a time window for the data.

[Screenshot: Skedler schedule details]


Once you have finished the configuration, you will find the new schedule in the Skedler dashboard. You can set a list of email addresses to which the report will be sent.

[Screenshot: the Skedler dashboard]

If you want to see how your exported dashboard will look, you can preview it. This is how my dashboard looks (note that it is a PDF file).

[Screenshot: the exported dashboard as a PDF]

In this post I demonstrated how to install and configure Skedler and how to create a simple schedule for a Kibana dashboard. My overall impression of Skedler is that it is a powerful application to use side-by-side with Kibana that helps you deliver your content directly to your stakeholders.

These are the main benefits that Skedler offers:

  • It’s easy to install
  • Linux and Windows support (it runs on Node.js server)
  • Reports are generated locally (your data is not sent to the cloud or to Skedler servers)
  • Competitive price plans
  • Support for the Kibana 4 and 5 releases
  • Automatically discovers your existing Kibana Dashboards and Saved Searches (so you can easily use Skedler in any environment; no new stack installation needed)
  • It lets you centrally schedule and manage who gets which reports and when they get them
  • Allows for hourly, weekly, monthly, and yearly schedules
  • Generates XLS and PNG reports besides PDF, as opposed to Elastic Reporting, which only supports PDF

I strongly recommend that you try Skedler (there is a free plan and a 21-day trial) because it can help you automatically deliver reports to your stakeholders, and it integrates with your ELK environment without any modification to your stack.

Here you can find some more useful resources from the official website: