Elastic Stack in A Day – Milano 2017

Elastic Stack In A Day 2017 is the third edition of the Italian event dedicated to Elastic technologies.
The event is organized by Seacom in collaboration with Elastic and will take place on June 6, 2017 at the Hotel Michelangelo in Milano.


During the event the latest news about Elasticsearch and X-Pack will be presented, and in the afternoon there will be technical talks held by developers and engineers (some of them from Elastic).

I will be speaking about Machine learning with TensorFlow and Elasticsearch: Image Recognition.


Here is the agenda of my speech:

  • What is image recognition? (a few examples, focusing on machine learning techniques)
  • Tensorflow (what is it? How to use it for image recognition? Alternatives?)
  • Case history:
    • Description
    • Architecture: which components have been used
    • Implementation (Tensorflow + Elasticsearch)
  • Demo

Here you can find the agenda of the event: Seacom – Elastic Stack in a day 2017, and here you can get a (free) ticket: Eventbrite – Elastic Stack in a day 2017.

Hope to see you there 🙂

Here you can find the slides of the speech: Tensorflow and Elasticsearch

Machine learning with Tensorflow and Elasticsearch

In this post we are going to see how to build a machine learning system that performs an image recognition task. Image recognition is the process of identifying and detecting an object or a feature in a digital image or video. We will use the following tools:

  • Amazon S3 bucket
  • Amazon Simple Queue Service
  • Google TensorFlow machine learning library
  • Elasticsearch

The idea is to build a system that runs the image recognition task against images stored in an S3 bucket and indexes the results into Elasticsearch.
The library used for the image recognition task is TensorFlow.
TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well. You can read more about it here.

These are the main steps performed in the process:

  • Upload image to S3 bucket
  • Event notification from S3 to an SQS queue
  • Event consumed by a consumer
  • Image recognition on the image by TensorFlow
  • The result of the classification is indexed in Elasticsearch
  • Search in Elasticsearch by tags

This image shows the main steps of the process:

[Flow diagram: upload to S3 → SQS notification → consumer → TensorFlow classification → indexing and search in Elasticsearch]

Event notifications

When an image is uploaded to the S3 bucket, a message will be stored in an Amazon SQS queue. To configure the S3 bucket and to read the queue programmatically you can read my previous post:
Amazon S3 event notifications to SQS

Consume messages from Amazon SQS queue

Now that the S3 bucket is configured, when an image is uploaded to the bucket a notification event will be stored in the SQS queue. We are going to build a consumer that reads this notification, downloads the image from the S3 bucket and performs the image classification using TensorFlow.

With code along the lines of the following sketch (written here with boto3; the queue name and download directory are illustrative) you can read the messages from the SQS queue, download the image from the S3 bucket and store it locally, ready for the image classification task:
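```python
# Sketch of the consumer using boto3; the queue name and download directory
# are illustrative and should be adapted to your setup.
import json
import os
from urllib.parse import unquote_plus

import boto3

sqs = boto3.resource('sqs')
s3 = boto3.resource('s3')

# hypothetical queue name: use the queue configured for the S3 event notifications
queue = sqs.get_queue_by_name(QueueName='image-notifications')


def consume_messages(download_dir='/tmp'):
    """Poll the SQS queue and download every newly uploaded image locally."""
    local_paths = []
    for message in queue.receive_messages(MaxNumberOfMessages=10, WaitTimeSeconds=10):
        body = json.loads(message.body)
        for record in body.get('Records', []):
            bucket = record['s3']['bucket']['name']
            # object keys in S3 event notifications are URL-encoded
            key = unquote_plus(record['s3']['object']['key'])
            local_path = os.path.join(download_dir, os.path.basename(key))
            s3.Bucket(bucket).download_file(key, local_path)
            local_paths.append(local_path)
        # remove the notification once the image has been downloaded
        message.delete()
    return local_paths
```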

Image recognition task

Now that the image (originally uploaded to S3) has been downloaded, we can use TensorFlow to run the image recognition task.
The model used for the image recognition task is Inception-V3, which achieved a 3.46% top-5 error rate in the ImageNet competition. You can read more about it here: Inception-V3 and here: TensorFlow image recognition.

I used the TensorFlow Python API, which you can install with pip (the exact package or wheel to install depends on your platform and Python version; see the setup guide linked below).

You can find all the information about setup and installation here: Download and Setup TensorFlow. Here you can find an official code lab by Google: TensorFlow for Poets.

So, starting from the classify_image.py code (you can find it on GitHub: classify_image.py), I created a Python module that, given the local path of an image (the one previously downloaded from S3), returns a dictionary with the result of the classification.
The result of the classification consists of a set of tags (the objects recognized in the image) and scores (each score represents the probability of a correct classification; the scores over all classes sum to one).
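A sketch of such a module could look like this, reusing the create_graph() and NodeLookup helpers defined in classify_image.py (they load the pre-trained Inception-V3 graph and map prediction ids to human-readable labels):

```python
# Sketch based on classify_image.py; create_graph() and NodeLookup are the
# helpers defined there and are not reported here.
import numpy as np
import tensorflow as tf

NUM_TOP_PREDICTIONS = 5


def run_image_recognition(image_path):
    """Classify a local image and return its tags with the related scores."""
    image_data = tf.gfile.FastGFile(image_path, 'rb').read()

    # load the pre-trained Inception-V3 graph definition
    create_graph()

    with tf.Session() as sess:
        softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
        predictions = sess.run(softmax_tensor,
                               {'DecodeJpeg/contents:0': image_data})
        predictions = np.squeeze(predictions)

        # map the top predictions to human-readable labels
        node_lookup = NodeLookup()
        top_k = predictions.argsort()[-NUM_TOP_PREDICTIONS:][::-1]
        tags = [{'tag': node_lookup.id_to_string(node_id),
                 'score': float(predictions[node_id])}
                for node_id in top_k]

    return {'tags': tags}
```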

So, calling the run_image_recognition function with the image path as argument returns a dictionary with the result of the classification.

In the code shown above, the definitions of the TensorFlow helper functions are not reported (you can find them in the classify_image.py code linked before).
The first time you run the image classification task, the model (Inception-V3) will be downloaded and stored on your file system (it is around 300 MB).

Index to Elasticsearch

So, given an image, we now have a set of tags that classify it. We want to index these tags into Elasticsearch. To do that I created a new index called imagerepository and a new type called image.

The image type we are going to create will have the following properties:

  • title: the title of the image
  • s3_location: the link to the S3 resource
  • tags: field that will contain the result of the classification task

For the tags property I used the Nested datatype. It allows arrays of objects to be indexed and queried independently of each other.
You can read more about it here:
Nested datatype
Nested query

We will not store the image itself in Elasticsearch, just the URL of the image within the S3 bucket.

New Index:
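A minimal version, leaving all index settings at their defaults (adjust shards and replicas as needed), could be:

```
PUT imagerepository
```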

New Type:
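A possible mapping for the image type could look like this (the text/keyword/float field types are assumptions; the tag and score fields inside the nested tags objects follow the classification output described above):

```
PUT imagerepository/_mapping/image
{
  "properties": {
    "title":       { "type": "text" },
    "s3_location": { "type": "keyword" },
    "tags": {
      "type": "nested",
      "properties": {
        "tag":   { "type": "keyword" },
        "score": { "type": "float" }
      }
    }
  }
}
```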

You can now try to post a test document:
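For example (the title, tags and s3_location URL are illustrative values):

```
POST imagerepository/image
{
  "title": "waterfall.jpg",
  "s3_location": "https://s3.amazonaws.com/my-bucket/waterfall.jpg",
  "tags": [
    { "tag": "waterfall", "score": 0.86 },
    { "tag": "lakeside",  "score": 0.05 }
  ]
}
```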

We can also index a new document using the Elasticsearch Python client; a minimal sketch (the host, title and S3 URL are illustrative) could look like this:
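```python
# Minimal sketch with the official elasticsearch-py client; host, title and
# S3 URL are illustrative.
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'])

image_path = '/tmp/waterfall.jpg'          # local copy downloaded from S3
classification = run_image_recognition(image_path)

doc = {
    'title': 'waterfall.jpg',
    's3_location': 'https://s3.amazonaws.com/my-bucket/waterfall.jpg',  # hypothetical URL
    'tags': classification['tags']
}

es.index(index='imagerepository', doc_type='image', body=doc)
```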

Search

Now that we have indexed our documents in Elasticsearch, we can search for them.
These are some examples of queries we can run:

  • Give me all the images that represent this object (searching by tag = object_name)
  • What does this image (give the title) represent?
  • Give me all the images that represent this object with at least 90% probability (search by tag = object_name and score >= 0.9)

I wrote some Sense queries.

Images that represent a waterfall:
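A nested query along these lines (field names follow the mapping sketched above):

```
GET imagerepository/image/_search
{
  "query": {
    "nested": {
      "path": "tags",
      "query": {
        "term": { "tags.tag": "waterfall" }
      }
    }
  }
}
```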

Images that represent a pizza with at least 90% probability:
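Here the term on the tag and the range on the score are combined inside the same nested clause (again assuming the mapping sketched above):

```
GET imagerepository/image/_search
{
  "query": {
    "nested": {
      "path": "tags",
      "query": {
        "bool": {
          "must": [
            { "term": { "tags.tag": "pizza" } },
            { "range": { "tags.score": { "gte": 0.9 } } }
          ]
        }
      }
    }
  }
}
```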

In this post we have seen how to combine the powerful machine learning library TensorFlow, used to perform an image recognition task, with the search power of Elasticsearch, used to index the image classification results. The pipeline also includes an S3 bucket (where the images are stored) and an SQS queue used to receive event notifications when a new image is stored in S3 (and is ready for the image classification task).

I ran this demo using the following environment configuration:

  • Elasticsearch 5.0.0
  • Python 3.4
  • tensorflow-0.11.0rc2
  • Ubuntu 14.04