AWS Comprehend, Translate and Transcribe

At the re:invent2017 AWS presented a lot of new services (read all the announcements here: re:Invent 2017 Product Announcements). In this post we are going to see three new services related to the language processing.

  • Amazon Comprehend
  • Amazon Translate
  • Amazon Transcribe

These new services are listed within the Machine Learning section.

Amazon Comprehend

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. Amazon Comprehend identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is; and automatically organizes a collection of text files by topic.

You can use the Amazon Comprehend APIs to analyze text and use the results in a wide range of applications including voice of customer analysis, intelligent document search, and content personalization for web applications.

The service constantly learns and improves from a variety of information sources, including product descriptions and consumer reviews – one of the largest natural language data sets in the world – to keep pace with the evolution of language.

You can read more about it here: AWS Comprehend and here Amazon Comprehend – Continuously Trained Natural Language Processing.
Watch the video from the AWS re:Invent Launchpad: Amazon Comprehend.

This service is available, here you can find few examples (using Boto3 Python AWS SDK).

Instantiate a new client.

Detect the dominant language in your text.

Detect the entities in your text.

Detect the key phrases in your text.

Get the sentiment in your text.

Amazon Translate

Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation. Neural machine translation is a form of language translation automation that uses machine learning and deep learning models to deliver more accurate and more natural sounding translation than traditional statistical and rule-based translation algorithms. Amazon Translate allows you to easily translate large volumes of text efficiently, and to localize websites and applications for international users.

The service is still in preview, watch the launch video here: AWS re:Invent 2017: Introducing Amazon Translate

You can read more about it here: Introducing Amazon Translate – Real-time Language Translation.

Amazon Transcribe

Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech to text capability to their applications. Using the Amazon Transcribe API, you can analyze audio files stored in Amazon S3 and have the service return a text file of the transcribed speech.

Amazon Transcribe can be used for lots of common applications, including the transcription of customer service calls and generating subtitles on audio and video content. The service can transcribe audio files stored in common formats, like WAV and MP3, with time stamps for every word so you can easily locate the audio in the original source by searching for the text. Amazon Transcribe is continually learning and improving to keep pace with the evolution of language.

The service is still in preview, watch the launch video here: AWS re:Invent 2017: Introducing Amazon Transcribe
You can read more about it here: Amazon Transcribe – Accurate Speech To Text At Scale.

This is an example of how to use this service (code written by @jrhunt and taken from here).
Note that the API for Transcribe (while in preview) is subject to change (this code may not be the final version of the API):

Output of the speech recognition:

I am looking forward to use these new services, they are easy to use, they easily integrate with the AWS world and they can add powerful features to your applications.

Elasticsearch Machine Learning: U.S. / U.K. Foreign Exchange Rate

At the beginning of May 2017 Elastic announced the first release of machine learning features for the Elastic Stack, available via X-Pack.

The machine learning features of X-Pack (Platinum/Enterprise subscription) are focused on providing Time Series Anomaly Detection capabilities using unsupervised machine learning.

In this post we are going to see an example of time series anomaly detection using the machine learning features of Elasticsearch.

To use this features you need at least the version 5.4.0 of Elasticsearch, Kibana and X-Pack.
In this post I am not going to show how to install the stack components. I used the following:

  • Elasticsearch 5.4.1
  • Kibana 5.4.1
  • X-Pack 5.4.1 (installed both in ES and Kibana)

Here you can find the installation steps:

The machine learning feature is enabled by default on each node, here you can find more details about further configurations: Machine Learning Settings

We are going to use the following dataset: U.S. / U.K. Foreign Exchange Rate.
It represents the daily foreign exchange rate between U.S. Dollar and U.K. Pound between April 1971 and beginning of June 2017.

This is a sample of the data:

We will index the documents (around 16k) in a time-based index called foreignexchangerate-YYYY (where YYYY represents the year of the document).
The time-based index is necessary to use the machine learning feature. The Configured time field of the index will be used as time-aggregation by the feature.
I did not find a way (AFAIK) to use a not time-based index and select a date field while creating a machine learning job.

This is how each time-based index looks like:

Once we indexed our documents, and once we added the index pattern to Kibana, we can create our first machine learning job.


To create a new Job, select the Machine Learning section from the left menu of Kibana (if you do not see it, maybe you have the wrong Kibana version or you did not install X-Pack into Kibana).

You can now choose between Single Metric or Multi metric job, we will choose Single Metric job (for the foreignexchangerate-* index pattern).

We will use the whole time series and a 3 days rolling exchange_rate average. The idea is to aggregate the series by 3 days, compute the average of the exchange rate and spot anomalies.


One we configure the job, we can create it. The machine learning model will be build using our time series and the aggregation/metric we specified.


We can now inspect the anomalies detected using the Anomaly Explorer or the Single Metric View, both from the ML Jobs dashboard.


I checked some of the anomalies automatically identified and almost all of them make sense (I found drop in the exchange rate due to events like Brexit or EU Crisis).

So far we see all the analysis inside Kibana but the machine learning feature comes also with a set of APIs, so you can integrate the time-series anomaly detection with your application.
Here you can find the details about the APIs: ML APis.

In this post we saw a simple example of how to create and run a machine learning job inside Elasticsearch. There are a lot of other aspects like the multi-metric and advanced-metric that I think are important.

The machine learning features are pretty new and I think (and hope!) that Elastic will invest a lot of resources to improve and extend it.

I am going to run some other tests on the ML features and I would like to run some anomaly detection algorithms (statistical and ML based) on the same dataset to benchmark and compare the Elasticsearch results, if you want to collaborate and help me (or if you have some knowledge/background about time series anomaly detection) drop me a line 🙂 .

Twitter sentiment analysis with Amazon Rekognition

During the AWS re:Invent event, a new service for image analysis has been presented: Amazon Rekognition. Amazon Rekognition is a service that makes it easy to add image analysis to your applications. With Rekognition, you can detect objects, scenes, and faces in images. You can also search and compare faces. Rekognition’s API enables you to quickly add sophisticated deep learning-based visual search and image classification to your applications.
You can read more here: Amazon Rekognition and here: AWS Blog.

In this post we are going to see how to use the AWS Rekognition service to analyze some Tweets (in particular the media attached to the Tweet with the Twitter Photo Upload feature) and extract sentiment information. This task is also called emotion detection and sentiment analysis of images.
The idea is to track the sentiments (emotions, feelings) of people who are posting Tweets (with media) about a given topic (defined by a set of keywords or hashtags).

Given a set of filters (keywords or hashtags) we will use the Twitter Streaming API to get some real-time Tweets and we will analyze the media (images) within the Tweet.
AWS Rekognition gives us information about the faces in the picture, such has the emotions identified, the gender of the subjects, the presence of a smile and so on.

For this example we are going to use Python3.4 and the official AWS Python SDK Boto3.

First, define a new client connection to the AWS Rekognition service.

I assume you know how to work with the Twitter Streaming API, I reported an example, by the way you can find the code of this project in this Github repository: tweet-sentiment-aws-reko.
Example of Tweet dowload using the Streaming API and Python Tweepy:

For each Tweet downloaded from the stream (we will discard the Tweet without media), we analyze the image and extract the sentiment information.
To detect faces we use the detect_faces method. It takes the image blob (or the link to an image stored within S3) and returns the details about the image.

To check if at least one face has been found within the image, check if the key ‘FaceDetails‘ exists in the response dictionary.
We can now loop through the identified details and store the results.

We can now print the results to understand the sentiments within the analyzed images, in particular:

  • How many smiling faces there were in pictures?
  • How many men/women?
  • Which emotions were detected? With which confidence? (in average)

This is an example of output (analysis performed on a set of 20 images).

We can use this simple system to track and analyze the sentiment within the photos posted on Twitter. Given a set of posted pictures we can understand if the people in the pictures are happy, are male/female and which kind of emotions are feeling (this can be very useful to understand the reputation of a brand/product).

Note that AWS Rekognition gives also information about the presence of beard, mustaches and sunglasses and can compute the similarity between two faces (this can be useful to understand if there are more pictures representing the same people).

I would like to improve the system, would be interesting to analyze also the text of each Tweet to see if is there a correlation (and how strong) between the sentiment of the text and the sentiment of the image. If you want to contribute do not hesitate to contact me (would be cool also to build a front-end to see the downloaded medias and the results in a nicer way).

Boto3 AWS Rekognition official documentation: Rekognition
Github repository with the full code: tweet-sentiment-aws-reko.

Amazon Polly: turns text into lifelike speech

Yesterday at Amazon re:Invent 2016 event a new machine learning service has been presented and made globally available: Polly.
Amazon Polly is a service that turns text into lifelike speech. Polly lets you create applications that talk, enabling you to build entirely new categories of speech-enabled products. Polly is an Amazon AI service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice.
Right now Polly supports 24 languages across 47 different lifelike voices.

The service allows you to create new content for the users and enables new Internet of Things (IoT) use cases by making it easy and inexpensive to add speech to IoT devices.
You can find a set of use cases here: Amazon Polly

You can interact with Polly using the official Boto 3 Python SDK. In case you have an older version of the SDK, you should update it using pip3 –upgrade.

First, create a new Polly client service instance:

To get the list of the available voices, you can use the describe_voices method. You have to specify the language code (e.g.: en-GB, en-US, it-IT) to get the voices Ids and descriptions.

To synthesize a new text, use the method synthesize_speech. You need to specify the text, the output format and the Voice you want to use (by specifying the Id).

Once we have the synthesized text, we can store the stream to a mp3 file.

Here you can find the official Boto3 documentation about Polly: SDK Boto3 Polly
Official AWS post about Polly: AWS Blog – Polly

I am looking forward to see some cool projects made using this news APIs service.

Machine learning with Tensorflow and Elasticsearch

In this post we are going to see how to build a machine learning system to perform the image recognition task. The image recognition is the process of identifying and detecting an object or a feature in a digital image or video. The tools that we will use are the following:

  • Amazon S3 bucket
  • Amazon Simple Queue Service
  • Google TensorFlow machine learning library
  • Elasticsearch

The idea is to build a system that will process the image recognition task against some images stored in a S3 bucket and will index the results to Elasticsearch.
The library used for the image recognition task is TensorFlow.
TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well. You can read more about it here.

These are the main steps performed in the process:

  • Upload image to S3 bucket
  • Event notification from S3 to a SQS queue
  • Event consumed by a consumer
  • Image recognition on the image by TensorFlow
  • The result of the classification is indexed in Elasticsearch
  • Search in Elasticsearch by tags

This image shows the main steps of the process:



Event notifications

When an image is uploaded to the S3 bucket a message will be stored to a Amazon SQS queue. To configure the S3 Bucket and to read the queue programmatically you can read my previous post:
Amazon S3 event notifications to SQS

Consume messages from Amazon SQS queue

Now that the S3 bucket is configured, when an image is uploaded to the bucket an event will be notified and stored to the SQS queue. We are going to build a consumer to read this notification, download the image from the S3 bucket and perform the image classification using Tensorflow.

With this code you can read the messages from a SQS queue and download the image from the S3 bucket and store it locally (ready for the image classification task):

Image recognition task

Now that the image (originally uploaded to S3) has been downloaded we can use Tensorflow to run the image recognition task.
The model used by Tensorflow for the image recognition task is the Inception-V3. It achieved a 3.46% error rate in the ImageNet competition. You can read more about it here: Inception-V3 and here: Tensorflow image recognition.

I used the Tensorflow Python API, you can install it using Pip:

You can find all the information about Setup and Install here: Download and Setup Tensorflow.Here you can find an official code lab by Google: Tensorflow for poets.

So, starting from the code (you can find it on Github: I created a Python module that given the local path of an image (the one previously downloaded from S3) returns a dictionary with the result of the classification.
The result of the classification consists of a set of tags (the objects recognized in the image) and scores (the score represents the probability of a correct classification. The scores sum to one).

So, calling the function run_image_recognition with the image path as argument, will return a dictionary with the result of the classification.

In the previously shown code, the Tensorflow built-in functions definition are not reported (you can find them in the Github repository I linked).
The first time you will run the image classification task, the model (Inception-V3) will be downloaded and stored to your file system (it is around 300MB)

Index to Elasticsearch

So given an image we have now a set of tags that classify our image. We want now to index these tags to Elasticsearch. To do that I created a new index called imagerepository and a new type called image.

The image type we are going to create will have the following properties:

  • title: the title of the image
  • s3_location: the link to the S3 resource
  • tags: field that will contain the result of the classification task

For the tags property I used the Nested datatype. It allows arrays of objects to be indexed and queried independently of each other.
You can read more about it here:
Nested datatype
Nested query

We will not store the image to Elasticsearch but just the URL of the image within the S3 bucket.

New Index:

New Type:

You can now try to post a test document:

We can index a new document using the Elasitcsearch Python SDK.


Now that we indexed our documents in Elasticsearch we can search for them.
This is an example of queries we can run:

  • Give me all the images that represent this object (searching by tag = object_name)
  • What does this image (give the title) represent?
  • Give me all the images that represent this object with at least 90% of probability (search by tag = object_name and score >= 0.9)

I wrote some Sense queries.

Images that represent a waterfall:

Images that represent a pizza with at least 90% of probability:

In this post we have seen how to combine the powerful machine learning library Tensorflow to perform a image recognition task and the search power of Elasticsearch to index the image classification results. The process pipeline includes also a S3 bucket (where the images are stored) and a SQS Queue used to receive event notifications when a new image is stored to S3 (and it is ready for the image classification task).

I ran this demo using the following environment configuration:

  • Elasticsearch 5.0.0
  • Python 3.4
  • tensorflow-0.11.0rc2
  • Ubuntu 14.04