Real-time prediction – Amazon Machine Learning

Amazon Web Services (AWS) provides a service called Amazon Machine Learning, which makes machine learning technology accessible to developers of all skill levels.
It provides tools to create machine learning (ML) models without having to learn complex ML algorithms, and to obtain predictions for your application through simple APIs, without implementing custom prediction code or managing any infrastructure.
Here you can find all the details about Amazon Machine Learning.

In this post we are going to see how to create an ML model and how to make real-time predictions using the AWS APIs from Python.
The data set we are going to use for the demo is the Zoo data set (UCI Machine Learning repository: https://archive.ics.uci.edu/ml/datasets/Zoo).
It is a simple data set that describes animals; it has 17 attributes that represent the features of each animal.
Here are the attributes with their data types:

The data set has 100 rows; here is a sample of rows (the last value is the target value, the one to be predicted):

The target attribute (the one we are going to predict) is the type attribute. It is a numerical attribute that represents classes of animals.
These are the classes; each numeric value corresponds to a set of animals:

The data set (CSV format) has been uploaded to an S3 bucket and will be used as input for the ML model.

From the Amazon ML service section we can create a new ML model. We need to specify the URL of the input file (on S3) or use a previously added data set. We can customize the training and evaluation settings, but for this demo we are going to use the default settings (recommended by Amazon).

Amazon ML uses the following learning algorithms:

  • For binary classification, Amazon ML uses logistic regression (logistic loss function + SGD).
  • For multiclass classification, Amazon ML uses multinomial logistic regression (multinomial logistic loss + SGD).
  • For regression, Amazon ML uses linear regression (squared loss function + SGD).

You can find more information about the algorithms here.
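
To make the "squared loss function + SGD" wording concrete, here is a minimal pure-Python sketch of stochastic gradient descent for a one-feature linear regression. It is only an illustration of the technique: the function name, learning rate, epoch count and toy data are assumptions, and it does not reproduce Amazon ML's actual implementation (which is not described in this post).

```python
def sgd_linear_regression(xs, ys, learning_rate=0.01, epochs=2000):
    """Fit y ~ w * x + b by SGD on the squared loss 0.5 * (prediction - y) ** 2."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # One update per training example (the "stochastic" part).
        for x, y in zip(xs, ys):
            error = (w * x + b) - y           # derivative of the loss w.r.t. the prediction
            w -= learning_rate * error * x    # gradient step for the weight
            b -= learning_rate * error        # gradient step for the bias
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical, not the Zoo data set).
w, b = sgd_linear_regression([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(round(w, 3), round(b, 3))
```

On this noiseless toy data the weight and bias should approach 2 and 1.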

Once the ML model has been configured, a set of steps is performed automatically: the data set is loaded from S3, the model is built on the data, and the model is evaluated (by default, 30% of the data set is held out for the evaluation and the remaining 70% is used for training).
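
The 70/30 split described above can be sketched in a few lines of Python. This is only an illustration of the idea: the helper name is hypothetical, and keeping the rows in their original order (rather than shuffling first) is an assumption, since the post does not say how the service selects the rows.

```python
def sequential_split(rows, train_fraction=0.7):
    """Return (training_rows, evaluation_rows), preserving the input order."""
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


rows = list(range(100))  # stand-in for the 100 rows of the data set
train, evaluation = sequential_split(rows)
print(len(train), len(evaluation))
```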

Once these steps are completed, the ML model has been built and evaluated (yes, it is pretty easy!).
You can explore the ML model performance in the evaluation summary section (for a regression model such as this one, Amazon ML uses the root mean square error as the indicator).
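
For reference, the root mean square error is the square root of the average squared difference between the actual and the predicted values. A minimal sketch follows; the function name and the sample numbers are made up for illustration and are not taken from the model's evaluation.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences of numbers."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical target values and predictions, for illustration only.
print(rmse([6, 1, 4], [5.75, 1.5, 4.0]))
```

A lower value means the predictions are closer to the actual targets.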

We can now perform the real-time prediction task directly online or through the APIs. To enable the API integration, we need to enable a real-time prediction endpoint (you can easily do it from the ML model settings section).

We are now going to write a Python script that predicts some data using the previously built ML model. The official AWS SDK for Python is called Boto3. It lets you use AWS services such as S3 or Machine Learning (see my previous post about Python and Amazon S3 buckets). Here you can find the official Boto3 documentation.

To create a new client connection to the Amazon ML service with Boto3, we have to specify the service we want to connect to (machinelearning this time), the AWS keys and the region (you can find the region code here).
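
The original snippet is not available here, so the following is a minimal sketch of what the connection could look like. The helper names and the region value are assumptions; `boto3.client` and its `region_name`, `aws_access_key_id` and `aws_secret_access_key` arguments are part of Boto3. If the keys are omitted, Boto3 falls back to its usual credential lookup (environment variables, shared credentials file, and so on).

```python
def ml_client_kwargs(region_name, access_key_id=None, secret_access_key=None):
    """Build the keyword arguments that are passed to boto3.client()."""
    kwargs = {"service_name": "machinelearning", "region_name": region_name}
    if access_key_id and secret_access_key:
        kwargs["aws_access_key_id"] = access_key_id
        kwargs["aws_secret_access_key"] = secret_access_key
    return kwargs


def make_ml_client(region_name, access_key_id=None, secret_access_key=None):
    """Create a Boto3 client for the Amazon ML service."""
    import boto3  # imported lazily so the helper above can be used without Boto3 installed

    return boto3.client(**ml_client_kwargs(region_name, access_key_id, secret_access_key))


# Usage (placeholder values, replace with your own region and keys):
# client = make_ml_client("us-east-1", "YOUR_ACCESS_KEY_ID", "YOUR_SECRET_ACCESS_KEY")
```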

We now need to create an object that represents our data instance (the one for which we would like to predict the Type attribute).
We specify each attribute key and its value (according to its data type). These attributes describe a honeybee: it has 6 legs, it breathes, it is venomous and it lays eggs.
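
As a sketch of such a record: the Amazon ML predict API expects a map of attribute names to string values, so a small helper can convert the values. The attribute names (`eggs`, `breathes`, `venomous`, `legs`) and the 1/0 encoding of the boolean features are assumptions based on the UCI column names; they must match the schema of your own data source. Only the four features mentioned above are included.

```python
def to_record(attributes):
    """Convert attribute values to strings, as expected by the predict API."""
    return {name: str(value) for name, value in attributes.items()}


# Honeybee features mentioned in the text (attribute names are assumptions).
honeybee = to_record({"eggs": 1, "breathes": 1, "venomous": 1, "legs": 6})
print(honeybee)
```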

We can now call the predict method. We need to specify the ML model ID and the real-time prediction endpoint of the ML model.
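
A minimal sketch of the call, assuming `client` is a Boto3 `machinelearning` client. The wrapper function name and the placeholder values are hypothetical; `predict` and its `MLModelId`, `Record` and `PredictEndpoint` parameters are part of the Boto3 API, and for a regression model the response carries the result under `Prediction` / `predictedValue`.

```python
def predict_value(client, model_id, record, endpoint_url):
    """Call the real-time endpoint and return the numeric prediction."""
    response = client.predict(
        MLModelId=model_id,            # ID shown in the ML model settings
        Record=record,                 # dict of attribute name -> string value
        PredictEndpoint=endpoint_url,  # URL of the real-time prediction endpoint
    )
    return response["Prediction"]["predictedValue"]


# Usage (placeholder values, replace with your own model ID and endpoint):
# value = predict_value(client, "ml-XXXXXXXXXXX", {"legs": "6"}, "https://<your-endpoint>")
```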

The result of the predict method contains the predicted value.

For this example the predicted value is 5.75. Because the data type of the Type attribute is Number, the model returns a real value, which we can round to the nearest class, 6.
So, given these input attributes, the model correctly classified our animal (a honeybee) as Type 6 (flea, gnat, honeybee, housefly, ladybird, moth, termite, wasp).
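
The rounding step can be written as a small helper. This is a sketch: the function name is hypothetical, and clamping the result to the range 1 to 7 is an assumption based on the seven animal classes of the UCI Zoo data set.

```python
def to_class_id(predicted_value, lowest=1, highest=7):
    """Round a regression output to the nearest class ID, kept within the valid range."""
    nearest = int(round(predicted_value))
    return max(lowest, min(highest, nearest))


print(to_class_id(5.75))  # 6
```

Note that Python rounds exact halves to the nearest even integer (for example `round(4.5)` is 4), which may matter for predictions that fall exactly between two classes.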

Here you can find the Amazon documentation about real-time prediction.
Here you can find the documentation about prediction using Boto3.