Elasticsearch X-Pack Graph

There are many potential relationships hidden among the documents stored in your Elasticsearch indices. Graph lets you explore them and perform graph analysis on your documents.
Graph was released in the middle of 2016 as a Kibana plugin and, as of Elastic version 5.0, is included in the X-Pack extension (Platinum subscription).
Graph is both an API-driven and a UI-driven tool, so you can integrate its capabilities into your applications or explore the data using the UI.

Examples of useful information deduced by a graph analysis are the following:

  • Discover which vendor is responsible for a group of compromised credit cards by exploring the shops where purchases were made.
  • Suggest the next best song for a listener who digs Mozart, based on their preferences, to keep them engaged and happy.
  • Identify potential bad actors and other unexpected associates by looking at the external IPs that machines on your network are talking to.

You can install the X-Pack into Elasticsearch using the command:

and into Kibana using the command:

Here you can find useful resources about Graph capabilities and subscriptions:

In this post I am going to show you an example of how Graph works. We will perform a set of analyses on a dataset that lists all current City of Chicago employees, complete with full names, departments, positions, and annual salaries.
You can read more about the dataset and download it here: City of Chicago employees
These are the fields of the dataset:

  • Name: name of the employee
  • Surname: surname of the employee
  • Department: the department where the employee works
  • Position: the job position
  • Annual Salary: annual salary in dollars
  • Income class*: the income class based on the annual salary
  • Sex*: male or female

I computed the fields marked with *; you will not find them in the original dataset.
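
A computed field such as the income class can be derived from the annual salary with a few lines of Python. This is only a minimal sketch: the function name, the three labels and the salary cut-offs are hypothetical, chosen for illustration, and are not the ones used for this post.

```python
def income_class(annual_salary: float) -> str:
    """Map an annual salary in dollars to a coarse income class label."""
    # Hypothetical cut-offs, for illustration only.
    if annual_salary < 40_000:
        return "low"
    if annual_salary < 90_000:
        return "medium"
    return "high"


# Made-up salaries, just to show the mapping.
labels = [income_class(s) for s in (25_000, 65_000, 120_000)]
print(labels)
```
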
This is what the CSV file looks like:

dataset_sample

I converted the CSV file to a JSON file (using a Python script) to easily index the JSON documents to Elasticsearch using the bulk index API.
The JSON file I produced looks like this (the name of my index is chicagoemployees and the type is employee):

dataset_json_sample
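
The CSV-to-JSON conversion can be sketched as follows. This is not the script used for this post, just a minimal version of the idea: the function name csv_to_bulk and the sample row are made up, and the column names are assumed to match the field list above. The bulk format expects one action line followed by one document line, each on its own line.

```python
import csv
import io
import json


def csv_to_bulk(csv_text: str, index: str, doc_type: str) -> str:
    """Convert CSV text into a newline-delimited bulk request body."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Action line: tells Elasticsearch where to index the next document.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself (all values stay strings here).
        lines.append(json.dumps(row))
    # A bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


# Made-up sample row, for illustration only.
sample = (
    "Name,Surname,Department,Position,Annual Salary\n"
    "Jane,Doe,POLICE,POLICE OFFICER,80000\n"
)
bulk_body = csv_to_bulk(sample, "chicagoemployees", "employee")
print(bulk_body)
```

In a real script you would probably convert the salary to a number before dumping the row, so that Elasticsearch maps it as a numeric field.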

Once you have the JSON file you can index your documents in bulk:
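
If you prefer to stay in Python, a bulk request can be prepared with the standard library. This is a hedged sketch: the helper name build_bulk_request, the localhost URL and the sample body are assumptions, and the request is only built here, not sent.

```python
import urllib.request


def build_bulk_request(base_url: str, body: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the _bulk endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_bulk",
        data=body.encode("utf-8"),
        # Newline-delimited JSON content type (assumption: accepted by your version).
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


# Made-up body with one action line and one document line.
body = (
    '{"index": {"_index": "chicagoemployees", "_type": "employee"}}\n'
    '{"Name": "Jane", "Department": "POLICE"}\n'
)
request = build_bulk_request("http://localhost:9200", body)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```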

Now that we have installed X-Pack and indexed our documents, we can start to explore them using Graph.

kibana_graph

Here are a few examples of analyses performed on the data:
Which departments have the lowest annual income?

department_income_example

And which have the highest?

department_income_example_2

We can see that employees in the Law, Health or Fire departments have a higher annual salary than those in the Public Library or City Clerk departments.
The thicker edges represent a stronger relation (more related documents).

Next, we want to highlight the relationships between Departments and Positions for the female employees who work in the Police department.

department_position
We can see that the main relationship is between the Police department and the Police Officer position, but also that the Clerk III position is shared among a lot of departments.

The last example shows the relationships between gender and departments: some departments are shared by male and female employees, while others are not.

sex_department

In this post we saw how to import some documents into Elasticsearch and use the Graph tool to discover relationships among them. Graph is a really powerful tool because it helps you find out what is relevant in your data (not an easy task, because popular is not always the same as relevant).

I suggest you try Graph: it runs easily in your Kibana instance and analyzes your existing indices (you do not need any pre-processing on your documents).

Here you can download all the resources used in the demo:

Elasticsearch Graph

As I announced a few weeks ago in a previous post, the Elastic team released a new product called Graph (see the old post for the details about the product).

In this post I am going to test the Graph capabilities. I assume that you have Elasticsearch and Kibana up and running. For the demo I used Elasticsearch 2.3.3 (released on May 18, 2016) and Kibana 4.5.1 (released on May 18, 2016).

To install Graph you need to run the following commands on your Elasticsearch machine (and, if you run a cluster, on every machine in the cluster). Make sure that Elasticsearch and Kibana are not running.

Install Graph into Elasticsearch:

Install Graph into Kibana:

Start Elasticsearch and Kibana:

When you start Kibana, if everything is correctly installed, you will see in the log that Graph (plugin:graph) has been started.

startKibanaGraphLog

Now navigate to the Graph panel:

For this demo I generated some JSON test data using the site Mockaroo (check it out, it is a really cool website for generating test data for your application in different formats).
The dataset contains 1000 data instances with the following fields:

  • First name
  • Last name
  • Country
  • City
  • Favourite color
  • Skill (representing the best skill of each individual)
  • Most used drug

This is an example of a test data instance:

I saved all the instances in a file called MOCK_DATA.json, created an index on Elasticsearch and posted the data instances using cURL (see this link for the details about the Bulk API).

Create the index with the mapping:
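
As a rough idea of what such a mapping might contain, here is a hedged Python sketch that builds the index-creation body. It assumes the Elasticsearch 2.x string type; the type name person and the field names are hypothetical, not the ones used in this demo. Marking the fields as not_analyzed keeps a value like "New York" as a single term, so it shows up as one vertex in the graph instead of being split into words.

```python
import json


def build_mapping(doc_type: str, fields: list) -> dict:
    """Build an index-creation body with one not_analyzed string per field."""
    properties = {
        # Elasticsearch 2.x syntax (assumption): exact-value string fields.
        name: {"type": "string", "index": "not_analyzed"}
        for name in fields
    }
    return {"mappings": {doc_type: {"properties": properties}}}


# Hypothetical type and field names, for illustration only.
mapping = build_mapping("person", ["country", "city", "color", "skill", "drug"])
print(json.dumps(mapping, indent=2))
```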

Post the data with the Bulk API:

Now, from the Graph panel, it is possible to select the previously created index and the fields:

selectIndex

Now I am going to show some examples of analysis made using Graph.

I selected the fields Country, City, Skill and Most Used Drug and filtered for Country = France. A possible question for this type of analysis could be “Which kinds of drugs are most common in France, and in which cities? Is there a relation between drug usage and skill?”
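
The same exploration can also be expressed as a request body for the Graph explore API. The sketch below only builds the body; the helper name, the field names (country, city, skill, drug) and the sample size are assumptions, and the exact endpoint path depends on your Elasticsearch version, so check the Graph API documentation before sending it.

```python
import json


def build_explore_body(filter_field, filter_value, vertex_fields, sample_size=2000):
    """Build a Graph explore request body: a seed query plus vertex fields."""
    vertices = [{"field": name} for name in vertex_fields]
    return {
        # Seed query: only documents matching this filter are explored.
        "query": {"match": {filter_field: filter_value}},
        # Controls: favour significant terms over merely popular ones.
        "controls": {"use_significance": True, "sample_size": sample_size},
        # Terms from these fields become the vertices of the graph...
        "vertices": vertices,
        # ...and are connected to terms drawn from the same set of fields.
        "connections": {"vertices": vertices},
    }


# Hypothetical field names, for illustration only.
explore_body = build_explore_body("country", "France", ["city", "skill", "drug"])
print(json.dumps(explore_body, indent=2))
```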

This is the resulting graph. As we can see, it shows all the connections between drug types, cities and skills. The thicker edges represent a stronger relation (simply more related data instances).
graph1

For the second example I selected the country and the skill fields. This type of analysis helps us answer this question: “Which are the most common skills in each country? Are there skills shared between countries?”

graph2

As we can see from the resulting graph, there are common skills between people living in Germany and France. Poland is the country that shares the most skills with France, but it does not share any skills with Germany.

The last example shows the relation between favourite color, drug usage and city of residence.

graph3

As said before, thicker edges represent a stronger relation. By clicking on an edge, it is possible to see (in the Link section) how many instances share both attributes, how many have the first attribute involved in the relation and how many have the second one.
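
To make these three numbers concrete, here is a small Python sketch that computes them by hand on a tiny made-up dataset. It only illustrates what the counts mean; it is not how Graph computes them internally, and the function name and records are hypothetical.

```python
def link_counts(records, first, second):
    """Count records matching the first pair, the second pair, and both.

    `first` and `second` are (field, value) pairs, one per end of the edge.
    """
    has_first = [r.get(first[0]) == first[1] for r in records]
    has_second = [r.get(second[0]) == second[1] for r in records]
    return {
        "first": sum(has_first),
        "second": sum(has_second),
        "both": sum(a and b for a, b in zip(has_first, has_second)),
    }


# Made-up records, for illustration only.
records = [
    {"city": "Paris", "color": "Blue"},
    {"city": "Paris", "color": "Red"},
    {"city": "Lyon", "color": "Blue"},
    {"city": "Paris", "color": "Blue"},
]
counts = link_counts(records, ("city", "Paris"), ("color", "Blue"))
print(counts)
```

Here three records mention Paris, three mention Blue, and two mention both, so the edge between the two vertices would be backed by two documents.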

linkClick

The Graph panel lets you filter the data shown in the graph using the same syntax as Kibana queries. It also lets you select the number of samples used to build the graph and offers a bunch of other options to tune the graph's sensitivity.

The purpose of this post is to share the features of the Graph plugin. It looks powerful, and I hope the Elastic team will expand it with new capabilities in the future.

Graph

A few weeks ago the Elastic team released a new product called Graph.

Graph is a product running on top of the Elasticsearch cluster (inside Kibana) that allows you to run graph analysis against your data. Data often contains references or properties that represent connections (links) between objects; with Graph you can now focus on these relationships.

Graph was officially presented at Elastic{ON} in San Francisco in February.

Here is a simple picture of what Graph looks like:
graph

I will explore the features of Graph in the next post.
Here you can find the official post on the Elastic blog.

Matteo