Many potential relationships hide among the documents stored in your Elastic indexes. Graph lets you explore them and run graph analysis on your documents.
Graph was released in mid-2016 as a Kibana plugin and, as of Elastic version 5.0, is included in the X-Pack extension (Platinum subscription).
Graph is an API- and UI-driven tool, so you can integrate the Graph capabilities into your applications or explore the data using the UI.
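To give an idea of the API side, here is a minimal sketch of a request body for the Graph explore API. The field names department and position are assumptions (use the names from your own mapping), and build_explore_body is a hypothetical helper, not part of any Elastic client. The body is sent with a POST to the Graph explore endpoint of your index; the exact URL path depends on your Elastic version, so check the documentation for the release you run.

```python
import json


def build_explore_body(seed_field, connected_field, size=10):
    """Build a minimal Graph explore request body (hypothetical helper).

    seed_field and connected_field are placeholder field names.
    """
    return {
        # Seed query: which documents to start from (here, all of them).
        "query": {"match_all": {}},
        # First set of vertices: terms taken from seed_field.
        "vertices": [{"field": seed_field, "size": size}],
        # One hop: terms of connected_field that co-occur with those vertices.
        "connections": {
            "vertices": [{"field": connected_field, "size": size}]
        },
    }


# Assumed field names, for illustration only.
body = build_explore_body("department", "position")
print(json.dumps(body, indent=2))
```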
Graph analysis can, for example, help you:
- Discover which vendor is responsible for a group of compromised credit cards by exploring the shops where purchases were made.
- Suggest the next best song for a listener who digs Mozart, based on their preferences, to keep them engaged and happy.
- Identify potential bad actors and other unexpected associates by looking at the external IPs that machines on your network are talking to.
You can install X-Pack into Elasticsearch using the command:
bin/elasticsearch-plugin install x-pack
and into Kibana using the command:
bin/kibana-plugin install x-pack
Here you can find useful resources about Graph capabilities and subscriptions:
In this post I am going to show you an example of how Graph works. We will perform a set of analyses on a dataset that lists all current City of Chicago employees, complete with full names, departments, positions, and annual salaries.
You can read more about the dataset and download it here: City of Chicago employees
These are the fields of the dataset:
- Name: name of the employee
- Surname: surname of the employee
- Department: the department where the employee works
- Position: the job position
- Annual Salary: annual salary in dollars
- Income class*: the income class based on the annual salary
- Sex*: male or female
I computed the fields marked with *; you will not find them in the original dataset.
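As an illustration of how a computed field like the income class could be derived from the annual salary, here is a small sketch. The bracket boundaries and the label format are made up for the example; they are not the ones used to produce the figures in this post.

```python
def income_class(annual_salary, bounds=(40000, 80000, 120000)):
    """Return a label for the salary bracket that annual_salary falls into.

    The default bounds are hypothetical; pick brackets that suit your data.
    """
    lower = 0
    for upper in bounds:
        if annual_salary < upper:
            return f"{lower}-{upper}"
        lower = upper
    # Anything at or above the last bound goes into an open-ended bracket.
    return f"{lower}+"


for salary in (35000, 80000, 150000):
    print(salary, income_class(salary))
```

Storing the bracket as its own field gives Graph a small set of terms to connect departments to, instead of thousands of distinct salary values.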
This is what the CSV file looks like:
I converted the CSV file to a JSON file (using a Python script) so that the documents can be easily indexed into Elasticsearch with the bulk index API.
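A conversion of this kind can be sketched as follows. This is not the original script, only a minimal version under some assumptions: the CSV has a header row, every row becomes one document, and the index name chicagoemployees and type employee are the ones used in this post. The sample row and its column names are made up.

```python
import csv
import io
import json


def csv_to_bulk(csv_text, index="chicagoemployees", doc_type="employee"):
    """Convert CSV text into the newline-delimited body the bulk API expects."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Action line: tells Elasticsearch where the next document goes.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself, one JSON object on one line.
        # Values stay strings here; convert numeric columns if you need to.
        lines.append(json.dumps(dict(row)))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Made-up sample; the real file has its own header and rows.
sample = (
    "Name,Surname,Department,Position,Annual Salary\n"
    "Jane,Doe,POLICE,POLICE OFFICER,80000\n"
)
print(csv_to_bulk(sample), end="")
```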
The JSON file I produced looks like this (the name of my index is chicagoemployees and the type is employee):
Once you have the JSON file you can index your documents in bulk:
curl -XPUT elasticEndPoint:9200/_bulk --data-binary @chicagoemployees.json
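The bulk API is strict about the layout of the file: one JSON object per line, an action line before each document, and a newline at the end of the last line. Before uploading, a quick sanity check along these lines can save a failed request. It is a sketch that assumes the file contains only index actions; check_bulk_body is a hypothetical helper.

```python
import json


def check_bulk_body(text):
    """Return a list of problems found in a bulk request body (empty if none)."""
    problems = []
    if not text.endswith("\n"):
        problems.append("body does not end with a newline")
    lines = text.splitlines()
    if len(lines) % 2 != 0:
        problems.append("odd number of lines: each document needs an action line")
    for number, line in enumerate(lines, start=1):
        try:
            parsed = json.loads(line)
        except ValueError:
            problems.append(f"line {number} is not valid JSON")
            continue
        # Odd lines are action lines; this sketch only expects "index" actions.
        if number % 2 == 1 and not (isinstance(parsed, dict) and "index" in parsed):
            problems.append(f"line {number} is not an index action")
    return problems


sample = (
    '{"index": {"_index": "chicagoemployees", "_type": "employee"}}\n'
    '{"Name": "Jane", "Department": "POLICE"}\n'
)
print(check_bulk_body(sample))
```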
Now that we have installed X-Pack and indexed our documents, we can start to explore them using Graph.
Here are a few examples of analyses performed on the data:
Which departments have the lowest annual income?
And which have the highest?
We can see that employees in the Law, Health or Fire departments have higher annual salaries than those in the Public Library or City Clerk departments.
Thicker edges represent stronger relationships (more related documents).
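If you call the Graph API instead of using the UI, the same information comes back as numbers. The response lists the vertices, and each connection points to its two vertices by position and carries a doc_count. The sketch below ranks connections by doc_count, on the assumption that this is what edge thickness reflects. The response fragment is made up for illustration and strongest_connections is a hypothetical helper.

```python
def strongest_connections(response, top=3):
    """Return (source term, target term, doc_count) tuples, strongest first."""
    vertices = response["vertices"]
    edges = sorted(
        response["connections"], key=lambda c: c["doc_count"], reverse=True
    )
    return [
        (vertices[c["source"]]["term"], vertices[c["target"]]["term"], c["doc_count"])
        for c in edges[:top]
    ]


# Made-up fragment shaped like a Graph explore response; the numbers are invented.
sample_response = {
    "vertices": [
        {"field": "department", "term": "POLICE", "weight": 0.5, "depth": 0},
        {"field": "position", "term": "POLICE OFFICER", "weight": 0.4, "depth": 1},
        {"field": "position", "term": "CLERK III", "weight": 0.1, "depth": 1},
    ],
    "connections": [
        {"source": 0, "target": 2, "weight": 0.02, "doc_count": 12},
        {"source": 0, "target": 1, "weight": 0.30, "doc_count": 950},
    ],
}

for source, target, count in strongest_connections(sample_response):
    print(f"{source} -> {target}: {count} documents")
```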
Next, we want to highlight the relationships between departments and positions for the female employees who work in the Police department.
We can see that the main relationship is between the Police department and the Police Officer position, but also that the Clerk III position is shared among many departments.
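An exploration restricted to a subset of documents can be expressed through the seed query of a Graph explore request. The sketch below is not necessarily the request behind the figure. The field names (sex, department, position) and the values are assumptions, and term filters only match if the values are indexed exactly as written (for example in a keyword field). filtered_explore_body is a hypothetical helper.

```python
import json


def filtered_explore_body(filters, seed_field, connected_field, size=10):
    """Build a Graph explore body whose seed query keeps only matching documents.

    filters maps field names to exact values, e.g. {"sex": "female"}.
    """
    return {
        "query": {
            "bool": {
                # Filters narrow the documents without affecting scoring.
                "filter": [
                    {"term": {field: value}} for field, value in filters.items()
                ]
            }
        },
        "vertices": [{"field": seed_field, "size": size}],
        "connections": {
            "vertices": [{"field": connected_field, "size": size}]
        },
    }


# Assumed field names and values, for illustration only.
body = filtered_explore_body(
    {"sex": "female", "department": "POLICE"}, "department", "position"
)
print(json.dumps(body, indent=2))
```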
The last example shows the relationships between gender and departments: some departments are common to male and female employees, while others are not.
In this post we saw how to import some documents into Elasticsearch and use the Graph tool to discover relationships among them. Graph is a really powerful tool because it helps you find out what is relevant in your data (not an easy task, because popular is not always the same as relevant).
I suggest you try Graph: it runs easily in your Kibana instance and can analyze your existing indexes (your documents do not need any pre-processing).
Here you can download all the resources used in the demo: