Overview

When I first heard about Neo4j, I thought that such databases were used only for recommendation engines. My perception changed when I saw a presentation about the GAAND Stack (GraphQL, Apollo, Angular, Neo4j Database). During this training I noticed that there are many more use cases for this database, so the day after the training I dug deeper into the technology. I realized that it is a really powerful tool because of its speed and its way of representing data (using graphs, we can model the real world).

There are two main reasons why I decided to write this post. The first is to create a complete manual for the GRAND Stack (the same as GAAND but with React.js instead of Angular). The second is to structure my knowledge of Neo4j in preparation for professional certification.

What is a graph database?

In simple words, a graph database is a database that uses graph structures to store data. Just like a graph, such a database has nodes and edges, which can be unidirectional or bidirectional.

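Structuring data as a graph is intuitive, and for queries that follow connections between records it can make searching faster than other data structures, because related data is reached by following edges directly.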

Neo4j database structure

Example Neo4j Graph

What is a node?

A node is an object that represents an entity. A node can hold data, called properties.

Neo4j Node

What is a relation?

As the name says, this element represents a relation between two nodes. Relations organize nodes into structures, which allows creating lists, maps, trees, etc. A relation can hold properties and must have exactly one relation type.

Neo4j relations

What is a relation type?

The type of a relation defines which role one node plays for another, or explains why two nodes are connected.

Neo4j relation type

What is a label?

Labels are used to assign nodes to groups. A node can have zero or more labels.

We can think of labels as the table names of a relational database. A label defines the type of a node. In the case below, two types are available: Person and Movie.

Neo4j labels
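
As a minimal sketch of how labels look in Cypher, the statements below create one node with two labels and one node with a single label, then select nodes by label. The names, titles and the extra Actor label are made up for illustration and are not taken from the graph pictured above.

```cypher
// Hypothetical data: the names, titles and the Actor label are illustrative only.
CREATE (p:Person:Actor { name: 'Jane Doe' })   // one node, two labels
CREATE (m:Movie { title: 'Example Movie' })    // one node, one label
RETURN labels(p), labels(m);

// Separate statement: select every node that carries the Person label
MATCH (p:Person)
RETURN p.name;
```

The label plays the same role in MATCH as a table name plays in a SQL FROM clause: it narrows the search to one kind of node.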

What are properties?

Properties are simply the data that a node or relation holds.

Neo4j properties
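
The following sketch shows properties on both nodes and a relation. All property names and values here (name, born, title, released, role) are assumptions chosen for illustration.

```cypher
// Hypothetical nodes and values, for illustration only.
CREATE (p:Person { name: 'Jane Doe', born: 1980 })
CREATE (m:Movie { title: 'Example Movie', released: 2010 })
CREATE (p)-[r:ACTED_IN { role: 'Pilot' }]->(m)   // the relation holds a property too
RETURN p.name, r.role, m.title
```

Properties are read with dot notation (p.name, r.role), on nodes and relations alike.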

What is traversal?

Traversal is how you query a graph to get the data you want. Traversing a graph means visiting nodes by following relationships according to rules that we define in the query.
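
A small sketch of such rules, assuming a hypothetical graph of Person nodes connected by KNOWS relationships: the query starts at one person and follows outgoing KNOWS relationships between one and three steps deep.

```cypher
// Hypothetical schema: Person nodes connected by KNOWS relationships.
MATCH (start:Person { name: 'Jane Doe' })-[:KNOWS*1..3]->(other:Person)
RETURN DISTINCT other.name
```

The relationship type, the direction of the arrow and the *1..3 depth range are the rules that bound the traversal.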

What is an index?

As in other databases, an index improves the performance of querying data. The database creates a redundant copy of the data and stores it in a form that is efficient to search. This comes at the cost of additional storage space and slower writes.
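
A minimal sketch of creating and using an index, assuming Neo4j 4.x or later and a Person label with a name property; the index name person_name is arbitrary. The syntax differs between versions, so check the manual for the release you run.

```cypher
// Neo4j 4.x and later; older 3.x releases used: CREATE INDEX ON :Person(name)
CREATE INDEX person_name FOR (p:Person) ON (p.name);

// Separate statement: a lookup by label and property that can use the index
MATCH (p:Person)
WHERE p.name = 'Charlie Sheen'
RETURN p;
```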

What is a constraint?

Databases use constraints to prevent storing unwanted data. We define rules that the data should follow, and the database checks values against them before each commit.
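
As an illustration, here is a sketch of a uniqueness constraint, assuming Neo4j 4.4 or later and a hypothetical email property on Person nodes. The constraint name is arbitrary and the syntax has changed between versions.

```cypher
// Neo4j 4.4 and later; earlier releases used:
//   CREATE CONSTRAINT ON (p:Person) ASSERT p.email IS UNIQUE
CREATE CONSTRAINT person_email_unique FOR (p:Person) REQUIRE p.email IS UNIQUE;

// Assuming no such person exists yet, the first statement succeeds
CREATE (:Person { name: 'Jane Doe', email: 'jane@example.com' });

// The second one is rejected because it violates the constraint
CREATE (:Person { name: 'Jane D.', email: 'jane@example.com' });
```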

Query Language –  Cypher

Cypher is the query language used in the Neo4j database. For people who have worked with SQL it will look familiar. From my perspective it is also a little similar to streams in Java, because code written in this language forms something like a pipeline. Reading it from left to right reminds me of reading ordinary sentences.

This query language uses ASCII art for patterns, which makes it more readable. After a glance at a query, we immediately know what is a node, what is a relation, and how the data is going to be processed.
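
To illustrate the pipeline comparison, here is a sketch in which each clause passes its rows on to the next one. It assumes Person and Movie nodes connected by ACTED_IN relationships; the threshold and limit are arbitrary.

```cypher
// Hypothetical pipeline over Person/Movie data.
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)   // 1. find actors and their movies
WITH p, count(m) AS movieCount            // 2. aggregate per person
WHERE movieCount > 1                      // 3. filter the aggregated rows
RETURN p.name, movieCount                 // 4. project the result
ORDER BY movieCount DESC                  // 5. sort it
LIMIT 5
```

The WITH clause plays the role of an intermediate stage: only the values it lists are visible to the clauses that follow.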

Basic queries

Fetching data

MATCH (charlie { name: 'Charlie Sheen' })-[:ACTED_IN]->(movie)<-[:DIRECTED]-(director)
RETURN movie.title, director.name

The query above tells the database to return the movie title and the director’s name for every movie in which someone named Charlie Sheen acted (I assume it is a person, because the label is not stated explicitly). In simple words, we are looking for movies in which Charlie Sheen was an actor, together with their directors.

Please note that we are using a unidirectional relation by typing an “arrow” (-[:RELATION_TYPE]->). The arrow states the direction of the relation between the nodes.

Creating a node

CREATE (a:Artist { Name : "Strapping Young Lad" })

We can also create multiple nodes with one command by separating them with commas

CREATE (a:Album { Name: "Killers"}), (b:Album { Name: "Fear of the Dark"}) 
RETURN a,b

or by using separate CREATE statements

CREATE (a:Album { Name: "Piece of Mind"}) 
CREATE (b:Album { Name: "Somewhere in Time"}) 
RETURN a,b

Creating a relationship

MATCH (a:Actor),(b:Movie)
WHERE a.Name = "John Tree" AND b.Name = "The neo4j movie"
CREATE (a)-[r:ACTED_IN]->(b)
RETURN r

As you can see, the same CREATE keyword is used for creating relationships, but first we have to match the nodes that should be connected.

Difference in comparison to SQL

If you know SQL, you will see many similarities between these query languages. Clauses such as WHERE, UNION, ORDER BY or CREATE exist in both languages. The main difference is that there are no joins, since relations are modelled differently than in a relational database.
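
A side-by-side sketch of the difference: the SQL in the comment assumes hypothetical tables person, acted_in and movie, while the Cypher query expresses the same question as a pattern, without join conditions.

```cypher
// Roughly equivalent SQL, assuming hypothetical tables person, acted_in and movie:
//   SELECT m.title
//   FROM person p
//   JOIN acted_in a ON a.person_id = p.id
//   JOIN movie m ON m.id = a.movie_id
//   WHERE p.name = 'Charlie Sheen';
MATCH (p:Person { name: 'Charlie Sheen' })-[:ACTED_IN]->(m:Movie)
RETURN m.title
```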

Transactions in Neo4j

Neo4j supports ACID properties to fully maintain data integrity and ensure good transaction behavior.

We have to perform all database operations that access the graph, indexes, or the schema in a transaction.

It is worth remembering that:

  • Data retrieved by traversals is not protected from modification by other transactions.
  • Non-repeatable reads may occur (only write locks are acquired and held until the end of the transaction).
  • One can manually acquire locks on nodes and relationships to achieve a higher level of isolation.
  • Locks are acquired at the node and relationship level.
  • Deadlock detection is built into the core transaction management.

To read more about transactions in Neo4j visit this site.

Isolation Level

Transactions in the Neo4j database use the READ_COMMITTED isolation level. This means that a transaction won’t see uncommitted changes from other transactions. Additionally, the Java API enables explicit locking of nodes and relationships. Locks make it possible to simulate higher levels of isolation by obtaining and releasing them explicitly.
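
Outside the Java API, a commonly described way to take a write lock from Cypher is to write a dummy property before reading the value you depend on, since a write locks the node until the transaction ends. The sketch below assumes a hypothetical Counter node with a value property; the _lock property name is arbitrary.

```cypher
// Hypothetical Counter node; _lock is a throwaway property used only to lock.
MATCH (c:Counter { name: 'visits' })
SET c._lock = true            // dummy write: takes the write lock on the node
WITH c
SET c.value = c.value + 1     // the read-modify-write now runs under that lock
REMOVE c._lock
RETURN c.value
```

This is a sketch of the idea rather than a recommendation; whether it is needed depends on the workload and on the Neo4j version in use.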

Examining Neo4j queries

EXPLAIN

The EXPLAIN command allows us to see the execution plan without running the statement. To use it, we prepend the query with the EXPLAIN keyword. It returns an empty result and makes no changes to the database state.

You can see an example of this command below

EXPLAIN MATCH p=()-[r:ACTED_IN]->() RETURN p LIMIT 25

PROFILE

To see which operators are doing most of the work we can use the PROFILE statement. This command runs the query and keeps track of how many rows pass through each operator and how much each operator needs to interact with the storage layer to retrieve the necessary data.

Example:

PROFILE MATCH p=()-[r:ACTED_IN]->() RETURN p LIMIT 25

Naming in Neo4j database

Node Label

Node labels are written in CamelCase, starting with an upper-case character.

Proper name  | Incorrect name
VehicleOwner | vehicle_owner
NetworkNode  | networkNode

Relationship type name

Relationship types are written in upper case, with words separated by underscores.

Proper name | Incorrect name
ACTED_IN    | acted_in
OWNED_BY    | ownedBy

Property

Property names use lower camel case, beginning with a lower-case character.

Proper name      | Incorrect name
firstName        | first_name
amountOfStudents | AMOUNT_OF_STUDENTS
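
Put together, a statement that follows all three conventions could look like the sketch below. The Vehicle label and the modelName property are hypothetical additions for the example.

```cypher
// Hypothetical data that follows the three naming conventions.
CREATE (o:VehicleOwner { firstName: 'Jane' })   // label: CamelCase, property: lowerCamelCase
CREATE (v:Vehicle { modelName: 'Example' })
CREATE (v)-[:OWNED_BY]->(o)                      // relationship type: UPPER_SNAKE_CASE
RETURN o, v
```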

Comparison to Relational Database

If we wanted to migrate data from a relational database to Neo4j, we would treat each row of a table as a node. The table name would then become the node’s label, and the node’s properties would simply be the data from that row. Foreign key columns become relations between nodes, and the name of such a column can serve as the basis for the relation type.
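
A sketch of that mapping, using two made-up rows. The tables, columns and the WORKS_FOR type are assumptions for illustration only; the column that links the two tables (company_id) ends up as a relation rather than a property.

```cypher
// Hypothetical relational rows:
//   person(id = 1, name = 'Jane Doe', company_id = 7)
//   company(id = 7, name = 'Example Corp')
CREATE (p:Person { id: 1, name: 'Jane Doe' })        // row -> node, table name -> label
CREATE (c:Company { id: 7, name: 'Example Corp' })
// The company_id link between the rows becomes a relation
CREATE (p)-[:WORKS_FOR]->(c)
RETURN p, c
```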

Communication protocol – Bolt

Bolt is a non-standardized, open-source protocol designed for databases. The protocol is statement-oriented: the client sends statements, each consisting of a single string and a set of parameters. The server responds with a result message and an optional stream of data. Neo4j uses this protocol, and its default port is 7687.
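
To make the "single string with a set of parameters" idea concrete, here is a sketch of a parameterized Cypher statement as a client might send it. The parameter name and value are hypothetical, and how they are supplied depends on the driver or tool in use.

```cypher
// Statement text sent by the client; $name is a placeholder, not a literal.
MATCH (p:Person { name: $name })-[:ACTED_IN]->(m:Movie)
RETURN m.title
// Parameters sent alongside the statement (hypothetical values):
//   { name: 'Charlie Sheen' }
```

Keeping values out of the statement text means the same string can be reused with different parameters.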

Neo4j Bloom

Bloom is an application available in the Graph Platform that allows users to interact visually with graph data. In simple words, it is a web application that shows the graph we are working on.

Neo4j Bloom

Watch this video to see more about Neo4j Bloom.

License

There are two types of license. The Community edition is a fully featured database that can be used for open-source projects, projects inside an organization, or applications that run on a personal device. The Enterprise edition comes with better availability and scalability for commercial usage.

Neo4j also supports startups: it is possible to get it for free after signing up for the startup program. For more details see here.

Summary

From my perspective, a graph database is a perfect choice when we need to model real relations or any more complicated connections between objects. Searching in graphs is very fast, which is a huge benefit nowadays. The structure of a graph is also much easier to imagine than any document or tabular data structure.

When talking specifically about the Neo4j database, I like the way we operate on data. Cypher is an intuitive language that shows exactly what we are going to do, thanks to its ASCII-art look. The language is clear and we can read it like a normal sentence. I’m also impressed by the Bloom tool that Neo4j provides. It visualizes the graphs we work on perfectly, and its user interface is intuitive and clear.

Neo4j isn’t a cheap solution, but in cases where the speed of data querying counts it can be the best choice. A graph database is also much more flexible and easier to maintain, which can be beneficial or even crucial for some kinds of projects.

To sum up, I would recommend the Neo4j database for every project that can benefit from path traversal and graph algorithms as well as flexible relation and data modelling.

About author

Hi,
my name is Michał. I’m a software engineer. I like sharing my knowledge and ideas to help people who struggle with the technologies and design of today’s IT systems or who just want to learn something new.