This post is by Hashrocket from Hashrocket
Aside from a brief excursion into MongoDB years ago, I have had no experience with NoSQL databases. This meant that when I was presented with a project that required me to convert a PostgreSQL-backed application to Neo4j, I knew I would have a lot of learning to do.
I don't profess to be a Neo4j master (YET!), but I found a way to think about the database that worked for me and helped me move forward on my project.
What Does My Data Look Like?
My journey started with trying to figure out how data is meant to be stored in a Neo4j database. In Postgres, I'm used to tables with rows and columns.
So how does that translate to Neo4j?
Data in a Neo4j database is stored as nodes. You can think of a node like a row, with properties on it to store information, similar to how a row has columns.
While columns in Postgres are directly defined in the schema, Neo4j is considered schema optional. Rather than defining all of the available properties and their types for a node, Neo4j focuses on maintaining a list of indexes and constraints.
Properties required some of the biggest changes in the way I thought about stored data. Because Neo4j doesn't have a schema defining properties, there is no such thing as a null value for a property on a node — the property either exists on the node, or doesn't.
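To make the difference concrete, here is a rough sketch in plain Python that models a row and a node as dictionaries. It is only an analogy, not how Neo4j stores anything, and the names (row, node, subtitle, has_property) are made up for illustration.

```python
# A Postgres-style row: every column in the schema is present, even when empty.
row = {"id": 1, "title": "Hello, graph", "subtitle": None}

# A Neo4j-style node (toy model): only the properties that were set exist.
node = {"title": "Hello, graph"}


def has_property(record, key):
    """Return True when the key exists on the toy row or node."""
    return key in record


print(has_property(row, "subtitle"))   # the column exists and holds NULL
print(has_property(node, "subtitle"))  # the property is simply absent

# In this model, "nulling out" a property means removing it entirely.
node["subtitle"] = "A first look"
node.pop("subtitle")
```

Cypher follows the same idea as far as I can tell: setting a property to null removes it, and checking a missing property with IS NULL succeeds.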
Now, all that stuff is fine and dandy, but rows in Postgres don't just exist all on their lonesome; they belong to a table. Do these node things get organized in a similar way?
As it turns out, Neo4j has no concept of tables, but it does have something called labels. Labels can be used as names or identifiers for the nodes in the graph.
A major difference between labels and tables, though, is that a node can have multiple labels, or can exist entirely free of labels. The main goal of labels is to organize the nodes into sets of data, making querying more straightforward.
This is an interesting approach, because it means that labels can be added and removed from a node, and can be used to represent the state of a node.
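A small Python sketch of labels as sets may help here. This is a toy model under my own assumptions, not Neo4j's API: the BlogPost, Draft and Published labels and the publish and with_label helpers are hypothetical.

```python
# Toy model: a node carries a set of labels plus a dict of properties.
post = {"labels": {"BlogPost", "Draft"}, "props": {"title": "Hello, graph"}}
untagged = {"labels": set(), "props": {}}  # a node may have no labels at all


def publish(node):
    """Swap the hypothetical Draft label for a Published one."""
    node["labels"].discard("Draft")
    node["labels"].add("Published")


def with_label(nodes, label):
    """Return the nodes that carry the given label."""
    return [n for n in nodes if label in n["labels"]]


publish(post)
published = with_label([post, untagged], "Published")
print(sorted(post["labels"]))
```

The node keeps its BlogPost label the whole time; only the label describing its state changes, which is the kind of thing a status column would handle in Postgres.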
Unlike the snake-case, plural table names of Postgres, Neo4j labels follow the convention of being camel case (with a leading capital) and singular, meaning that something like a blog_posts table becomes a BlogPost label.
Cool, How Do I Relate These Things?
In Postgres, connections are created between tables through foreign keys and join tables. Neo4j doesn't store foreign keys in nodes; rather, it uses a relationship.
A relationship is used to link two nodes together in a Neo4j database, and requires a type to be set to define the meaning of the relationship. Relationship types behave similarly to labels, working to make grouping and querying items in the graph less complicated. However, they follow a different naming convention, using all-caps snake case instead of camel case (e.g. AUTHORED_BY).
It is easy to think of a relationship as a replacement for a Postgres join table, since a relationship not only ties two nodes together but can also have properties of its own.
The caveat to this thought process is that relationships are also used to replace the foreign-key-based associations in Postgres, meaning that there are always three objects involved in building a connection in Neo4j: two nodes and a relationship.
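Here is a minimal Python sketch of those three objects. It is a toy model rather than the Neo4j driver API, and every name in it (Node, Relationship, the Author and BlogPost labels, the AUTHORED type, the role property) is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set
    props: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node   # node the relationship points from
    type: str     # every relationship needs a type, e.g. "AUTHORED"
    end: Node     # node the relationship points to
    props: dict = field(default_factory=dict)  # relationships can hold data too


author = Node({"Author"}, {"name": "Ada"})
post = Node({"BlogPost"}, {"title": "Hello, graph"})

# Stands in for a foreign key (blog_posts.author_id) or a join-table row.
authored = Relationship(author, "AUTHORED", post, {"role": "primary"})
print(authored.start.props["name"], authored.type, authored.end.props["title"])
```

Neither node stores an id pointing at the other; the connection lives entirely in the relationship object.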
It's important to know that relationships are directional, but the direction you set them up in doesn't really matter, since a relationship can be followed from either end. Instead of pointing a relationship from an author to a blog post, I could just as easily have set up my blog posts to have an AUTHORED_BY relationship pointing to an author without sacrificing any performance.
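A quick Python sketch of why one stored direction is enough. The list scan below is only a toy and says nothing about how Neo4j does it internally; the node ids, the AUTHORED type and both helper functions are hypothetical.

```python
# Toy model: each relationship is a (start, type, end) tuple.
relationships = [
    ("ada", "AUTHORED", "hello-graph"),
    ("ada", "AUTHORED", "cypher-notes"),
]


def outgoing(node, rel_type):
    """Follow relationships in the direction they point."""
    return [end for start, t, end in relationships
            if start == node and t == rel_type]


def incoming(node, rel_type):
    """Follow the same relationships against their direction."""
    return [start for start, t, end in relationships
            if end == node and t == rel_type]


print(outgoing("ada", "AUTHORED"))          # posts written by ada
print(incoming("hello-graph", "AUTHORED"))  # who wrote this post
```

Cypher expresses the same thing by flipping the arrow in the pattern, for example (a)-[:AUTHORED]->(p) versus (p)<-[:AUTHORED]-(a), or by leaving the arrowhead off to match either direction.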
Armed with a basic understanding of how data is meant to be stored and associated in a Neo4j database, it was time to start writing some Cypher queries.
Neo4j has a pretty great guide to querying with Cypher, written for SQL users. I would highly recommend reading through it to start getting an idea of how SQL queries translate to Cypher.
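As a rough taste of that translation, the sketch below runs a join in an in-memory SQLite database (standing in for Postgres) and puts a Cypher pattern asking the same question next to it. The Cypher string is not executed here, and the table names, labels and AUTHORED type are all assumptions for illustration.

```python
import sqlite3

# Relational side: an in-memory SQLite database stands in for Postgres.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE blog_posts (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES authors(id)
    );
    INSERT INTO authors VALUES (1, 'Ada');
    INSERT INTO blog_posts VALUES (1, 'Hello, graph', 1);
""")

sql = """
    SELECT blog_posts.title
    FROM blog_posts
    JOIN authors ON authors.id = blog_posts.author_id
    WHERE authors.name = ?
"""
titles = [row[0] for row in conn.execute(sql, ("Ada",))]
print(titles)

# Graph side: the same question as a Cypher pattern (shown for comparison
# only, not run). The join on author_id becomes a relationship in the pattern.
cypher = """
    MATCH (a:Author {name: $name})-[:AUTHORED]->(p:BlogPost)
    RETURN p.title
"""
```

The JOIN ... ON clause disappears on the Cypher side because the relationship already records which author goes with which post.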