Should a Graph Database be Included in Your Data Strategy?



What is a graph database?

A graph database is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph (or edge or relationship). The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation. Graph databases hold the relationships between data as a priority. Querying relationships is fast because they are perpetually stored in the database. Relationships can be intuitively visualized, making them useful for heavily inter-connected data.

Graph databases are a type of NoSQL database, created to address the limitations of relational databases. While the graph model explicitly lays out the dependencies between nodes of data, the relational model and other NoSQL database models link the data by implicit connections. In other words, relationships are a first-class citizen in a graph database and can be labelled, directed, and given properties. This is compared to relational approaches where these relationships are implied and must be reunified at run-time. 1

In a conventional database, queries about relationships can take a long time to process. This is because relationships are implemented with foreign keys and queried by joining tables. As any SQL DBA can tell you, performing joins is expensive, especially when you must sort through large numbers of objects—or, worse, when you must join multiple tables to perform the sorts of indirect (e.g. “friend of a friend”) queries that graph databases excel at.

Graph databases work by storing the relationships along with the data. Because related nodes are physically linked in the database, accessing those relationships is as immediate as accessing the data itself. In other words, instead of calculating the relationship as relational databases must do, graph databases simply read the relationship from storage. Satisfying queries is a simple matter of walking, or “traversing,” the graph.

A graph database not only stores the relationships between objects in a native way, making queries about relationships fast and easy, but allows you to include different kinds of objects and different kinds of relationships in the graph. Like other NoSQL databases, a graph database is schema-less. Thus, in terms of performance and flexibility, graph databases hew closer to document databases or key-value stores than they do relational or table-oriented databases.

Consider a music database, with albums, bands, labels, and performers. If you want to report all the performers that were featured on this album by that band released on these labels—four different tables—you have to explicitly describe those relationships. With a relational database, you accomplish this by way of new data columns (for one-to-one or one-to-many relationships), or new tables (for many-to-many relationships). This is practical as long as you’re managing a modest number of relationships. If you’re dealing with millions or even billions of relationships—friends of friends of friends, for instance—those queries don’t scale well.

In short, if the relationships between data, not the data itself, are your main concern, then a different kind of database—a graph database—is in order.

Graph database use cases

Graph databases work best when the data you’re working with is highly connected and should be represented by how it links or refers to other data, typically by way of many-to-many relationships.

Social network is a useful example. Graph databases reduce the amount of work needed to construct and display the data views found in social networks, such as activity feeds, or determining whether or not you might know a given person due to their proximity to other friends you have in the network.

Another application for graph databases is finding patterns of connection in graph data that would be difficult to tease out via other data representations. Fraud detection systems use graph databases to bring to light relationships between entities that might otherwise have been hard to notice.

Similarly, graph databases are a natural fit for applications that manage the relationships or interdependencies between entities. You will often find graph databases behind recommendation engines, content and asset management systems, identity and access management systems, and regulatory compliance and risk management solutions. 2

Should graph database be a part of your data strategy?

The answer to this question depends on your business objectives, the data relationships you are trying to establish and the specific use cases you may have. We can help you determine if a graph database is the right choice for you and should be considered for data management instead of perhaps a conventional, relational database.

1 Wikipedia, Graph Database

2 InfoWorld, What is a Graph Database? A Better Way to Store Data

1 view