G1: A simple graph store, written in Rust2019-12-15

Motivation

Over the next few weeks/months, I'm planning to implement a graph store in Rust, G1. I mainly want to do this because mentat became unmaintained, and I like the idea of Datalog as a database query language. Implementing a database also seems like a reasonable enough way to learn about one.

I also want a "database of everything" at some point -- a single database I put all my information into, and can query against in a single, unified way. (This also ties into the Stahl project.)

Data Model

Atoms: Atoms are the nodes of the graph. Each is represented as a UUID.

Names: Names uniquely identify an Atom. They have a namespace and a title, both of which are Strings.

Edges: Edges are directed, with an Atom at both endpoints. Edges have a String label associated with them. At most one edge between two Atoms with a given label may exist.

Attributes: Attributes are attached to Atoms. They have a key and a value, both of which are Strings.

Blobs: Blobs are attached to Atoms. They have a kind, which is a String; a type, which is a MIME type; and contents, which are an arbitrarily large binary string. Blobs are referred to by a SHA256 hash.

Strings are UTF-8 strings, which should be no longer than 256 bytes.

Query Language

The query language is a variant of Datalog. Datalog can be very efficient to evaluate, and complex queries are (in my opinion) much easier to read than similar SQL queries. It also expresses queries on graphs very naturally.

An example of a query:

friend(Me, You) :-
	edge(Me, You, "friend").

friendOfFriend(Me, You) :-
	friend(Me, Other),
	friend(Other, You).

sameAtom(X, X) :- atom(X).

frenemyName(Me, YourName) :-
	friendOfFriend(Me, You),
	! friend(Me, You),
	! sameAtom(Me, You),
	attr(You, "name", YourName).

?- frenemyName("59760f34-eee0-44e2-9358-f48d46c686ee", YourName).

This query finds the names of the friends of the friends of the user represented by the atom with a UUID of 59760f34-eee0-44e2-9358-f48d46c686ee, excluding that user and their direct friends. I'll describe the language more fully in a later post.

Roadmap

Currently, I'm tracking work on the implementation in GitHub Issues, but note that at the time of writing, these issues are in the context of a laughably unoptimized implementation on top of SQLite.

In terms of the blog posts, I'm thinking of roughly the order:

G1's Query Language
How G1 Stores Data On Disk
The Magic Sets Transformation
How G1 Stores Data In Memory
Implementing Transactions in G1
Ensuring Crash-Safety in G1