G1: A simple graph store, written in Rust
Over the next few weeks/months, I'm planning to implement a graph store in Rust, G1. I mainly want to do this because mentat became unmaintained, and I like the idea of Datalog as a database query language. Implementing a database also seems like a reasonable enough way to learn about one.
I also want a "database of everything" at some point -- a single database I put all my information into, and can query against in a single, unified way. (This also ties into the Stahl project -- to be written about at a later date.)
Atoms: Atoms are the nodes of the graph. Each is represented as a UUID.
Names: Names uniquely identify an Atom. They have a namespace and a title, both of which are Strings.
Edges: Edges are directed, with an Atom at both endpoints. Edges have a String label associated with them. At most one edge between two Atoms with a given label may exist.
Attributes: Attributes are attached to Atoms. They have a key and a value, both of which are Strings.
Blobs: Blobs are attached to Atoms. They have a kind, which is a String; a type, which is a MIME type; and contents, which are an arbitrarily large binary string. Blobs are referred to by a SHA256 hash.
Strings are UTF-8 strings, which should be no longer than 256 bytes.
The query language is a variant of Datalog. Datalog can be very efficient to evaluate, and complex queries are (in my opinion) much easier to read than similar SQL queries. It also expresses queries on graphs very naturally.
An example of a query:
friend(Me, You) :- edge(Me, You, "friend"). friendOfFriend(Me, You) :- friend(Me, Other), friend(Other, You). sameAtom(X, X) :- atom(X). frenemyName(Me, YourName) :- friendOfFriend(Me, You), ! friend(Me, You), ! sameAtom(Me, You), attr(You, "name", YourName). ?- frenemy("59760f34-eee0-44e2-9358-f48d46c686ee", YourName).
This query finds the names of the friends of the friends of the user represented by the atom with a UUID of
59760f34-eee0-44e2-9358-f48d46c686ee, excluding that user and their direct friends. I'll describe the language more fully in a later post.
Currently, I'm tracking work on the implementation in GitHub Issues, but note that at the time of writing, these issues are in the context of a laughably unoptimized implementation on top of SQLite.
In terms of the blog posts, I'm thinking of roughly the order:
- G1's Query Language
- The Magic Sets Transformation
- How G1 Stores Data (in-memory and on disk)
- Implementing Transactions in G1
- Ensuring Crash-Safety in G1