The project I’ve been working on in the last several months is a general-purpose DBMS engine written in Rust from scratch.
General-purpose DBMS provides transaction support with ACID properties (atomicity, consistency, isolation, durability), support for a certain query language to perform data manipulation, provides reliability and fault tolerance. It consists of such parts as query parser, query optimizer, execution engine, storage engine, transaction journal, different kinds of caches (for data, for parsed queries, etc.), and things to support transaction isolation, like locking and MVCC (https://en.wikipedia.org/wiki/Multiversion_concurrency_control).
Even if your target is the simplest DBMS which supports above-mentioned functions, it is a very complex thing to implement. So I decided to split it into two parts: backend and frontend. Frontend includes query parser, optimizer, execution engine, client interaction interface. It manages data catalog and schema (relational tables, for example), user requests and sessions. Backend provides a completely schema-less platform for data manipulation with ACID properties. It manages storage, concurrent transactions, transaction journal, MVCC, locks, buffering for data.
I’ve started working on frontend initially some time ago. It was written in C, and it is not completed yet. DBMS engine described here is backend part of DBMS with proclaimed properties. It is completely schema-less, which means any kind of DBMS can be built on top of it, SQL or NoSQL, doesn’t matter. That’s why I find this splitting useful. Having well-defined interface, different backends and frontends can be combined. E.g. frontend can be SQL database, but backend initially residing on a single machine can be substituted later with distributed implementation. Or the same backend can be used to build relational DBMS, and document-oriented DBMS on top of it.
The first goal was to create a working prototype. The most simple but fully functional DBMS engine. Work is still in progress, but most part of it is done. You can find repository with code here: https://github.com/stencillogic/db-core
Below is the architecture of the prototype:
Instance is a top-level component and interface for interacting with DBMS engine. Client code creates Instance and uses its functions to start, commit or rollback transactions, manipulate data, perform administrative actions.
The first thing that should be done is database initialization if it is not yet created. For that, Instance provides function initialize_datastore. This function creates required database files in the filesystem.
As soon as database is created, Instance can be used to start transaction, open an object for reading or writing, write or read data, and in the end, commit or rollback the changes. Object is just a sequence of bytes of arbitrary length identified by an ObjectId. Object in a relational database can correspond to a row, in a document-oriented database to a document, etc.
To make things concurrent, Instance can be shared between threads using functions get_shared_state and from_shared_state. Instance can’t be moved between threads, thus the shared state is something that can be moved and used in another thread to recreate instance.
In the end, Instance can be shut down.
Now let’s see at the other components.
Transaction manager keeps track of current transactions in the system. Besides serving as a registry of transactions, it also lets transactions to wait for a certain transaction to finish. For example, when two transactions try to modify the same object, only one of them can do it and the other one must wait until the first one is finished.
Log manager provides interface for transaction journal. Transaction journal keeps track of all changes in the system: modified data, commits and rollbacks, and also checkpoints. Log writer and log reader are underlying components that prepare log data to be written to a log. Then the data is pushed to a memory buffer and from there to the file system. Logs are represented as files of a limited size. Whenever log reaches a certain size, a new file is being created.
Storage driver is abstract interface for data storage. In Db-core block storage driver is implemented by default. Block storage driver uses blocks of fixed size to store any kind of data. Blocks can be of type: data block, versioning block, checkpoint block.
Data blocks represent main storage. Each of the data blocks consists of entries of variable size that contain data itself. So the objects are represented by a sequence of entries with certain starting entry and ending entry in the block storage driver.
Versioning blocks are blocks used to store previous versions of entries. Whenever a transaction wants to modify an entry, the entry is first copied to the versioning store. Then, this entry can be used to restore entry in the main storage when transaction rolls back. Also, versioned entries are used to provide transaction isolation: when some transaction modifies entry and it is not yet committed or rolled back, the other transaction can read the entry without waiting for the first transaction to finish using entry from the versioning store. This corresponds to Read Committed isolation level.
Checkpoint blocks are blocks representing checkpoint store. There is Checkpointer component on the diagram that triggers checkpoints periodically. Checkpoint is a snapshot of the main store as of certain moment in time. Information about checkpoint events is also written to transaction log. The snapshot is used to restore state when the system starts:
- it finds the latest completed checkpoint in the transaction log,
- restores data blocks from the checkpoint store,
- and then replays changes from the transaction log starting from the checkpoint.
Data is stored in the file system. Between file system and storage driver lays memory buffer to cache blocks. Before any block can be accessed, it first must be read and put into cache. Cache by default uses LRU eviction. Whenever block is being accessed for reading or writing, it is first locked in shared or exclusive mode correspondingly. Blocks in cache when accessed, are also pinned to prevent eviction.
The layout of data in the file system.
DataStore provides interface for data stored in the file system. I’ll cover storage layout in the next article.