16-FEB-2011: A simple "append-only" database

A proposed API and implementation for an append-only database

  1. Journal (multiple concurrent writers) -- all writes go here, acts as a buffer
  2. Database (flat text file, looks like a log file) -- written to by single thread reading from Journal; order of inserts is not guaranteed to be the same as order of inserts into journal
  3. Index (binary tree of keys) -- values are kept in sorted order with a count of the number of values; values are pointers to offsets in the Database

Support disabling the journal or the index at run-time by not specifying them during an amdb_open() call

Journal format:

  1. [Magic (32-bits)]
  2. [Journal Bypass Flag (8-bits)]
  3. [Header Size (HDR_SZ; 32-bits)]
  4. [Header (HDR_SZ * 8-bits)]:
    1. Tag-Length-Value (32b, 32b, Variable)
    2. Tags:
      1. 0x00: Number of tracks (NUM_TRACKS)
      2. 0x01: Track size (TRACK_SIZE)
      3. 0x02: Track 0 offset from start of file (TRACK0_START)
  5. [Tracks (NUM_TRACKS * TRACK_SIZE * 8-bits)]
    1. Tracks are an array of bytes TRACK_SIZE (which must be a multiple of the page size) bytes long
    2. Each track is a circular buffer for a particular writer
    3. Each entry in the buffer is of variable length and called a message
    4. Each message is in the format:
      1. [Removed (boolean; 8-bits)]
      2. [Complete (boolean; 8-bits)]
      3. [Message Length (MSG_LEN; 16-bits)]
      4. [Message (MSG_LEN * 8-bits)]

Journal bypass can be mmap()'d and checked before all writes. This can be used to resize the journal with the application running. Writes will be blocked/stalled. Initialization of tracks will be setting the message length of the first message to 0.

The number of tracks will determine the number of concurrent writers. Each track will be a range of bytes that are exclusively written to by a single thread.

The tracks are circular buffers (as noted above), with the exception that a message will not wrap around in the middle of the message. If a message would wrap around the track it will instead be started at the beginning of the track.

The purpose of the "Removed" flag on the message is to signal whether this message has been deleted from the journal. The purpose of the "Completed" flag on the message is to signal whether or not this message has been completely written to the journal. When writing a new message to the journal the length will be written first, then the message, then the "Completed" flag will be set to 1. The consumer thread will then read messages from the tracks as they are completed and mark them as removed. On start-up, the journal will be consumed and the track re-initialized and then writes will start at the beginning of the track.

If a track becomes full (i.e., the journal writers produce data more quickly than the journal consumer can consume it for too long) then the writer of that track must block/stall.

API:

  1. void *amdb_open(const char *db, const char *journal, const char *index);
    1. All parameters are strings containing the pathnames to files to store their respective components
  2. void amdb_insert(void *handle, time_t date, ...);
    1. All additional arguments are required to be strings in pairs, the first of the pair being the field name and the second being the field contents
    2. Terminate with NULL
  3. void amdb_consumejournal(void *handle);
    1. Starts a process that consumes entries from the journal and writes to database and index files.