JetBrains Xodus - code overview

I started exploring alternative DB engines. I want to find interesting algorithms, new ideas and perhaps some new code for MapDB.

Xodus

First engine I review is JetBrains Xodus. It was developed for JetBrains YouTrack and latter open-sourced under Apache2 license. It powers YouTrack in production, I could not find other users. It started in 2010, and is production ready.

Main characteristics

  • Log based storage
  • BTree and Patricia Prefix Tree indexes
  • Hybrid between memory/disk store,
    • needs lot of heap for caches and internal datastructures
  • Strong lock-free snapshot isolation
  • Entity store (object with properties) with some query capabilities
  • Stable, but not much documentation
  • Written partly in Kotlin (as MapDB :-))

This blog post is just code walkthrought with my initial impressions. Future blog posts will address architecture, log, indexes, performance and transactions.

There are many code references. You can use them by pressing ctrl+n in Idea.

Benchmarks

  • Simple benchmarks for key-value stores
  • Benchmarked engines: Xodus, Chronicle, H2 MVStore, LMDB, MapDB, Persistit, Tokyo Cabinet
  • Uses JMH, that is mostly usable for microbenchmarks, not really suitable for DB benchmarking
  • No long running stress tests

Utils

Save ByteBuffer cleaner

JVM releases memory-mapped ByteBuffers after GC, that causes disk handle leaks, and could cause JVM crash TODO link

There is ‘cleaner hack’, but on MapDB it only works on OpenJDK 7 and 8, does not work on OpenJDK 9 or Android. This cleaner has branches for Android and OpenJDK 9 and probably works there.

jetbrains.exodus.util.SafeByteBufferCleaner

Packed long

Xodus uses packed longs similar way as MapDB does:

jetbrains.exodus.log.CompressedUnsignedLongByteIterable#getLong(jetbrains.exodus.ByteIterator)

Packaged Long Iterator, get and skip:

jetbrains/exodus/log/CompressedUnsignedLongByteIterable.java:150

Detect JVM version

Code to detect JVM version

jetbrains.exodus.system.JVMConstants

Privileged code execution

Executes piece of code with extra priviliges. MapDB does not handle priviliged code execution, it will be interesting to see how db works on restricted JVMs.

jetbrains.exodus.util.UnsafeHolder#doPrivileged$production_sources_for_module_utils_main

Hexa utils

Convert to/from Hexa string

jetbrains.exodus.util.HexUtil

VCDiff

VCDIFF-like delta encoding algorithm; wiki.

jetbrains.exodus.compress.VcDiff

VLQ

Variable-length quantity universal code; wiki.

jetbrains.exodus.compress.VLQUtil

Spin Allocators

Xodus caches/polls many short lived objects (such as byte[]). Keyword for code search is Spin Allocator. I am not sure how effective object polling is on modern JVMs. General consensus is caching/polling has higher overhead than GC.
Xodus started in 2010, Git history on those goes at least to 2013.

Log

Log is very interesting part (for MapDB): jetbrains.exodus.log.Log

Storage description:

  • Xodus stores data in rolling log files. Log entries (pages) are identified by one-byte headers.
    BTree headers are here: jetbrains.exodus.tree.btree.BTree#loadRootPage.

  • All files are fixed size. If inserted record does not fit into file size, file is padded with empty entries and new file is started.

  • Only small records (pages) are stored directly in log. Large records (pages) are placed into separate file in dedicated directory (blob stororage)

  • Each record (page) is identified by 8byte address, that identifies log file and its offset.

Some code:

  • Consistency check: jetbrains.exodus.log.Log#checkLogConsistency

  • Set log end, discard higher files. Used for rollback and tx flush?: jetbrains.exodus.log.Log#setHighAddress

  • Append record (page), return address: jetbrains.exodus.log.Log#write(byte, int, jetbrains.exodus.ByteIterable)

  • Example of log replay: jetbrains.exodus.log.Log#getFirstLoggableOfType

  • Flush and close: jetbrains.exodus.log.Log#flush(boolean)

  • File remove: jetbrains.exodus.log.Log#removeFile(long, jetbrains.exodus.io.RemoveBlockType)

  • Log garbage collection job: jetbrains.exodus.gc.CleanWholeLogJob#execute

Indexes

  • BTree and Patricia Prefix Tree are two indexes in Xodus.
  • It stores only `byte[], there is no way to use custom comparator.
  • There is prefix and delta compression, probably to reduce space usage for Entities

Code:

  • Binary search in BTree: jetbrains.exodus.tree.btree.BasePageImmutable#binarySearch(jetbrains.exodus.ByteIterable, int)

  • Save BTree Page: jetbrains.exodus.tree.btree.BasePageMutable#save

  • BTree Page put: jetbrains.exodus.tree.btree.BottomPageMutable#put and jetbrains.exodus.tree.btree.InternalPageMutable#put

  • BTree Page delete: jetbrains.exodus.tree.btree.BottomPageMutable#delete and jetbrains.exodus.tree.btree.InternalPageMutable#delete