Lessons from MapDB development
MapDB is a great project, but for many reasons it is falling behind other projects that emerged around the same time (Hazelcast, Redis…). In this post I will outline the mistakes I made over the years while working on MapDB.
- concurrency
- JDBM3 back in 2012 used a single `ReadWriteLock` to handle concurrency.
- That allows parallel readers, but only a single writer.
- SQLite has a similar approach
- Antirez from Redis is a big advocate of simplifying things by avoiding concurrency
- That was redesigned in MapDB 1, to allow parallel writers
- Parallel writers did not bring real performance benefits
- `j.u.c.ConcurrentSkipListMap` scales linearly with the number of cores
- MapDB only scales up to 4 cores
- Concurrency greatly complicated things and is responsible for many delays
- Most benefits of concurrency could be achieved under a single lock
- The fail-fast iterator in JDBM3 would throw `ConcurrentModificationException`; that can be fixed under a single lock
- Concurrent scalability is still possible under a single lock with sharding and other trivial tricks
- A single lock would still allow a background writer thread, whose main benefit is lower latency
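The single-lock ideas above (parallel readers, sharding, and iteration that cannot throw `ConcurrentModificationException`) can be sketched roughly as follows. This is a minimal illustration, not MapDB or JDBM3 code: `ShardedStore`, `Shard`, `shardFor` and `snapshot` are hypothetical names, and each shard is assumed to be a plain in-memory `HashMap` guarded by one `ReentrantReadWriteLock`.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: N independent shards, each behind ONE read-write lock. */
final class ShardedStore<K, V> {

    /** One shard = one plain map + one lock; no finer-grained locking inside. */
    private static final class Shard<K, V> {
        final ReadWriteLock lock = new ReentrantReadWriteLock();
        final Map<K, V> map = new HashMap<>();
    }

    private final List<Shard<K, V>> shards = new ArrayList<>();

    ShardedStore(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new Shard<>());
        }
    }

    /** floorMod keeps the index non-negative even for negative hash codes. */
    private Shard<K, V> shardFor(K key) {
        return shards.get(Math.floorMod(key.hashCode(), shards.size()));
    }

    /** Writers are exclusive within a shard, but different shards do not block each other. */
    V put(K key, V value) {
        Shard<K, V> shard = shardFor(key);
        shard.lock.writeLock().lock();
        try {
            return shard.map.put(key, value);
        } finally {
            shard.lock.writeLock().unlock();
        }
    }

    /** Readers of the same shard may run in parallel. */
    V get(K key) {
        Shard<K, V> shard = shardFor(key);
        shard.lock.readLock().lock();
        try {
            return shard.map.get(key);
        } finally {
            shard.lock.readLock().unlock();
        }
    }

    /**
     * Copies the entries while holding each shard's read lock in turn.
     * The caller iterates over the copy, so concurrent writes cannot cause
     * ConcurrentModificationException. The copy is consistent per shard,
     * not across the whole store.
     */
    List<Map.Entry<K, V>> snapshot() {
        List<Map.Entry<K, V>> copy = new ArrayList<>();
        for (Shard<K, V> shard : shards) {
            shard.lock.readLock().lock();
            try {
                for (Map.Entry<K, V> e : shard.map.entrySet()) {
                    copy.add(new AbstractMap.SimpleImmutableEntry<>(e));
                }
            } finally {
                shard.lock.readLock().unlock();
            }
        }
        return copy;
    }
}
```

With this shape it is safe to modify the store while looping over `snapshot()`. The trade-off is the cost of the copy, which is one of several possible ways to avoid a fail-fast iterator.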
- writing a database is a hard task
- working over raw binary files is tough
- good luck debugging a wrong file offset in 1TB stores
- even now in 2017, there are not many database engines
- most of them use relatively simple ideas (BTree, LSM)
- papers that describe algorithms are often imprecise
- there is a B-Link-Tree paper which describes a concurrent BTree
- published in the 1980s, many citations
- but even today it is not clear how to handle some concurrent cases (root update)
- initial implementation took one week
- it took about 3 months of work to nail it and make it thread safe
- too many features
- MapDB had too many ways to open files, handle concurrency, etc.
- that created too many combinations to test
- it was hard to document and explain all the features
- code duplication and not invented here
- I spent a long time writing code that was already written in other libraries
- MapDB was self-contained, with no dependencies
- MapDB does not integrate with default tools and de facto standards
- TFile and HFile and other formats
- data exported from MapDB could be used by other databases
- MapDB could operate directly over data created by other tools
- other file formats
- reimplementation of existing API (LevelDB java binding)
- this way MapDB could be used as a drop-in replacement for other libraries
- did not follow test driven development
- automated testing in MapDB is still fairly good, for example we test process crash recovery with `kill -9 PID`
- we had the same problem with JDBM 1.5 back in 2008
- long running tests were broken for a long time (fixed now)
- it took too long to run default unit tests (fixed now)
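To illustrate what a crash-recovery test has to verify after a process is killed mid-write, here is a small sketch. It does not kill a real process and it is not MapDB's file format: `CrashLog`, `encode` and `countIntact` are hypothetical names, and the record layout (4-byte length, payload, 8-byte CRC32) is an assumption made for the example. The torn write that `kill -9` may leave behind is simulated by truncating the log.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical append-only log: [int length][payload][long crc32] per record. */
final class CrashLog {

    /** Encodes one record with a length prefix and a trailing checksum. */
    static byte[] encode(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        return ByteBuffer.allocate(4 + payload.length + 8)
                .putInt(payload.length)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    /**
     * Recovery check: counts the intact records at the start of the log and
     * stops at the first record that is incomplete or fails its checksum.
     * Everything after that point is treated as a torn tail.
     */
    static int countIntact(byte[] log) {
        ByteBuffer buf = ByteBuffer.wrap(log);
        int count = 0;
        while (buf.remaining() >= 4) {
            int len = buf.getInt();
            // Widen to long so a garbage length cannot overflow the comparison.
            if (len < 0 || buf.remaining() < (long) len + 8) {
                break; // record was cut off mid-write
            }
            byte[] payload = new byte[len];
            buf.get(payload);
            CRC32 crc = new CRC32();
            crc.update(payload);
            if (crc.getValue() != buf.getLong()) {
                break; // record is complete in size but corrupted
            }
            count++;
        }
        return count;
    }
}
```

A test can cut the log at every possible offset and assert that recovery never reports a partial record as valid. In a real crash test the truncation would come from killing a writer process, for example with `kill -9 PID` or `Process.destroyForcibly()`, rather than from cutting a byte array.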
- not enough performance testing
- no performance regression testing
- Single Entry locks destroyed performance in 3.x branch (fixed in 3.0.5)
- old code had concurrent scalability problems
- it used segmented locks
- too many embedded locks; sometimes semaphores without a memory barrier would do
- the file format and API changed way too many times
- it was necessary to fix early design mistakes
- should have started with in-memory store
- code change is sometimes necessary
- documentation
- spent way too much time deciding on a format (Markdown versus reStructuredText)
- too much time went into generating PDFs, which not many people use
- mapdb.org was originally generated by the Maven Site plugin (haha)
- a GitHub wiki would have been fine from the start
- MapDB needs way more code examples
Comments
xamde • 4 years ago
Thanks for sharing. I watched the Java embedded database space thoroughly, there are few real candidates, MapDB always looking like the best of them. My main problem was stability. I needed far less features, cared less about ultra-performance, but need a stable, reliable, somewhat scalable (otherwise I could use in-memory) key-value store first. I am happy to see MapDB lives on and I hope there will be stable, maintained, releases with stable on-disk formats.
Nick Apperley • 3 years ago
Since Kotlin is being used with MapDB have Kotlin Coroutines ( https://www.youtube.com/wat… ) been considered for non blocking concurrency? Some NoSQL DB systems use Coroutines ( https://en.wikipedia.org/wi… ) to handle concurrency ( https://medium.com/software… ).