MapDB 4 and near future
I decided to start a new major MapDB version. The master branch has already been refactored and tagged as MapDB 4.
MapDB 3
- MapDB3 was announced more than 18 months ago.
- Current stable branch is 3.0.
- Dev branch 3.1 is cancelled
- I started backporting changes to 3.0.x releases;
- for example 3.0.5 had major performance improvement that reduced lock overhead.
- The 3.0 branch will be maintained until 4.0 is released and becomes stable enough (most likely December 2017)
- I decided to start a new version because:
- Some features require a format change (external files for large records, extended records)
- API changes; a lot of refactoring
- changes in core classes (DBMaker, Serializers)
- Some parts were rewritten (Write Ahead Log, Volumes)
Major news in 4.0
- Format change in StoreDirect format
- better support for huge records
- transparently put large records into external files
- large records will bypass write-ahead-log, while preserving durability
- better support for checksums and encryption
- lazy streaming of large records (right now a record is loaded into a `byte[]`)
- full support for zero copy
- deserialization input stream reads directly from the memory-mapped file
- write-ahead-log
- redesign Volumes (file IO)
- refactor file IO to make better use of memory-mapped files
- support for `AsynchronousFileChannel` and non-blocking disk IO, with continuations and lightweight threads
- format change
- support for values in external files
- unified header
- format evolution
- old features will be deprecated, but not removed
- way more automated tests
- backward compatibility, format spec will be part of tests
- MapDB will integrate with several libraries
- it will be able to export/import data to Hadoop file formats, Spark…
- I do not like having several tiny Maven projects, so everything will be in MapDB artifacts (or perhaps mapdb-extra)
- in a separate package; later it might move into separate jar files
- MapDB artifact will depend on several libraries,
- but those will be optional compile time deps
- user will be responsible for providing those
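To make the non-blocking file IO item above more concrete, here is a minimal sketch built only on the JDK's `AsynchronousFileChannel`. The class `AsyncVolumeSketch` and its methods are hypothetical names invented for this illustration and are not MapDB's Volume API. `CompletableFuture` stands in for the continuations mentioned above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch, not MapDB code: a file read that completes a future
// instead of blocking the calling thread.
final class AsyncVolumeSketch {

    // Starts a read of up to `length` bytes at `offset` and returns immediately.
    // A single read may return fewer bytes than requested; a real volume would loop.
    static CompletableFuture<byte[]> readAsync(Path file, long offset, int length) {
        CompletableFuture<byte[]> result = new CompletableFuture<>();
        final AsynchronousFileChannel channel;
        try {
            channel = AsynchronousFileChannel.open(file, StandardOpenOption.READ);
        } catch (IOException e) {
            result.completeExceptionally(e);
            return result;
        }
        ByteBuffer buffer = ByteBuffer.allocate(length);
        channel.read(buffer, offset, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void attachment) {
                closeQuietly(channel);
                buffer.flip();                       // switch the buffer from writing to reading
                byte[] data = new byte[buffer.remaining()];
                buffer.get(data);
                result.complete(data);
            }

            @Override
            public void failed(Throwable error, Void attachment) {
                closeQuietly(channel);
                result.completeExceptionally(error);
            }
        });
        return result;
    }

    private static void closeQuietly(AsynchronousFileChannel channel) {
        try {
            channel.close();
        } catch (IOException ignored) {
            // nothing useful can be done in this sketch
        }
    }

    // Demo helper: writes text to a temp file and reads it back asynchronously.
    static String demo(String text) {
        try {
            Path file = Files.createTempFile("async-volume", ".bin");
            file.toFile().deleteOnExit();
            byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
            Files.write(file, bytes);
            // join() blocks only here, for the demo; real callers would chain thenApply/thenAccept.
            return new String(readAsync(file, 0, bytes.length).join(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would normally chain work onto the returned future, for example `readAsync(path, 0, 4096).thenAccept(bytes -> ...)`, so that no thread waits on the disk.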
Integration with libs and extras
- mapdb will unify various types of collections
- spark like
- chronicle like
- primitive collections over flat arrays (or memory mapped files)
- flat collections over mmap files
- support for Streams and Parallel Streams
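As an illustration of a primitive collection over a memory-mapped file that also exposes a stream, here is a minimal sketch using only JDK classes. `MappedLongArray` is a hypothetical name for this example and not a MapDB class. The fixed size and the lack of any header are simplifying assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.LongBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.stream.IntStream;
import java.util.stream.LongStream;

// Hypothetical class for illustration; it is not part of MapDB.
final class MappedLongArray {
    private final LongBuffer longs; // view over the mapped region, 8 bytes per element
    private final int size;

    private MappedLongArray(LongBuffer longs, int size) {
        this.longs = longs;
        this.size = size;
    }

    // Maps `size` longs from the file; a single mapping is limited to 2 GB.
    static MappedLongArray open(Path file, int size) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long bytes = (long) size * Long.BYTES;
            // A READ_WRITE mapping grows the file if needed and stays valid after the channel closes.
            LongBuffer view = ch.map(FileChannel.MapMode.READ_WRITE, 0, bytes).asLongBuffer();
            return new MappedLongArray(view, size);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long get(int index) { return longs.get(index); }

    void set(int index, long value) { longs.put(index, value); }

    int size() { return size; }

    // Sequential stream over the elements; no boxing and no copy into a long[].
    LongStream stream() {
        return IntStream.range(0, size).mapToLong(this::get);
    }

    // Demo helper: stores 0..n-1 in a temp file and sums the values through the stream.
    static long demoSum(int n) {
        try {
            Path file = Files.createTempFile("mapped-longs", ".bin");
            file.toFile().deleteOnExit();
            MappedLongArray array = open(file, n);
            for (int i = 0; i < n; i++) {
                array.set(i, i);
            }
            return array.stream().sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The stream here is sequential. Java NIO buffers are not documented as thread-safe, so a parallel variant would need more care than calling `.parallel()` on this stream.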
Changes in development
- I kept too tight a grip on MapDB and tried to make it perfect, which made development too slow
- Lessons from mapdb development blogpost
- in the future I will move faster, but keep quality where it matters: automated unit and acceptance tests
- way more blog posts
- comments on various projects, algorithms, papers
- staging place for documentation,
- a new feature will first be documented in a blog post for comments, then moved into a separate chapter
- youtube channel
- screencast videos to walk through code in an IDE (very fast to produce, good for a quick introduction)
- change in the way documentation is made
- bullet point oriented format
- very fast to make, very readable
- Antirez from Redis originally used this format
- contributors are welcome to re-edit and polish the documentation
- more code oriented
- code examples will be written first, before the implementation
- change in release cycle
- MapDB4 is the last major release
- various formats will be introduced, and deprecated, but never removed
- new formats (or collections) will start new file header, and use different implementations
- new minor (4.X) version will be out every month
- integration tests take about a week to finish
- a dedicated machine will run integration tests nonstop
- so every week there will be a stable snapshot release or a minor (4.0.X) bugfix release
- changes in unit tests
- way more unit tests
- test full matrix of all configuration options; CPU is cheap
- concurrency stress tests
- performance regression testing (the MapDB 3 release was a disaster)
- test storage format compatibility (can read and modify files generated by the older 4.0.0 release)
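The idea that each new format starts with its own file header, and that old formats stay readable, can be sketched roughly as follows. Everything in this example is hypothetical: the magic value, the format ids, the header layout and the class `FormatHeaderSketch` are invented for illustration and say nothing about the real MapDB 4 file format.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not the MapDB format: a header selects the reader implementation.
final class FormatHeaderSketch {
    static final int MAGIC = 0x4D415044; // ASCII "MAPD", chosen only for this sketch

    // Readers keyed by format id. New formats add entries; old entries are never removed.
    static final Map<Integer, Function<ByteBuffer, String>> READERS = new HashMap<>();
    static {
        READERS.put(1, body -> "v1:" + body.getInt());  // made-up older format with an int payload
        READERS.put(2, body -> "v2:" + body.getLong()); // made-up newer format with a long payload
    }

    // Builds an in-memory "file" in the made-up format 1.
    static ByteBuffer v1File(int value) {
        ByteBuffer file = ByteBuffer.allocate(12);
        file.putInt(MAGIC).putInt(1).putInt(value);
        file.flip();
        return file;
    }

    // Builds an in-memory "file" in the made-up format 2.
    static ByteBuffer v2File(long value) {
        ByteBuffer file = ByteBuffer.allocate(16);
        file.putInt(MAGIC).putInt(2).putLong(value);
        file.flip();
        return file;
    }

    // Reads the header and hands the rest of the file to the matching implementation.
    static String open(ByteBuffer file) {
        if (file.getInt() != MAGIC) {
            throw new IllegalArgumentException("not a store file");
        }
        int formatId = file.getInt();
        Function<ByteBuffer, String> reader = READERS.get(formatId);
        if (reader == null) {
            throw new IllegalArgumentException("unknown format " + formatId);
        }
        return reader.apply(file);
    }
}
```

A compatibility test in this style would keep files written by older releases as fixtures and assert that `open` still returns the expected content after every change.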
Roadmap for next 3 months
- first priority is to finish the Elsa serialization library, but its final version will be released together with MapDB 4
- MapDB 4.0 should be out at the end of October
- with the features of MapDB 3, but without open TODOs (missing compaction)
- there will be many blog posts describing my progress on MapDB
- a semi-stable release (passes acceptance tests) should be out every week
New features after 4.0 release
I have a very long list of ideas, so I will go through my bookmarks and notes and put everything into a series of blog posts.
So far most requested features are:
- extra collections to support cryptography and blockchain applications
- authenticated Merkle tree (immutable, fast creation with data pump)
- authenticated skip list, already written for IODB
- LSM Store based on IODB
- supports snapshots
- supports branching (the same way as Git or other VCS)
- data pump for everything
- including hashmap
- fast creation is important for merge algorithms
- Spark compatibility
- Spark Data Frames are a functional data transformation language
- they also define how data should be partitioned to fit into memory on a single node
- Spark uses several nodes
- but single-node Spark swaps data in and out of memory
- MapDB can do it way more efficiently (10x?)
- so I want to have some compatibility with Spark Data Frames
- Query planner support
- support some sort of SQL-like language with a query planner and executor
- take inspiration from Postgres extension API
- SQL engine from SQLite VM?
- use Spark Catalyst??
- reactive support
- planned for a very long time; I played with Kilim in 2008, and JDBM3 was originally steered in this direction
- based on Kotlin continuations and perhaps a similar framework
- based on `AsynchronousFileChannel`
- non-blocking disk IO
- should include MapDB and most of its collections
- support for Akka, RxJava and similar frameworks
- time series database
- graph database…
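For readers unfamiliar with the authenticated Merkle tree item in the list above, here is a minimal sketch of computing a Merkle root bottom-up from an ordered list of leaves, which is the kind of one-pass construction a data pump would favour. `MerkleRootSketch` is a hypothetical class for this illustration. The hashing scheme (SHA-256, a one-byte prefix for leaves and inner nodes, an unpaired node promoted unchanged) is an assumption and not MapDB's or IODB's design.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not MapDB or IODB code.
final class MerkleRootSketch {

    // Hashes the concatenation of all parts with SHA-256.
    static byte[] sha256(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] part : parts) {
                md.update(part);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }

    // Builds the tree level by level and returns the root hash.
    static byte[] root(List<byte[]> leaves) {
        if (leaves.isEmpty()) {
            throw new IllegalArgumentException("no leaves");
        }
        List<byte[]> level = new ArrayList<>();
        for (byte[] leaf : leaves) {
            level.add(sha256(new byte[] {0}, leaf)); // 0x00 prefix marks a leaf hash
        }
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                if (i + 1 < level.size()) {
                    // 0x01 prefix marks an inner node built from two children
                    next.add(sha256(new byte[] {1}, level.get(i), level.get(i + 1)));
                } else {
                    next.add(level.get(i)); // unpaired node is promoted unchanged
                }
            }
            level = next;
        }
        return level.get(0);
    }

    // Demo helper: root of the given strings as lowercase hex.
    static String rootHex(String... items) {
        List<byte[]> leaves = new ArrayList<>();
        for (String item : items) {
            leaves.add(item.getBytes(StandardCharsets.UTF_8));
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : root(leaves)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Two collections with the same ordered content produce the same root, and changing any single leaf changes the root, which is what makes the structure useful for authentication.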
Comments
Eduard Dudar • 3 years ago
No pressure of course but wondering what are the current plans for 4.0 release. Some features like non-blocking IO are very sweet but github shows only 1 issue in closed for 4.0 and about 60 opened.