Alec Matusis wrote:
> We have an environment with no flags that contains a database with no flags.
> The database is append only, no deletions or modifications. It is written
> using a single RW transaction, in the absence of any RO transactions. We
> observe that when we commit and recreate the RW transaction every 2000
> insertion ops, the data.mdb file size on disk is 2x larger than when
> committing every 64000 insertion ops. The mdb_copy -c utility shrinks the
> large 2k ops commit file to almost the same file size as the 64k commit
> one. mdb_stat -e on the data.mdb shows that when we have more commits and
> a bigger file, we have more pages used by the same proportion.
>
> In production we will have several large DBs (>1TB) on an NVMe card and we
> do not have the 2x space for periodic mdb_copy -c compactifications (and we
> cannot stop the writing process). We also need to commit every 2000 write
> ops, because there will be short-lived RO transactions that need to see the
> DB updates every 2000 writes.
>
> 1. Why is the file size on disk dependent on the commit frequency? (I
> suppose because with less frequent commits it can allocate data between
> pages more efficiently)?

LMDB does copy-on-write. Every time you start a new transaction, any page
you modify must be copied first. If you do many operations in the same
transaction, the modified pages can be reused as-is, instead of needing to
be copied again.

> 2. What can we do to reduce data.mdb, if we must commit frequently? Can we
> use any environment, transaction or db flags, or anything else?

If it is truly, strictly append-only use, which means every newly inserted
key is greater than all existing keys, then you should use the MDB_APPEND
flag. That will cut growth by half.

> We are on Linux 5.4.0 / ext4 fs. The DB that grows 2x faster with more
> frequent commits has a bytearr key -> u32 val structure (the bytearray key
> is between 31 and 36 bytes). Another DB that has a reverse u32 key ->
> bytearr structure only grows 10% larger in the more frequent commits regime.

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
