Alec Matusis wrote:
> We have an environment with no flags that contains a database with no flags.
> The database is append only, no deletions or modifications. It is written
> using a single RW transaction, in the absence of any RO transactions. We
> observe that when we commit and recreate the RW transaction every 2000
> insertion ops, the data.mdb file size on disk is 2x larger than when
> committing every 64000 insertion ops. The mdb_copy -c utility shrinks the
> large 2k ops commit file to almost the same file size as the 64k commit one.
> mdb_stat -e on the data.mdb shows that when we have more commits and a bigger
> file, we have more pages used by the same proportion.
> 
> In production we will have several large DBs (>1TB) on an NVMe card and we do
> not have the 2x space for periodic mdb_copy -c compactions (and we cannot
> stop the writing process). We also need to commit every 2000 write ops,
> because there will be short-lived RO transactions that need to see the DB
> updates every 2000 writes.
>
> 1.  Why is the file size on disk dependent on the commit frequency? (I
> suppose because with less frequent commits it can allocate data between
> pages more efficiently?)

LMDB does copy-on-write. In each new transaction, any page you modify must be
copied first. If you do many operations in the same transaction, a page that
has already been copied can be modified again in place, instead of needing to
be copied again.
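
To make the effect concrete, here is a toy model in Python. It is a sketch
only, not LMDB's actual allocator: it assumes each insert lands on one of a
fixed number of leaf pages chosen at random (the original post does not say how
the keys are distributed), and it ignores branch pages, page splits and
free-list reuse. The names count_page_copies and simulate, and the page and
operation counts, are made up for the illustration. It counts how often a page
has to be copied for the same sequence of inserts under the two commit
intervals from the question.

```python
import random


def count_page_copies(page_ids, ops_per_commit):
    """Count first-touch page copies when committing every ops_per_commit ops."""
    copies = 0
    dirty = set()  # pages already copied in the current transaction
    for i, page in enumerate(page_ids, start=1):
        if page not in dirty:
            dirty.add(page)  # first modification in this txn: copy the page
            copies += 1
        if i % ops_per_commit == 0:
            dirty.clear()  # commit: the next txn has to copy pages again
    return copies


def simulate(total_ops=128_000, leaf_pages=10_000, seed=0):
    """Replay one random insert pattern under both commit intervals."""
    rng = random.Random(seed)
    # Assumption: every insert touches one uniformly chosen leaf page.
    page_ids = [rng.randrange(leaf_pages) for _ in range(total_ops)]
    return {n: count_page_copies(page_ids, n) for n in (2_000, 64_000)}


if __name__ == "__main__":
    for ops, copies in simulate().items():
        print(f"commit every {ops:>6} ops: {copies} page copies")
```

Because each 64000-op transaction covers 32 of the 2000-op ones, the shorter
interval can never produce fewer copies in this model. The count says nothing
about final file size, since copied-from pages can be reused later.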

> 2.  What can we do to reduce data.mdb, if we must commit frequently? Can we 
> use any environment, transaction or db flags, or anything else?

If it is truly, strictly append-only use, which means every newly inserted key
is greater than all existing keys, then you should use the MDB_APPEND flag.
That will cut growth by half.
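
Whether a workload qualifies can be checked before switching the flag on. The
sketch below assumes the database uses LMDB's default key ordering (byte-wise,
with a shorter key sorting before a longer key that it prefixes), which matches
how Python compares bytes objects; that assumption fits a database opened with
no flags, as described in the question. The helper name first_out_of_order is
hypothetical. In the C API the flag is passed to mdb_put, and a key that is out
of order is rejected with MDB_KEYEXIST.

```python
import struct


def first_out_of_order(keys):
    """Return the index of the first key that is not strictly greater than
    its predecessor, or None if the sequence is strictly ascending."""
    prev = None
    for i, key in enumerate(keys):
        if prev is not None and key <= prev:
            return i
        prev = key
    return None


if __name__ == "__main__":
    # Illustrative u32 keys: big-endian packing keeps numeric order under
    # byte-wise comparison, little-endian packing does not.
    big = [struct.pack(">I", n) for n in range(300)]
    little = [struct.pack("<I", n) for n in range(300)]
    print("big-endian:", first_out_of_order(big))
    print("little-endian:", first_out_of_order(little))
```

If the check reports an index for real key data, the inserts are not strictly
ascending and MDB_APPEND is not applicable as-is.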

> We are on Linux 5.4.0 / ext4 fs. The DB that grows 2x faster with more
> frequent commits has a bytearr key -> u32 val structure (the bytearray key is
> between 31 and 36 bytes). Another DB that has a reverse u32 key -> bytearr
> structure only grows 10% larger in the more frequent commits regime.
> 


-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
