Thanks Howard.

The --readahead flag helped!

After setting --readahead=0, the average write rate improved from 3MB/s to
8MB/s, and I no longer see the flood of reads.

On Fri, Mar 16, 2018 at 10:55 PM, Howard Chu <[email protected]> wrote:

> Chuntao HONG wrote:
>
>> I am testing LMDB performance with the benchmark given at
>> http://www.lmdb.tech/bench/ondisk/, and I noticed that LMDB random
>> writes become really slow once the data set grows beyond memory.
>>
>> I am using a machine with 4GB DRAM and an Intel PCIe SSD. The key size is
>> 10 bytes and value size is 1KB. The benchmark code is given in
>> http://www.lmdb.tech/bench/ondisk/, and the command line I used is
>> "./db_bench_mdb --benchmarks=fillrandbatch --threads=1
>> --stats_interval=1024 --num=10000000 --value_size=1000 --use_existing_db=0
>> ".
>>
>> For the first 1GB of data written, the average write rate is 140MB/s. The
>> average then drops sharply, to 40MB/s by the 2GB mark. At the end of
>> the test, after all 10M values have been written, the average rate is just
>> 3MB/s, and the instantaneous rate is 1MB/s. I know LMDB is not optimized
>> for writes, but I didn't expect it to be this slow, given that I have a
>> really high-end Intel SSD.
>>
>
> Any flash SSD will get bogged down by a continuous write workload, since
> it must do wear-leveling and compaction in the background and "the
> background" is getting too busy.
>
>> I also notice that the way LMDB accesses the SSD is really strange. At the
>> beginning of the test, it writes to the SSD at around 400MB/s and performs no
>> reads, which is expected. But as we write more and more data, LMDB starts to
>> read from the SSD. As time goes on, the read throughput rises while the write
>> throughput drops significantly. At the end of the test, LMDB is constantly
>> reading at around 190MB/s, while occasionally issuing 100MB writes at
>> around 10-20 second intervals.
>>
>> 1. Is it normal for LMDB to have such low write throughput (1MB/s at the
>> end of test) for data stored on SSD?
>>
>> 2. Why is LMDB reading more data than it is writing (about 20MB data read
>> per 1MB written) at the end of the test?
>>
>> To my understanding, although we have more data than the DRAM can hold,
>> the branch nodes of the B-tree should still be in DRAM. So for every
>> write, the only pages we need to fetch from the SSD are the leaf nodes. And
>> when we write a leaf node, we might also need to write its parents. So
>> there should be more writes than reads. But it turns out LMDB is reading
>> much more than writing. I think that might be the reason it is so slow at
>> the end, but I really cannot understand why.
>>
>
> Rerun the benchmark with --readahead=0. The kernel does 16-page readahead
> by default, and on a random-access workload, 15 of those pages are wasted
> effort. They also cause useful pages to be evicted from RAM. This is where
> the majority of the excess reads come from.
>
> --
>   -- Howard Chu
>   CTO, Symas Corp.           http://www.symas.com
>   Director, Highland Sun     http://highlandsun.com/hyc/
>   Chief Architect, OpenLDAP  http://www.openldap.org/project/
>
