Thanks Howard. The --readahead flag helped!
After setting --readahead=0, the average speed improved from 3MB/s to 8MB/s, and I no longer see the heavy reads.

On Fri, Mar 16, 2018 at 10:55 PM, Howard Chu <[email protected]> wrote:
> Chuntao HONG wrote:
>> I am testing LMDB performance with the benchmark given at
>> http://www.lmdb.tech/bench/ondisk/, and I noticed that LMDB random
>> writes are really slow when the data goes beyond memory.
>>
>> I am using a machine with 4GB DRAM and an Intel PCIe SSD. The key size is
>> 10 bytes and the value size is 1KB. The benchmark code is given at
>> http://www.lmdb.tech/bench/ondisk/, and the command line I used is
>> "./db_bench_mdb --benchmarks=fillrandbatch --threads=1
>> --stats_interval=1024 --num=10000000 --value_size=1000 --use_existing_db=0".
>>
>> For the first 1GB of data written, the average write rate is 140MB/s. The
>> rate then drops significantly to 40MB/s for the first 2GB. At the end of
>> the test, in which 10M values are written, the average rate is just 3MB/s,
>> and the instantaneous rate is 1MB/s. I know LMDB is not optimized for
>> writes, but I didn't expect it to be this slow, given that I have a really
>> high-end Intel SSD.
>
> Any flash SSD will get bogged down by a continuous write workload, since
> it must do wear-leveling and compaction in the background, and "the
> background" is getting too busy.
>
>> I also notice that the way LMDB accesses the SSD is really strange. At
>> the beginning of the test, it writes to the SSD at around 400MB/s but
>> performs no reads, which is expected. But as we write more and more data,
>> LMDB starts to read from the SSD. As time goes on, the read throughput
>> rises while the write throughput drops significantly. At the end of the
>> test, LMDB is constantly reading at around 190MB/s, while occasionally
>> issuing 100MB writes at around 10-20 second intervals.
>>
>> 1. Is it normal for LMDB to have such low write throughput (1MB/s at the
>> end of the test) for data stored on SSD?
>>
>> 2. Why is LMDB reading more data than it is writing (about 20MB read per
>> 1MB written) at the end of the test?
>>
>> To my understanding, although we have more data than the DRAM can hold,
>> the branch nodes of the B-tree should still be in DRAM. So for every
>> write, the only pages we need to fetch from the SSD are the leaf nodes,
>> and when we write a leaf node we might also need to write its parents. So
>> there should be more writes than reads. But it turns out LMDB is reading
>> much more than it is writing. I think that might be why it is so slow at
>> the end, but I really cannot understand why.
>
> Rerun the benchmark with --readahead=0. The kernel does 16-page readahead
> by default, and on a random access workload, 15 of those pages are wasted
> effort. They also cause useful pages to be evicted from RAM. This is where
> the majority of the excess reads come from.
>
> --
> -- Howard Chu
>    CTO, Symas Corp.           http://www.symas.com
>    Director, Highland Sun     http://highlandsun.com/hyc/
>    Chief Architect, OpenLDAP  http://www.openldap.org/project/
