On Wed, 28 Jun 2023 at 22:44, Ilya Dryomov <[email protected]> wrote:
>> ** TL;DR
>>
>> In testing, the write latency performance of a PWL-cache backed RBD
>> disk was 2 orders of magnitude worse than the disk holding the PWL
>> cache.
>>
>> ** Summary
>>
>> I was hoping that PWL cache might be a good solution to the problem of
>> write latency requirements of etcd when running a kubernetes control
>> plane on ceph. Etcd is extremely write latency sensitive and becomes
>> unstable if write latency is too high. The etcd workload can be
>> characterised by very small (~4k) writes with a queue depth of 1.
>> Throughput, even on a busy system, is normally very low. As etcd is
>> distributed and can safely handle the loss of un-flushed data from a
>> single node, a local ssd PWL cache for etcd looked like an ideal
>> solution.
>
>
> Right, this is exactly the use case that the PWL cache is supposed to address.

Good to know!

>> My expectation was that adding a PWL cache on a local SSD to an
>> RBD-backed VM would improve write latency to something approaching the
>> write latency of the local SSD. However, in my testing,
>> adding a PWL cache to an RBD-backed VM increased write latency by
>> approximately 4x over not using a PWL cache. This was over 100x the
>> write latency of the underlying SSD.
>>
>> My expectation was based on the documentation here:
>> https://docs.ceph.com/en/quincy/rbd/rbd-persistent-write-log-cache/
>>
>> “The cache provides two different persistence modes. In
>> persistent-on-write mode, the writes are completed only when they are
>> persisted to the cache device and will be readable after a crash. In
>> persistent-on-flush mode, the writes are completed as soon as it no
>> longer needs the caller’s data buffer to complete the writes, but does
>> not guarantee that writes will be readable after a crash. The data is
>> persisted to the cache device when a flush request is received.”
>>
>> ** Method
>>
>> 2 systems, 1 running single-node Ceph Quincy (17.2.6), the other
>> running libvirt and mounting a VM’s disk with librbd (also 17.2.6)
>> from the first node.
>>
>> All performance testing is from the libvirt system. I tested write
>> latency performance:
>>
>> * Inside the VM without a PWL cache
>> * Of the PWL device directly from the host (direct to filesystem, no VM)
>> * Inside the VM with a PWL cache
>>
>> I am testing with fio. Specifically I am running a containerised test,
>> executed with:
>>    podman run --volume .:/var/lib/etcd:Z quay.io/openshift-scale/etcd-perf
>>
>> This container runs:
>>    fio --rw=write --ioengine=sync --fdatasync=1 \
>>        --directory=/var/lib/etcd --size=100m --bs=8000 --name=etcd_perf \
>>        --output-format=json --runtime=60 --time_based=1
>>
>> And extracts sync.lat_ns.percentile["99.000000"]
>
>
> Matthew, do you have the rest of the fio output captured?  It would be 
> interesting to see if it's just the 99th percentile that is bad or the PWL 
> cache is worse in general.

Sure.

With PWL cache: https://paste.openstack.org/show/820504/
Without PWL cache: https://paste.openstack.org/show/b35e71zAwtYR2hjmSRtR/
With PWL cache, 'rbd_cache'=false:
https://paste.openstack.org/show/byp8ZITPzb3r9bb06cPf/
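
For anyone comparing the pastes above, here is a minimal sketch of pulling the fdatasync latency percentiles out of fio's JSON output. It assumes the report keeps per-job results under a top-level "jobs" list, with the field quoted earlier (sync.lat_ns.percentile["99.000000"]) inside each job. The function names and the trimmed sample report are hypothetical; only the 5210112 ns value comes from the results in this thread.

```python
import json


def sync_percentiles_ns(fio_json_text, job_index=0):
    """Return {percentile: latency_ns} for fdatasync calls of one fio job.

    Assumes the layout jobs[job_index].sync.lat_ns.percentile, with
    percentile keys stored as strings such as "99.000000".
    """
    report = json.loads(fio_json_text)
    raw = report["jobs"][job_index]["sync"]["lat_ns"]["percentile"]
    return {float(key): value for key, value in sorted(raw.items(), key=lambda kv: float(kv[0]))}


def p99_sync_latency_ns(fio_json_text, job_index=0):
    """Return only the 99th percentile fdatasync latency in nanoseconds."""
    return sync_percentiles_ns(fio_json_text, job_index)[99.0]


# Hypothetical, heavily trimmed report; a real fio report has many more fields.
sample = json.dumps(
    {"jobs": [{"jobname": "etcd_perf",
               "sync": {"lat_ns": {"percentile": {"99.000000": 5210112}}}}]}
)

p99 = p99_sync_latency_ns(sample)
print(f"p99 fdatasync latency: {p99} ns ({p99 / 1e6:.2f} ms)")
```

Printing the whole dictionary returned by sync_percentiles_ns() for each run would show whether only the tail is affected or the entire distribution has shifted.
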

>> ** Results
>>
>> All results were stable across multiple runs within a small margin of error.
>>
>> * rbd no cache: 1417216 ns
>> * pwl cache device: 44288 ns
>> * rbd with pwl cache: 5210112 ns
>>
>> Note that adding a PWL cache increases write latency by
>> approximately 4x, to more than 100x the latency of the underlying device.
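
The ratios quoted above can be checked directly from the three 99th percentile figures. This is only a sketch of the arithmetic; the dictionary keys and variable names are illustrative.

```python
# 99th percentile fdatasync latencies from the results above, in nanoseconds.
results_ns = {
    "rbd no cache": 1417216,
    "pwl cache device": 44288,
    "rbd with pwl cache": 5210112,
}


def ratio(slower, faster):
    """How many times higher the latency of `slower` is than that of `faster`."""
    return results_ns[slower] / results_ns[faster]


pwl_vs_plain_rbd = ratio("rbd with pwl cache", "rbd no cache")
pwl_vs_device = ratio("rbd with pwl cache", "pwl cache device")
plain_rbd_vs_device = ratio("rbd no cache", "pwl cache device")

for name, ns in results_ns.items():
    print(f"{name}: {ns / 1e6:.3f} ms")
print(f"PWL-backed RBD vs plain RBD:    {pwl_vs_plain_rbd:.2f}x")
print(f"PWL-backed RBD vs cache device: {pwl_vs_device:.1f}x")
print(f"plain RBD vs cache device:      {plain_rbd_vs_device:.1f}x")
```

So "approximately 4x" is about 3.7x, and the PWL-backed disk is roughly 118x slower than the cache device on its own.
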
>>
>> ** Hardware
>>
>> 2 x Dell R640s, each with Xeon Silver 4216 CPU @ 2.10GHz and 192G RAM
>> Storage under test: 2 x SAMSUNG MZ7KH480HAHQ0D3 SSDs attached to PERC
>> H730P Mini (Embedded)
>>
>> OS installed on rotational disks
>>
>> N.B. Linux incorrectly detects these disks as rotational, which I
>> assume relates to weird behaviour by the PERC controller. I remembered
>> to manually correct this on the ‘client’ machine for the PWL cache,
>> but at OSD configuration time ceph would have detected them as
>> rotational. They are not rotational.
>>
>> ** Ceph Configuration
>>
>> CentOS Stream 9
>>
>>    # ceph version
>>    ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
>>
>> Single node installation with cephadm. 2 OSDs, one on each SSD.
>> 1 pool with size 2
>>
>> ** Client Configuration
>>
>> Fedora 38
>> librbd1-17.2.6-3.fc38.x86_64
>>
>> The PWL cache is on an XFS filesystem with a 4k block size, matching the
>> underlying device. The filesystem uses the whole block device. There
>> is no other load on the system.
>>
>> ** RBD Configuration
>>
>> # rbd config image list libvirt-pool/pwl-test | grep cache
>> rbd_cache                                    true
>
>
> I wonder if rbd_cache should have been set to false here to disable the 
> default volatile cache.  Other than that, I don't see anything obviously 
> wrong with the configuration at first sight.

I added some full output for this above.

>
> --
> Ilya
>
>>   config
>> rbd_cache_block_writes_upfront               false                        config
>> rbd_cache_max_dirty                          25165824                     config
>> rbd_cache_max_dirty_age                      1.000000                     config
>> rbd_cache_max_dirty_object                   0                            config
>> rbd_cache_policy                             writeback                    pool
>> rbd_cache_size                               33554432                     config
>> rbd_cache_target_dirty                       16777216                     config
>> rbd_cache_writethrough_until_flush           true                         pool
>> rbd_parent_cache_enabled                     false                        config
>> rbd_persistent_cache_mode                    ssd                          pool
>> rbd_persistent_cache_path                    /var/lib/libvirt/images/pwl  pool
>> rbd_persistent_cache_size                    1073741824                   config
>> rbd_plugins                                  pwl_cache                    pool
>>
>> # rbd status libvirt-pool/pwl-test
>> Watchers:
>>          watcher=10.1.240.27:0/1406459716 client.14475 cookie=140282423200720
>> Persistent cache state:
>>          host: dell-r640-050
>>          path: /var/lib/libvirt/images/pwl/rbd-pwl.libvirt-pool.37e947fd216b.pool
>>          size: 1 GiB
>>          mode: ssd
>>          stats_timestamp: Mon Jun 26 11:29:21 2023
>>          present: true   empty: false    clean: true
>>          allocated: 180 MiB
>>          cached: 135 MiB
>>          dirty: 0 B
>>          free: 844 MiB
>>          hits_full: 1 / 0%
>>          hits_partial: 3 / 0%
>>          misses: 21952
>>          hit_bytes: 6 KiB / 0%
>>          miss_bytes: 349 MiB

-- 
Matthew Booth
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
