I suspect the NVMe drives are client-class, located on external systems, each with a conventional filesystem holding big files that are exported to VMs, which mount them as block devices. That's a lot of layers, and media that isn't up to a sustained workload.
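One quick way to rule the media in or out is the simple sequential-read test Darren suggests below. A minimal sketch, using a scratch file as a stand-in so it runs anywhere (on an OSD host you would instead point dd at the actual backing device, e.g. /dev/nvme0n1, and add iflag=direct to bypass the page cache):

```shell
# Sketch of a sequential-read throughput check with dd.
# TARGET is a stand-in; on a real OSD host use the backing device
# (e.g. /dev/nvme0n1) with iflag=direct to bypass the page cache.
TARGET=${TARGET:-/tmp/ddtest.img}

# Create a 256 MiB test file so this example is self-contained.
dd if=/dev/zero of="$TARGET" bs=4M count=64 conv=fsync 2>/dev/null

# Sequential read; dd reports the achieved throughput on stderr.
dd if="$TARGET" of=/dev/null bs=4M

rm -f "$TARGET"
```

If the raw read from the backing device is itself well under the network ceiling, the bottleneck is in the storage stack, not the network.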
> On Aug 2, 2025, at 12:13 PM, [email protected] wrote:
>
> As you rightly point out, the 110MB/s sounds very much like the traffic is
> going through the wrong interface or being limited.
>
> So am I correct in my reading that this is a virtual Ceph environment
> running on Proxmox?
>
> What do you mean by this statement? "All Ceph drives are exposed and an NFS
> mounted NVME drive."
>
> Do I take this to mean that your 4 servers are all mounting the same NVME
> device over NFS? Just a bit confused as to the exact hardware setup here.
>
> What is the performance you can get from a single Ceph OSD? Just do a simple
> dd to read (not write) from an OSD drive. Also, `ceph tell osd.X bench`
>
> Darren
>
>> On 2 Aug 2025, at 15:25, Ron Gage <[email protected]> wrote:
>>
>> Hello from Detroit MI:
>>
>> I have been doing some limited benchmarking of a Squid cluster. The
>> arrangement of the cluster:
>>
>> Server  Function
>> c01     MGR, MON
>> c02     MGR, MON
>> o01     OSD
>> o02     OSD
>> o03     OSD
>> o04     OSD
>>
>> Each OSD has 2 x NVMe disks for Ceph, each at 370 GB.
>>
>> The backing network is as follows:
>>
>> ens18  Gigabit, mon-ip (192.168.0.0/23), regular MTU (1500)
>> ens19  2.5 Gigabit, cluster network (10.0.0.0/24), jumbo MTU (9000)
>>
>> Behind all this is a small Proxmox cluster. All Ceph machines are running
>> on a single node. All Ceph drives are exposed and an NFS mounted NVME
>> drive. All Ceph OSD drives are mounted with no cache and a single
>> controller per drive. Networking bridges are all set to either MTU 9000 or
>> MTU 1500 as appropriate.
>>
>> iperf3 is showing 2.46 Gbit/sec between servers c01 and o01 on the ens19
>> network. Firewall is off all the way around. OS is CentOS 10. SELinux is
>> disabled. No network tuning has been performed (increasing send/receive
>> buffer sizes, queue length, etc.).
>>
>> The concern given all this: rados bench can't exceed 110 MB/s in all tests.
>> In fact, if I didn't know better I would swear that the traffic is either
>> being throttled or is somehow routing through a 1 Gbit network. The numbers
>> coming back from rados bench look like saturation at Gigabit, with no
>> evidence of being on a 2.5 Gbit network. Monitoring at both the Ceph and
>> Proxmox consoles confirms the same. Cluster traffic is confirmed to be
>> going out ens19 - tested via tcpdump.
>>
>> Typical command line used for rados bench: rados bench -p s3block 20 write
>>
>> What the heck am I doing wrong here?
>>
>> Ron Gage
>>
>> _______________________________________________
>> ceph-users mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
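For what it's worth, Ron's numbers do line up with a gigabit ceiling. A quick back-of-the-envelope check (decimal MB/s, ignoring protocol overhead):

```shell
# Raw line-rate ceilings in MB/s (decimal): rate_in_Mbit / 8.
# TCP/IP + Ethernet framing overhead shaves a few percent off these.
echo "1 GbE ceiling:   $((1000 / 8)) MB/s"
echo "2.5 GbE ceiling: $((2500 / 8)) MB/s"
```

~110 MB/s is roughly what a saturated 1 GbE link delivers after overhead. One thing worth checking on the interface question: rados bench is client traffic, and Ceph client I/O travels the public network (here the gigabit 192.168.0.0/23); the cluster network carries only OSD replication, recovery, and heartbeats, which would explain the gigabit-shaped results even though ens19 tests clean at 2.46 Gbit/s.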
