I suspect that the NVMe drives are client-class, located on external systems, 
each with a conventional filesystem holding big files that are exported to VMs 
which mount them as block devices.  That is a lot of layers, and media that 
aren't up to a sustained workload.
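
One quick way to test that theory is to measure sustained throughput at each 
layer, starting with a direct read of the backing block device from inside an 
OSD VM (a rough sketch; the device path below is only a placeholder for 
whatever the OSD is actually backed by):

  # Sequential read bypassing the page cache, so the result reflects the
  # device/NFS path rather than RAM (example device name only):
  dd if=/dev/sdb of=/dev/null bs=4M count=1024 iflag=direct status=progress

If the raw device can't sustain much more than ~110 MB/s, no amount of network 
tuning will get Ceph past that number.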



> On Aug 2, 2025, at 12:13 PM, [email protected] wrote:
> 
> As you rightly point out, at 110 MB/s it sounds very much like the traffic is 
> going through the wrong interface or being limited.
> 
> So am I correct in my reading of this that this is a virtual Ceph environment 
> running on Proxmox?
> 
> What do you mean by this statement? "All Ceph drives are exposed and an NFS 
> mounted NVME drive."
> 
> Do I take this to mean that your 4 servers are all mounting the same NVMe 
> device over NFS? Just a bit confused as to the exact hardware setup here.
> 
> What is the performance you can get from a single Ceph OSD? Just do a simple 
> dd to read (not write) from an OSD drive.

Also, `ceph tell osd.X bench`
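
That exercises an OSD's data path without touching the client network at all.  
Something along these lines (the OSD id is just an example; by default the 
bench writes roughly 1 GiB in 4 MiB blocks):

  # Write benchmark against a single OSD, run from a node with the admin keyring:
  ceph tell osd.0 bench

Comparing the MB/s reported there with the rados bench numbers should show 
whether the bottleneck is in the OSD backing storage or in the path between 
client and OSDs.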


> 
> Darren
> 
> 
>> On 2 Aug 2025, at 15:25, Ron Gage <[email protected]> wrote:
>> 
>> Hello from Detroit MI:
>> 
>> I have been doing some limited benchmarking of a Squid cluster. The 
>> arrangement of the cluster:
>> Server     Function
>> c01        MGR, MON
>> c02        MGR, MON
>> o01        OSD
>> o02        OSD
>> o03        OSD
>> o04        OSD
>> 
>> Each OSD has 2 x NVMe disks for Ceph, each 370 GB
>> 
>> The backing network is as follows:
>> ens18        Gigabit, mon-ip (192.168.0.0/23) regular MTU (1500)
>> ens19        2.5 Gigabit, Cluster Network (10.0.0.0/24) Jumbo MTU (9000)
>> 
>> Behind all this is a small ProxMox cluster.  All Ceph machines are running 
>> on a single node.  All Ceph drives are exposed and an NFS mounted NVME 
>> drive.  All Ceph OSD drives are mounted with no cache and single controller 
>> per drive.  Networking bridges are all set to either MTU 9000 or MTU 1500 as 
>> appropriate.
>> 
>> iperf3 shows 2.46 Gbit/sec between servers c01 and o01 on the ens19 
>> network.  The firewall is off all the way around.  The OS is CentOS 10.  
>> SELinux is disabled.  No network tuning has been performed (increasing 
>> send/rcv buffer sizes, queue length, etc.).
>> 
>> The concern, given all this: rados bench can't exceed 110 MB/s in any test.  
>> In fact, if I didn't know better I would swear that the traffic is either 
>> being throttled or is somehow routing through a 1 Gbit network.  The numbers 
>> coming back from rados bench look like saturation at Gigabit speed, with no 
>> evidence of being on a 2.5 Gbit network.  Monitoring at both the Ceph and 
>> ProxMox consoles confirms the same.  Cluster traffic is confirmed to be 
>> going out ens19 - verified via tcpdump.
>> 
>> Typical command line used for rados bench: rados bench -p s3block 20 write
>> 
>> What the heck am I doing wrong here?
>> 
>> Ron Gage
>> 
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
