-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
Nick,
I've been trying to replicate your results without success. Can you
help me understand what I'm doing differently from your test?
My setup is two boxes, one is a client and the other is a server. The
server has Intel(R) Atom(TM) CPU C2750 @ 2.40GHz, 32 GB RAM and 2
Intel S3500 240 GB SSD drives. The boxes have Infiniband FDR cards
connected to a QDR switch using IPoIB. I set up OSDs on the 2 SSDs and
set pool size=1. I mapped a 200GB RBD using the kernel module and ran fio
on the RBD. I adjusted the number of cores, clock speed and C-states
of the server and here are my results:
I varied the number of online cores and pinned the processor to a fixed
frequency using the userspace governor.
8 jobs, 8 depth

Frequency                        Cores
(GHz)          1     2     3     4     5     6     7     8
2.4          387   762  1121  1432  1657  1900  2092  2260
2.0          386   758  1126  1428  1657  1890  2090  2232
1.6          382   756  1127  1428  1656  1894  2083  2201
1.2          385   756  1125  1431  1656  1885  2093  2244
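
To make the core scaling in the table above easier to read, here is a rough sketch in Python 3 that turns the 2.4 GHz row into a speedup and a per-core efficiency relative to the single-core result. The function and variable names are made up for illustration; only the numbers come from the table.

```python
# 2.4 GHz row of the "8 jobs, 8 depth" table, keyed by number of online cores.
results_2_4ghz = {1: 387, 2: 762, 3: 1121, 4: 1432,
                  5: 1657, 6: 1900, 7: 2092, 8: 2260}


def scaling(results):
    """Return {cores: (speedup, efficiency)} relative to the 1-core result."""
    base = results[1]
    out = {}
    for cores, value in sorted(results.items()):
        speedup = value / base
        # Efficiency is the speedup divided by the number of cores used.
        out[cores] = (speedup, speedup / cores)
    return out


if __name__ == "__main__":
    for cores, (speedup, eff) in scaling(results_2_4ghz).items():
        print(f"{cores} cores: {speedup:.2f}x speedup, {eff:.0%} efficiency")
```

With these numbers, 8 cores deliver roughly 5.8 times the single-core figure, so scaling is clearly sub-linear past 3 or 4 cores.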
I then restricted the processor to sleep states no deeper than C1, and
also tested capping the maximum CPU frequency with the ondemand
governor.
1 job, 1 depth

Cores: 1
Frequency   <=C1,        C0-C6,       C0-C6,        <=C1,
(GHz)       freq range   freq range   static freq   static freq
2.4         381          381          379           381
2.0         382          380          381           381
1.6         380          381          379           382
1.2         383          378          379           383

Cores: 8
Frequency   <=C1,        C0-C6,       C0-C6,        <=C1,
(GHz)       freq range   freq range   static freq   static freq
2.4         629          580          584           629
2.0         630          579          584           634
1.6         630          579          584           634
1.2         632          581          582           634
Here I see the results correlate with the number of cores and with
C-states, but not with frequency.
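
As a quick check on the size of the C-state effect, this Python 3 sketch computes the relative gain from limiting the processor to C1 in the 8-core, 1 job / 1 depth results above ("freq range" columns). The helper name is hypothetical; the numbers are copied from the table.

```python
# 8-core results keyed by frequency (GHz).
# Each tuple is (<=C1 with freq range, C0-C6 with freq range).
eight_core = {2.4: (629, 580), 2.0: (630, 579), 1.6: (630, 579), 1.2: (632, 581)}


def c1_gain_percent(limited, unrestricted):
    """Percentage gain of the <=C1 result over the C0-C6 result."""
    return (limited - unrestricted) / unrestricted * 100.0


for freq, (limited, unrestricted) in eight_core.items():
    gain = c1_gain_percent(limited, unrestricted)
    print(f"{freq} GHz: {gain:.1f}% higher with <=C1")
```

For these rows the gain is about 8.4% to 8.8% at every frequency, while the single-core results show no comparable difference.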
Frequency was controlled with:
cpupower frequency-set -d 1.2GHz -u 1.2GHz -g userspace
and
cpupower frequency-set -d 1.2GHz -u 2.0GHz -g ondemand
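
For stepping through several frequencies in one run, a wrapper along these lines might help. It is only a sketch in Python 3: it builds the same cpupower command lines shown above (the -d, -u and -g options), and the function names are my own. Running it for real needs root and cpupower installed.

```python
import subprocess


def frequency_set_cmd(min_ghz, max_ghz, governor):
    """Build the cpupower command line used above as an argument list."""
    return ["cpupower", "frequency-set",
            "-d", f"{min_ghz}GHz", "-u", f"{max_ghz}GHz", "-g", governor]


def pin_frequency(ghz, run=subprocess.check_call):
    """Pin min and max to the same value with the userspace governor.

    `run` is injectable so the command can be inspected without executing it.
    """
    run(frequency_set_cmd(ghz, ghz, "userspace"))


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for ghz in (2.4, 2.0, 1.6, 1.2):
        pin_frequency(ghz, run=lambda cmd: print(" ".join(cmd)))
```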
Core count adjusted by:
for i in {1..7}; do echo 0 > /sys/devices/system/cpu/cpu$i/online; done
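
The loop above leaves only one core online. To select an arbitrary core count, something like the following Python 3 sketch could be used. The helper names are hypothetical, it assumes the same sysfs layout as the shell loop, and it leaves cpu0 alone on the assumption that it stays online. It must run as root to have any effect.

```python
import os

CPU_SYSFS = "/sys/devices/system/cpu"


def online_path(cpu, root=CPU_SYSFS):
    """Path of the sysfs 'online' switch for a given CPU number."""
    return os.path.join(root, f"cpu{cpu}", "online")


def set_online_cores(count, total, root=CPU_SYSFS):
    """Keep cpu0..cpu(count-1) online and take cpu(count)..cpu(total-1) offline.

    cpu0 is never touched; cpu1 and above are written explicitly.
    """
    for cpu in range(1, total):
        with open(online_path(cpu, root), "w") as f:
            f.write("1" if cpu < count else "0")
```

For example, set_online_cores(4, 8) would leave cpu0 to cpu3 online on an 8-core box.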
C-states controlled by:
# python
Python 2.7.5 (default, Jun 24 2015, 00:41:19)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> fd = open('/dev/cpu_dma_latency','wb')
>>> fd.write('1')
>>> fd.flush()
>>> # Don't run the next line until the tests are completed (the handle
>>> # has to stay open).
>>> fd.close()
>>>
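
The same idea as the interactive session above can be wrapped in a small script so the handle is held open for exactly the duration of a test. This is a sketch in Python 3, not what I actually ran: the context manager name is made up, and it writes the limit as a 32-bit binary integer, which as far as I know the kernel's PM QoS interface accepts in addition to a text value like the one used above.

```python
import contextlib
import os
import struct


@contextlib.contextmanager
def max_cpu_latency(usec, path="/dev/cpu_dma_latency"):
    """Request a maximum CPU wake-up latency (in microseconds) while the
    block runs. The request is dropped when the file descriptor is closed."""
    fd = os.open(path, os.O_WRONLY)
    try:
        # Native-endian 32-bit signed integer.
        os.write(fd, struct.pack("i", usec))
        yield
    finally:
        os.close(fd)
```

Usage would look like "with max_cpu_latency(1): run_the_benchmark()", where run_the_benchmark is a placeholder for whatever starts fio. The value 1 matches the session above; which C-states that actually excludes depends on the exit latencies the platform reports.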
I'd like to replicate your results. I'd also appreciate it if you could
verify some of mine around C-states and cores in your set-up.
Thanks,
-----BEGIN PGP SIGNATURE-----
Version: Mailvelope v1.0.2
Comment: https://www.mailvelope.com
wsFcBAEBCAAQBQJV5g8GCRDmVDuy+mK58QAAe6YP/j+SNGFI2z7ndnbOk87D
UjxG+hiZT5bkdt2/wVfI6QiH0UGDA3rLBsttOHPgfxP6/CEy801q8/fO0QOk
tLxIgX01K4ECls2uhiFAM3bhKalFsKDM6rHYFx96tIGWonQeou36ouDG8pfz
YsprvQ2XZEX1+G4dfZZ4lc3A3mfIY6Wsn7DC0tup9eRp3cl9hQLXEu4Zg8CZ
7867FNaud4S4f6hYV0KUC0fv+hZvyruMCt/jgl8gVr8bAdNgiW5u862gsk5b
sO9mb7H679G8t47m3xd89jTh9siMshbcakF9PXKzrN7DxBb/sBuN3GykesZA
+5jdUTzPCxFu+LocJ91by8FybatpLwxycmfP2gRxd/owclXk5BqqJUnrdYVm
n2GcHobdHVv9k/s+iBVV0xbwqOY+IO9UNUfLAKNy7E1xtpXdTpQBuokmu/4D
WXg3C4u+DsZNvcziO4s/edQ1koOQm1Fcj5VnbouSqmsHpB5nHeJbGmiKNTBA
9pE/hTph56YRqOE3bq3X/ohjtziL7/e/MVF3VUisDJieaLxV9weLxKIf0W9t
L7NMhX7iUIMps5ulA9qzd8qJK6yBa65BVXtk5M0A5oTA/VvxHQT6e5nSZS+Z
WLjavMnmSSJT1BQZ5GkVbVqo4UVjndcXEvkBm3+McaGKliO2xvxP+U3nCKpZ
js+h
=4WAa
-----END PGP SIGNATURE-----
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
On Sat, Jun 13, 2015 at 8:58 AM, Nick Fisk <[email protected]> wrote:
> Hi All,
>
> I know there have been lots of discussions around needing fast CPUs to get
> the most out of SSDs. However, I have never really seen any solid
> numbers to make a comparison about how much difference a faster CPU makes
> and if Ceph scales linearly with clock speed. So I did a little experiment
> today.
>
> I set up a 1 OSD Ceph instance on a desktop PC. The desktop has an i5
> Sandy Bridge CPU with the CPU turbo overclocked to 4.3GHz. By using the
> userspace governor in Linux, I was able to set static clock speeds to see
> the possible performance effects on Ceph. My PC only has an old X25M-G2
> SSD, so I had to limit the IO testing to 4KB QD=1, as otherwise the SSD ran
> out of puff when I got to the higher clock speeds.
>
> CPU MHz  4KB Write IO  Min Latency (us)  Avg Latency (us)  CPU usr  CPU sys
> 1600     797           886               1250              10.14    2.35
> 2000     815           746               1222              8.45     1.82
> 2400     1161          630               857               9.5      1.6
> 2800     1227          549               812               8.74     1.24
> 3300     1320          482               755               7.87     1.08
> 4300     1548          437               644               7.72     0.9
>
> The figures show a fairly linear trend right through the clock range and
> clearly show the importance of having fast CPUs (GHz, not cores) if you
> want to achieve high IO, especially at low queue depths.
>
>
> Things to Note
> - These figures are from a desktop CPU; no doubt Xeons will be slightly
>   faster at the same clock speed.
> - I am assuming that using the userspace governor in this way is a
>   realistic way to simulate different CPU clock speeds?
> - My old SSD is probably skewing the figures slightly.
> - I have complete control over the turbo settings and big cooling; many
>   server CPUs will limit the max turbo if multiple cores are under load or
>   get too hot.
> - Ceph SSD OSD nodes are probably best with high-end E3 CPUs, as they have
>   the highest clock speeds.
> - HDDs with journals will probably benefit slightly from higher clock
>   speeds, if the disk isn't the bottleneck (i.e. small block sequential
>   writes).
> - These numbers are for Replica=1; at 2 or 3 these numbers will be at least
>   half, I would imagine.
>
>
> I hope someone finds this useful
>
> Nick
>
>
>
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>