> On 02 Sep 2015, at 17:50, Robert LeBlanc <[email protected]> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Thanks for the responses.
>
> I forgot to include the fio test for completeness:
>
> 8 jobs, QD=8
> [ext4-test]
> runtime=150
> name=ext4-test
> readwrite=randrw
> size=15G
> blocksize=4k
> ioengine=sync
> iodepth=8
> numjobs=8
> thread
> group_reporting
> time_based
> direct=1
>
>
> 1 job, QD=1
> [ext4-test]
> runtime=150
> name=ext4-test
> readwrite=randrw
> size=15G
> blocksize=4k
> ioengine=sync
> iodepth=1
> numjobs=1
> thread
> group_reporting
> time_based
> direct=1
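For reference, here is a minimal Python sketch that renders both job files above from one set of options. The helper name `fio_job` is hypothetical and not part of fio. Seen this way, the two files differ only in `iodepth` and `numjobs`. One caveat worth checking against the fio documentation: with a synchronous engine such as `ioengine=sync`, raising `iodepth` above 1 has no effect. That means the "QD=8" run is effectively 8 jobs, each with one I/O in flight.

```python
# Options shared by both job files above, in the same order.
SHARED_OPTIONS = [
    ("runtime", "150"),
    ("name", "ext4-test"),
    ("readwrite", "randrw"),
    ("size", "15G"),
    ("blocksize", "4k"),
    ("ioengine", "sync"),
]
FLAGS = ["thread", "group_reporting", "time_based"]


def fio_job(iodepth: int, numjobs: int) -> str:
    """Render one [ext4-test] job section (hypothetical helper)."""
    lines = ["[ext4-test]"]
    lines += [f"{key}={value}" for key, value in SHARED_OPTIONS]
    # With ioengine=sync each job has at most one I/O in flight,
    # so iodepth > 1 does not raise the effective queue depth.
    lines += [f"iodepth={iodepth}", f"numjobs={numjobs}"]
    lines += FLAGS
    lines.append("direct=1")
    return "\n".join(lines) + "\n"


print(fio_job(8, 8))
```

`fio_job(8, 8)` reproduces the first job file and `fio_job(1, 1)` the second.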
>
> I have not disabled all of the power management; I've only prevented the CPU
> from entering an idle state deeper than C1. I'll have to check on Jan's
> suggestion of swapping out the intel_idle driver to see what difference it
> makes. I did not run powertop during the testing because it (or cpupower
> monitor) impacted performance and would have thrown off the results. I'll do
> some runs with lower clocks and make sure that it is staying at the lower
> speeds. Here is some additional output:
AFAIK TurboBoost doesn't kick in unless some cores are in C2, someone should go
and take a look at the specs :-)
>
> # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> userspace
> # cpupower monitor
> |Nehalem || Mperf || Idle_Stats
> CPU | C3 | C6 | PC3 | PC6 || C0 | Cx | Freq || POLL | C1-A | C6-A
> 0| 0.00| 94.19| 0.00| 0.00|| 5.70| 94.30| 1299|| 0.00| 0.00| 94.32
> 1| 0.00| 99.39| 0.00| 0.00|| 0.53| 99.47| 1298|| 0.00| 0.00| 99.48
> 2| 0.00| 99.60| 0.00| 0.00|| 0.38| 99.62| 1299|| 0.00| 0.00| 99.61
> 3| 0.00| 99.63| 0.00| 0.00|| 0.36| 99.64| 1299|| 0.00| 0.00| 99.64
> 4| 0.00| 99.84| 0.00| 0.00|| 0.11| 99.89| 1301|| 0.00| 0.00| 99.97
> 5| 0.00| 99.57| 0.00| 0.00|| 0.40| 99.60| 1299|| 0.00| 0.00| 99.61
> 6| 0.00| 99.72| 0.00| 0.00|| 0.27| 99.73| 1299|| 0.00| 0.00| 99.73
> 7| 0.00| 99.98| 0.00| 0.00|| 0.01| 99.99| 1321|| 0.00| 0.00| 99.99
> # cat /sys/devices/system/cpu/cpuidle/current_driver
> intel_idle
>
> I then echoed "1" into /dev/cpu_dma_latency. We can see that the idle time
> moves from C6 to C1:
>
This should not work. You need to leave the file descriptor open after writing
the value, it's not a sysfs/proc-type tunable.
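To illustrate keeping the descriptor open, here is a minimal Python 3 sketch. The names `encode_latency` and `hold_cpu_latency` are hypothetical. The kernel's PM QoS interface keeps the request only while /dev/cpu_dma_latency stays open, and it expects a signed 32-bit value. This sketch writes that value in binary form, assuming native byte order.

```python
import os
import struct
from contextlib import contextmanager


def encode_latency(max_latency_us: int) -> bytes:
    """Pack a latency limit (microseconds) as a signed 32-bit native-endian value."""
    return struct.pack("=i", max_latency_us)


@contextmanager
def hold_cpu_latency(max_latency_us: int, path: str = "/dev/cpu_dma_latency"):
    """Keep a CPU latency request in force for the duration of the with-block."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, encode_latency(max_latency_us))
        # The request stays active only while this descriptor remains open.
        yield fd
    finally:
        # Closing the descriptor drops the request, which is why a plain
        # `echo` (open, write, close) has no lasting effect.
        os.close(fd)
```

To use it, run as root and wrap the benchmark in `with hold_cpu_latency(1): ...`. Alternatively, keep a process sleeping inside that block while fio runs from another shell.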
> # cpupower monitor
> |Nehalem || Mperf || Idle_Stats
> CPU | C3 | C6 | PC3 | PC6 || C0 | Cx | Freq || POLL | C1-A | C6-A
> 0| 0.00| 0.00| 0.00| 0.00|| 0.37| 99.63| 1299|| 0.00| 99.63| 0.00
> 1| 0.00| 0.00| 0.00| 0.00|| 0.16| 99.84| 1299|| 0.00| 99.84| 0.00
> 2| 0.00| 0.00| 0.00| 0.00|| 0.47| 99.53| 1299|| 0.00| 99.53| 0.00
> 3| 0.00| 0.00| 0.00| 0.00|| 0.43| 99.57| 1299|| 0.00| 99.57| 0.00
> 4| 0.00| 0.00| 0.00| 0.00|| 0.09| 99.91| 1300|| 0.00| 99.91| 0.00
> 5| 0.00| 0.00| 0.00| 0.00|| 0.06| 99.94| 1298|| 0.00| 99.94| 0.00
> 6| 0.00| 0.00| 0.00| 0.00|| 0.09| 99.91| 1300|| 0.00| 99.91| 0.00
> 7| 0.00| 0.00| 0.00| 0.00|| 0.28| 99.72| 1299|| 0.00| 99.72| 0.00
> # cat /sys/devices/system/cpu/cpu0/cpuidle/state*/latency
> 0
> 2
> 15
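The `latency` files above are exit latencies in microseconds, but the bare list doesn't say which state each one belongs to. Each stateN directory also has a `name` file, so a small Python sketch (hypothetical helper `idle_states`) can pair them. Judging by the cpupower monitor columns, these are presumably POLL, C1 and C6.

```python
from pathlib import Path


def idle_states(cpu_dir: str = "/sys/devices/system/cpu/cpu0") -> list[tuple[str, int]]:
    """Return (state name, exit latency in microseconds) for each cpuidle state."""
    base = Path(cpu_dir) / "cpuidle"
    # Sort numerically so that state10 would not come before state2.
    dirs = sorted(base.glob("state*"), key=lambda p: int(p.name[len("state"):]))
    states = []
    for state in dirs:
        name = (state / "name").read_text().strip()
        latency = int((state / "latency").read_text())
        states.append((name, latency))
    return states
```

For example, `for name, lat in idle_states(): print(name, lat)` lists each state next to its latency.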
> # cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_{min,max,cur}_freq
> 1200000
> 1200000
> 1200000
> 1200000
> 1200000
> 1200000
> 1200000
> 1200000
> 1200000
> 2401000
> 2401000
> 2401000
> 2401000
> 2401000
> 2401000
> 2401000
> 1200000
> 1200000
> 1200000
> 1600000
> 1200000
> 1200000
> 1200000
> 1200000
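Because bash performs brace expansion before globbing, the command above prints output in three groups: every CPU's scaling_min_freq, then every scaling_max_freq, then every scaling_cur_freq. The Python sketch below (hypothetical helper `regroup`) regroups the 24 values per CPU, assuming 8 CPUs that sort as cpu0..cpu7. Read that way, cpu0's max appears to be 1200000 while cpu1-cpu7 allow 2401000. It also shows that cpu3 reported a current frequency of 1600000.

```python
def regroup(values: list[int], ncpus: int) -> dict[str, dict[str, int]]:
    """Regroup `cat cpu*/cpufreq/scaling_{min,max,cur}_freq` output per CPU.

    The shell lists every CPU's min first, then every max, then every cur,
    assuming the glob sorts as cpu0..cpu{ncpus-1}.
    """
    if len(values) != 3 * ncpus:
        raise ValueError("expected three values per CPU")
    fields = ("min", "max", "cur")
    return {
        f"cpu{cpu}": {field: values[i * ncpus + cpu] for i, field in enumerate(fields)}
        for cpu in range(ncpus)
    }


# The 24 values shown above, in order (kHz).
OUTPUT = [1200000] * 9 + [2401000] * 7 + [1200000] * 3 + [1600000] + [1200000] * 4
for cpu, freqs in regroup(OUTPUT, 8).items():
    print(cpu, freqs)
```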
>
> Thanks for taking the time to collaborate with me on this.
> -----BEGIN PGP SIGNATURE-----
> Version: Mailvelope v1.0.2
> Comment: https://www.mailvelope.com <https://www.mailvelope.com/>
>
> wsFcBAEBCAAQBQJV5xrBCRDmVDuy+mK58QAAWaoP/2bIKlsp+fmlViP4pFV7
> Sv+y/1nCQdNs0l2AJdiDX2l7OQrYavDh5LldJBkcmTyB74KjDJ+i88VGYkdG
> n8Q6tTbF4erw8P/gPf3DIrvQazdQm+a/6rUBpkM+MNTRyKRczxeyCu8kCNzb
> jDP7erwnj0WzCZMAA1uFLa9sMKBNxOfpK9wQR5NbQCkOcsDtprNL2KPfxrFV
> Rgk0OBGBSLtz9BE/PMYpbeqr9o1nChCp4hkg5AUcFrAuceOKdA7R8lKPIUZ6
> 0zTL1OjGsGfy/sp856poqmF02bANF9LXzmcBMKBNMO0iS89xv0YyIgRBlt/Z
> lXc4M7IWtYzbbUVAtSLcOtWrzS8Yp0hMKlPrhA7LZFrhZ4+t45mvyrS3RbiP
> RG8osdvjz58ZBS7/jk1gDZd8Xbj5bsU3n01DTFJ3CeAE2etAqgheAGlj4OTR
> kfs/g1jbYArEgnfX3jTJ2wECjfVRTrgXJGjceoYtJYbQ4Ns/0dBWpZBrkEu0
> AX4VU1dk9R1B0rootvKsWedcKvof4cSOyKRtQxGHS7ipqtkyep+1JquO41mr
> cBC9p/TOXgh90M8476G1CpMqWwWHneHJ6bjO5V1W8uWGXTNFnaGbqS4v3mWk
> ge1qukr9et0Su0llUb8Rz3hCDqD6PfMJpquBTAB/kaanS+t0pi+00wxu7zzB
> zVQ/
> =v4sY
> -----END PGP SIGNATURE-----
>
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>
> On Wed, Sep 2, 2015 at 3:21 AM, Nick Fisk <[email protected]> wrote:
> I think this may be related to what I had to do, it rings a bell at least.
>
> http://unix.stackexchange.com/questions/153693/cant-use-userspace-cpufreq-governor-and-set-cpu-frequency
>
> The P-state driver doesn't support the userspace governor, so you need to
> disable it and make Linux use the old ACPI driver instead.
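A quick way to see which cpufreq driver is active, and whether it offers the userspace governor, is to read `scaling_driver` and `scaling_available_governors` in sysfs. Below is a minimal Python sketch; the helper names are hypothetical. With intel_pstate active, the governor list is normally just `performance powersave`. Booting with `intel_pstate=disable` should fall back to acpi-cpufreq, assuming the kernel was built with the userspace governor.

```python
from pathlib import Path


def cpufreq_info(cpu_dir: str = "/sys/devices/system/cpu/cpu0") -> tuple[str, list[str]]:
    """Return the active cpufreq driver and the governors it offers."""
    base = Path(cpu_dir) / "cpufreq"
    driver = (base / "scaling_driver").read_text().strip()
    governors = (base / "scaling_available_governors").read_text().split()
    return driver, governors


def userspace_available(cpu_dir: str = "/sys/devices/system/cpu/cpu0") -> bool:
    """True if the userspace governor can be selected for this CPU."""
    _, governors = cpufreq_info(cpu_dir)
    return "userspace" in governors
```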
>
> > -----Original Message-----
> > From: Nick Fisk [mailto:[email protected]]
> > Sent: 01 September 2015 22:21
> > To: 'Robert LeBlanc' <[email protected]>
> > Cc: [email protected]
> > Subject: RE: [ceph-users] Ceph SSD CPU Frequency Benchmarks
> >
> > > -----Original Message-----
> > > From: ceph-users [mailto:[email protected]] On Behalf Of Robert LeBlanc
> > > Sent: 01 September 2015 21:48
> > > To: Nick Fisk <[email protected]>
> > > Cc: [email protected]
> > > Subject: Re: [ceph-users] Ceph SSD CPU Frequency Benchmarks
> > >
> > > -----BEGIN PGP SIGNED MESSAGE-----
> > > Hash: SHA256
> > >
> > > Nick,
> > >
> > > I've been trying to replicate your results without success. Can you
> > > help me understand what I'm doing that is not the same as your test?
> > >
> > > My setup is two boxes: one is a client and the other is a server. The
> > > server has an Intel(R) Atom(TM) CPU C2750 @ 2.40GHz, 32 GB RAM and two
> > > Intel S3500 240 GB SSD drives. The boxes have InfiniBand FDR cards
> > > connected to a QDR switch using IPoIB. I set up OSDs on the two SSDs and
> > > set pool size=1. I mapped a 200 GB RBD using the kernel module and ran fio
> > > on the RBD. I adjusted the number of cores, clock speed and C-states of
> > > the server, and here are my results:
> > >
> > > Adjusted core number and set the processor to a set frequency using
> > > the userspace governor.
> > >
> > > 8 jobs, QD=8
> > >
> > > GHz \ Cores     1     2     3     4     5     6     7     8
> > > 2.4           387   762  1121  1432  1657  1900  2092  2260
> > > 2.0           386   758  1126  1428  1657  1890  2090  2232
> > > 1.6           382   756  1127  1428  1656  1894  2083  2201
> > > 1.2           385   756  1125  1431  1656  1885  2093  2244
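Some quick arithmetic on the 8 jobs / QD=8 table above shows two things. Across the four frequencies, each core count varies by less than 3%. Going from 1 to 8 cores at 2.4 GHz, by contrast, gives roughly 5.8x. A small Python sketch follows; `RESULTS` simply transcribes the table, and the helper names are hypothetical.

```python
# Rows are GHz, columns are core counts 1..8, as in the table above.
RESULTS = {
    2.4: [387, 762, 1121, 1432, 1657, 1900, 2092, 2260],
    2.0: [386, 758, 1126, 1428, 1657, 1890, 2090, 2232],
    1.6: [382, 756, 1127, 1428, 1656, 1894, 2083, 2201],
    1.2: [385, 756, 1125, 1431, 1656, 1885, 2093, 2244],
}


def frequency_spread(results: dict[float, list[int]]) -> list[float]:
    """Relative spread (max - min) / min across frequencies, per core count."""
    columns = zip(*results.values())
    return [(max(col) - min(col)) / min(col) for col in columns]


def core_scaling(results: dict[float, list[int]], ghz: float) -> float:
    """Result with all cores relative to a single core at one frequency."""
    row = results[ghz]
    return row[-1] / row[0]


spread = frequency_spread(RESULTS)
print("worst spread across frequencies: {:.1%}".format(max(spread)))
print("8 cores vs 1 core at 2.4 GHz: {:.2f}x".format(core_scaling(RESULTS, 2.4)))
```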
> > >
> >
> > I tested at QD=1 as this tends to highlight the difference in clock speed,
> > whereas a higher queue depth will probably scale with both frequency and
> > cores. I'm not sure this is your problem, but to make sure your environment
> > is doing what you want I would suggest QD=1 and 1 job to start with.
> >
> > But thank you for sharing these results regardless of your current frequency
> > scaling issues. Information like this is really useful for people trying to
> > decide on hardware purchases. Those Atom boards look like they could support
> > 12 normal HDDs quite happily, assuming 80 IOPS x 12.
> >
> > I wonder if we can get enough data from various people to generate an
> > IOPS/CPU frequency comparison for various CPU architectures?
> >
> >
> > > I then adjusted the processor to not go into a deeper sleep state than
> > > C1 and also tested setting the highest CPU frequency with the ondemand
> > > governor.
> > >
> > > 1 job, QD=1
> > >
> > > Cores: 1
> > >              <=C1,        C0-C6,       C0-C6,       <=C1,
> > > GHz          freq range   freq range   static freq  static freq
> > > 2.4          381          381          379          381
> > > 2.0          382          380          381          381
> > > 1.6          380          381          379          382
> > > 1.2          383          378          379          383
> > >
> > > Cores: 8
> > >              <=C1,        C0-C6,       C0-C6,       <=C1,
> > > GHz          freq range   freq range   static freq  static freq
> > > 2.4          629          580          584          629
> > > 2.0          630          579          584          634
> > > 1.6          630          579          584          634
> > > 1.2          632          581          582          634
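The C-state effect in the 8-core rows can be quantified by averaging the two static-frequency columns. That gives 632.75 for <=C1 versus 583.5 for C0-C6, roughly an 8% gain. The 1-core rows, by comparison, stay at about 378-383 in every column. Below is a minimal Python sketch with the values transcribed from the table; the constant names are hypothetical.

```python
from statistics import mean

# 8-core rows of the 1 job / QD=1 table, top to bottom (2.4, 2.0, 1.6, 1.2 GHz).
LE_C1_STATIC = [629, 634, 634, 634]   # "<=C1, static freq" column
C0_C6_STATIC = [584, 584, 584, 582]   # "C0-C6, static freq" column

gain = mean(LE_C1_STATIC) / mean(C0_C6_STATIC) - 1
print(f"<=C1 vs C0-C6 at a static frequency, 8 cores: {gain:+.1%}")
```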
> > >
> > > Here I see performance correlate with the number of cores and with
> > > C-states, but not with frequency.
> > >
> > > Frequency was controlled with:
> > >   cpupower frequency-set -d 1.2GHz -u 1.2GHz -g userspace
> > > and
> > >   cpupower frequency-set -d 1.2GHz -u 2.0GHz -g ondemand
> > >
> > > Core count adjusted by:
> > >   for i in {1..7}; do echo 0 > /sys/devices/system/cpu/cpu$i/online; done
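The same core-count adjustment can be written as a Python sketch (hypothetical helper `set_online_cpus`, run as root). Unlike the loop, it can also bring CPUs back online between runs. It skips cpu0 on the assumption that, as on many x86 systems, cpu0 has no `online` file.

```python
from pathlib import Path


def set_online_cpus(count: int, total: int, base: str = "/sys/devices/system/cpu") -> None:
    """Leave cpu0..cpu{count-1} online and take cpu{count}..cpu{total-1} offline.

    cpu0 is skipped: on many x86 systems it has no `online` file and
    cannot be taken offline.
    """
    for cpu in range(1, total):
        online = Path(base) / f"cpu{cpu}" / "online"
        online.write_text("1" if cpu < count else "0")
```

For example, `set_online_cpus(1, 8)` matches the loop above, and `set_online_cpus(8, 8)` restores all eight cores.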
> > >
> > > C-states controlled by:
> > > # python
> > > Python 2.7.5 (default, Jun 24 2015, 00:41:19)
> > > [GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
> > > Type "help", "copyright", "credits" or "license" for more information.
> > > >>> fd = open('/dev/cpu_dma_latency','wb')
> > > >>> fd.write('1')
> > > >>> fd.flush()
> > > >>> fd.close() # Don't run this until the tests are completed (the handle has to stay open).
> > > >>>
> > >
> > > I'd like to replicate your results. I'd also like it if you could verify
> > > some of mine in your set-up around C-states and cores.
> >
> > I can't remember exactly, but I think I had to do something to get the
> > userspace governor to behave as I expected it to. I tend to recall setting
> > the frequency low and yet still seeing it bursting up to max. I will have a
> > look through my notes tomorrow and see if I can recall anything. One thing I
> > do remember, though, is that the Intel powertop utility was very useful in
> > confirming what the actual CPU frequency was. It might be worth installing
> > and running it to see what the CPU cores are doing.
> >
> >
> > >
> > > Thanks,
> > >
> > > -----BEGIN PGP SIGNATURE-----
> > > Version: Mailvelope v1.0.2
> > > Comment: https://www.mailvelope.com <https://www.mailvelope.com/>
> > >
> > > wsFcBAEBCAAQBQJV5g8GCRDmVDuy+mK58QAAe6YP/j+SNGFI2z7ndnbOk87D
> > > UjxG+hiZT5bkdt2/wVfI6QiH0UGDA3rLBsttOHPgfxP6/CEy801q8/fO0QOk
> > > tLxIgX01K4ECls2uhiFAM3bhKalFsKDM6rHYFx96tIGWonQeou36ouDG8pfz
> > > YsprvQ2XZEX1+G4dfZZ4lc3A3mfIY6Wsn7DC0tup9eRp3cl9hQLXEu4Zg8CZ
> > > 7867FNaud4S4f6hYV0KUC0fv+hZvyruMCt/jgl8gVr8bAdNgiW5u862gsk5b
> > > sO9mb7H679G8t47m3xd89jTh9siMshbcakF9PXKzrN7DxBb/sBuN3GykesZA
> > > +5jdUTzPCxFu+LocJ91by8FybatpLwxycmfP2gRxd/owclXk5BqqJUnrdYVm
> > > n2GcHobdHVv9k/s+iBVV0xbwqOY+IO9UNUfLAKNy7E1xtpXdTpQBuokmu/4D
> > > WXg3C4u+DsZNvcziO4s/edQ1koOQm1Fcj5VnbouSqmsHpB5nHeJbGmiKNTBA
> > > 9pE/hTph56YRqOE3bq3X/ohjtziL7/e/MVF3VUisDJieaLxV9weLxKIf0W9t
> > > L7NMhX7iUIMps5ulA9qzd8qJK6yBa65BVXtk5M0A5oTA/VvxHQT6e5nSZS+Z
> > > WLjavMnmSSJT1BQZ5GkVbVqo4UVjndcXEvkBm3+McaGKliO2xvxP+U3nCKpZ
> > > js+h
> > > =4WAa
> > > -----END PGP SIGNATURE-----
> > >
> > >
> > > ----------------
> > > Robert LeBlanc
> > > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
> > >
> > > On Sat, Jun 13, 2015 at 8:58 AM, Nick Fisk <[email protected]> wrote:
> > > Hi All,
> > >
> > > I know there have been lots of discussions around needing fast CPUs to
> > > get the most out of SSDs. However, I have never really seen any solid
> > > numbers comparing how much difference a faster CPU makes, or whether Ceph
> > > scales linearly with clock speed. So I did a little experiment today.
> > >
> > > I set up a 1-OSD Ceph instance on a desktop PC. The desktop has an i5
> > > Sandy Bridge CPU with the CPU turbo overclocked to 4.3 GHz. By using
> > > the userspace governor in Linux, I was able to set static clock speeds
> > > to see the possible performance effects on Ceph. My PC only has an old
> > > X25M-G2 SSD, so I had to limit the IO testing to 4 KB at QD=1, as
> > > otherwise the SSD ran out of puff when I got to the higher clock
> > > speeds.
> > >
> > > CPU MHz  4 KB write IOPS  Min lat (us)  Avg lat (us)  CPU usr  CPU sys
> > >    1600              797           886          1250    10.14     2.35
> > >    2000              815           746          1222     8.45     1.82
> > >    2400             1161           630           857      9.5      1.6
> > >    2800             1227           549           812     8.74     1.24
> > >    3300             1320           482           755     7.87     1.08
> > >    4300             1548           437           644     7.72      0.9
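This table can be sanity-checked internally. At QD=1 only one I/O is outstanding, so by Little's law the IOPS should be close to 1 / average latency. For example, 1,000,000 / 644 us is about 1,553, versus the measured 1,548. The Python sketch below checks every row, and they all agree to within 1%. The names `ROWS` and `QUEUE_DEPTH` are hypothetical.

```python
# (CPU MHz, measured 4 KB write IOPS, average latency in microseconds) from the table above.
ROWS = [
    (1600, 797, 1250),
    (2000, 815, 1222),
    (2400, 1161, 857),
    (2800, 1227, 812),
    (3300, 1320, 755),
    (4300, 1548, 644),
]
QUEUE_DEPTH = 1

for mhz, iops, avg_us in ROWS:
    # Little's law: outstanding I/Os = throughput x latency, so IOPS ~= QD / latency.
    predicted = QUEUE_DEPTH * 1_000_000 / avg_us
    print(f"{mhz} MHz: measured {iops}, QD/latency predicts {predicted:.0f}")
```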
> > >
> > > The figures show a fairly linear trend right through the clock range
> > > and clearly show the importance of having fast CPUs (GHz, not cores)
> > > if you want to achieve high IO, especially at low queue depths.
> > >
> > >
> > > Things to note:
> > > - These figures are from a desktop CPU; no doubt Xeons will be slightly
> > >   faster at the same clock speed.
> > > - I am assuming that using the userspace governor in this way is a
> > >   realistic way to simulate different CPU clock speeds?
> > > - My old SSD is probably skewing the figures slightly.
> > > - I have complete control over the turbo settings and big cooling; many
> > >   server CPUs will limit the max turbo if multiple cores are under load
> > >   or get too hot.
> > > - Ceph SSD OSD nodes are probably best with high-end E3 CPUs, as they
> > >   have the highest clock speeds.
> > > - HDDs with journals will probably benefit slightly from higher clock
> > >   speeds if the disk isn't the bottleneck (i.e. small-block sequential
> > >   writes).
> > > - These numbers are for replica=1; at 2 or 3 I would imagine these
> > >   numbers will drop by at least half.
> > >
> > >
> > > I hope someone finds this useful
> > >
> > > Nick
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > ceph-users mailing list
> > > [email protected]
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com