On 21 Jul 2014, at 3:09 pm, Vladislav Bogdanov <[email protected]> wrote:
> 21.07.2014 06:21, Andrew Beekhof wrote:
>> On 18 Jul 2014, at 5:16 pm, Vladislav Bogdanov <[email protected]> wrote:
>>
>>> Hi Andrew, all,
>>>
>>> I have a task which seems to be easily solvable with the use of a
>>> globally-unique clone: start a huge number of specific virtual machines
>>> to provide a load to a connection multiplexer.
>>>
>>> I decided to look at how pacemaker behaves in such a setup with the
>>> Dummy resource agent, and found that handling of every instance in an
>>> "initial" transition (probe+start) slows down as clone-max increases.
>>
>> "yep"
>>
>> For non-unique clones the number of probes needed is N, where N is the
>> number of nodes.
>> For unique clones, we must test every instance and node combination, or
>> N*M, where M is clone-max.
>>
>> And that's just the running of the probes... just figuring out which
>> nodes need to be probed is incredibly resource intensive (run
>> crm_simulate and it will be painfully obvious).
>>
>>> F.e. for 256 instances the transition took 225 seconds, ~0.88s per
>>> instance. After I added 768 more instances (set clone-max to 1024)

How many nodes though? Assuming 3, that's still only ~1s per operation
(including the time taken to send the operation across the network twice
and update the cib).

>>> together with increasing batch-limit to 512, the transition took almost
>>> an hour (3507 seconds), or ~4.57s per added instance. Even if I take
>>> into account that monitoring of already started instances consumes some
>>> resources, the last number seems rather big,
>
> I believe this ^ is the main point.
> If with N instances probe/start of _each_ instance takes X time slots,
> then with 4*N instances probe/start of _each_ instance takes ~5*X time
> slots. In an ideal world, I would expect it to remain constant.

Unless you have 512 cores in the cluster, increasing the batch-limit in
this way is certainly not going to give you the results you're looking for.
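The two quantitative points above (probe counts and batch-limit vs. cores) can be sketched numerically. This is a back-of-the-envelope model, not Pacemaker code; the node count follows Andrew's guess of 3, and the core count and per-action time are purely illustrative assumptions:

```python
def probe_count(nodes, clone_max, globally_unique):
    """Probes needed in the initial transition: N for an anonymous
    (non-unique) clone, N * M for a globally-unique one (M = clone-max)."""
    return nodes * clone_max if globally_unique else nodes

def ideal_wall_time(n_actions, action_secs, batch_limit, cores):
    """Crude lower bound on transition wall time: actions may run in
    parallel up to batch-limit, but the cluster has only `cores` CPUs.
    Ignores context-switch overhead, which only makes things worse."""
    return n_actions * action_secs / min(batch_limit, cores)

# 3 nodes (Andrew's guess), globally-unique clone:
print(probe_count(3, 256, True))    # 768
print(probe_count(3, 1024, True))   # 3072

# Raising batch-limit past the (hypothetical) 8 cores changes nothing:
print(ideal_wall_time(1024, 1.0, 32, 8))   # 128.0
print(ideal_wall_time(1024, 1.0, 512, 8))  # 128.0
```

The second pair of numbers is the point about batch-limit 512: once batch-limit exceeds the available parallelism, the extra concurrency buys nothing in this idealized model, and in reality it costs scheduler overhead.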
Firing more tasks at a machine just ends up producing more context
switches as the kernel tries to juggle the various tasks. More context
switches == more CPU wasted == more time taken overall == completely
consistent with your results.

> Otherwise we have an issue with scalability in this direction.
>
>>> The main CPU consumer on the DC while the transition is running is
>>> crmd. Its memory footprint is around 85MB, and the resulting CIB size
>>> together with the status section is around 2MB.
>>
>> You said CPU and then listed RAM...
>
> Something wrong with that? :)
> Those are just three distinct facts.

I was expecting quantification of the relative CPU usage.
I was also expecting the PE to have massive spikes whenever a new
transition is calculated.

>>> Could this use-case be optimized, in your opinion, with minimal
>>> effort? Could it be optimized with just configuration? Or might it be
>>> some trivial development task, e.g. replacing one GList with a
>>> GHashTable somewhere?
>>
>> Optimize: yes. Minimal: no.
>>
>>> Sure, I can look deeper and get any additional information, e.g. crmd
>>> profiling results, if it is hard to answer just from the head.
>>
>> Perhaps start looking in clone_create_probe()
>
> Got it, thanks for the pointer!
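The GList-vs-GHashTable question above comes down to lookup complexity: finding one element in a linked list is O(n), while a hash table lookup is amortized O(1), so a per-instance list search turns the whole transition quadratic. A language-neutral counting sketch (illustrative only; these are not Pacemaker's actual data structures):

```python
def list_lookups(n):
    """Total comparisons to find each of n items in a linked list,
    assuming an average of n/2 comparisons per search."""
    return n * n // 2

def hash_lookups(n):
    """Total operations to find each of n items in a hash table,
    at one amortized bucket probe per search."""
    return n

print(list_lookups(256), hash_lookups(256))    # 32768 256
print(list_lookups(1024), hash_lookups(1024))  # 524288 1024
```

Going from 256 to 1024 instances multiplies the list-based cost by 16 but the hash-based cost by only 4, which is consistent with the super-linear slowdown reported above.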
>>> Best,
>>> Vladislav
_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
