>>> Jan Pokorný <[email protected]> wrote on 18.02.2019 at 21:08 in message
<[email protected]>:
> On 15/02/19 08:48 +0100, Jan Friesse wrote:
>> Ulrich Windl wrote:
>>> IMHO any process running at real-time priorities must make sure
>>> that it consumes the CPU only for short moments that are really
>>> critical to be performed in time.
>
> Pardon me, Ulrich, but something is off about this, especially
> if meant in general.
>
> Even if the infrastructure of the OS were entirely happy with
> switching scheduling parameters constantly and at a furious rate
> (I assume there may be quite a penalty in doing so, in the overhead
> caused by reconfiguring the schedulers involved), the sections that
> are critical for proceeding in time do not appear to be easily
> separable (if at all) in the overall code flow of a single-threaded
> program like corosync, since everything is time-critical in a sense
> (token and other timeouts are ticking), and offloading side or
> non-critical tasks for asynchronous processing is likely not on the
> roadmap for corosync, given the historical move away from
> multithreading (only retained for logging, for which an extra
> precaution is needed to prevent priority inversion, which will always
> be a threat when processes of unequal priority interact, even if
> transitively).
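For illustration of that logging precaution: the usual guard against
priority inversion between a realtime thread and a lower-priority logging
thread sharing a lock is a priority-inheritance mutex, so whoever holds the
lock is temporarily boosted while a higher-priority thread waits on it.
A minimal pthreads sketch around a hypothetical shared log buffer (this is
not the actual corosync/libqb logging code):

#include <pthread.h>
#include <stdio.h>

/* Hypothetical shared log buffer; the point is only the mutex protocol. */
static char log_buf[4096];
static pthread_mutex_t log_lock;

static int log_lock_init(void)
{
    pthread_mutexattr_t attr;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    /* PTHREAD_PRIO_INHERIT: while a higher-priority thread is blocked on
     * this mutex, the current owner runs at that thread's priority, so a
     * low-priority logger cannot stall the realtime thread indefinitely. */
    if (pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT) != 0)
        return -1;
    if (pthread_mutex_init(&log_lock, &attr) != 0)
        return -1;
    pthread_mutexattr_destroy(&attr);
    return 0;
}

static void log_msg(const char *msg)
{
    pthread_mutex_lock(&log_lock);
    snprintf(log_buf, sizeof(log_buf), "%s", msg); /* stand-in for real work */
    pthread_mutex_unlock(&log_lock);
}

Of course this only helps if every holder of the lock is itself bounded in
how long it keeps it.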
That's what I'm talking about: If you let users pick the priority, every
user will pick the highest possible priority, because he/she expects to get
better service. In fact, they don't. And with real-time scheduling it can
even halt the system. For corosync this means that RT priority should be
used only where it's actually needed, not just to cope with poor
performance.

>
> The way around multithreading is to have another worker process
> with IPC of some sort, but with that, you only add more overhead
> and complexity around such additionally managed queues into the
> game (+ possibly priority inversion yet again).

Yes, a real-time system has to be designed with real-time in mind all the
time; you can't make any system a real-time system just by using real-time
scheduling priorities.

>
> BTW, regarding the "must make sure" part, barring self-supervision
> of any sort (new complexity + overhead), that's a problem of
> fixed-priority scheduling assignment. I've recently been raising
> awareness of the (Linux-specific) *deadline scheduler* [1,2], which:
>
> - has an even higher hierarchical priority than the SCHED_RR
>   policy (possibly making the latter ineffective, which would
>   not be very desirable, I guess)
>
> - may better express not only the actual requirements, but also an
>   "under normal circumstances, on reasonably scoped HW for the task"
>   upper boundary for how much CPU run-time shall be allowed for the
>   process in absolute terms (speaking of hypothetical defaults now,
>   possibly user-configurable and/or influenced by the actual
>   configured timeouts at the corosync level), possibly preventing
>   said livelock scenarios (the process being throttled when the
>   budget is exceeded, presumably speeding up the loss of the token
>   and the subsequent fencing)
>
> Note that in systemd deployments, it would be customary for
> the service launcher (unit file executor) to expose this as yet
> another user-customizable wrapping around the actual run, but
> support for this particular scheduling policy is currently
> missing[3].
>
>>> Specifically, having some code that performs poorly (for various
>>> reasons) is absolutely _not_ a candidate to be run with real-time
>>> priorities to fix the bad performance!
>
> You've managed to flip (more or less; I have no contrary evidence) an
> isolated occurrence of evidently buggy behaviour into a generalized
> description of the performance of the involved pieces of SW.

Actually (independent of this issue) I have always had the impression that
corosync communicates too much (a lot of traffic while nothing is happening
in the cluster), and that it easily breaks under load. And I had the
impression that the developers tried to fix this by adding real-time
priorities to the parts that expose the problem. Which is the wrong type of
fix IMHO...

> If that were that bad, we would hear all the time that there is not
> enough room for the actual clustered resources, but I am not aware
> of that.

Depends on what "room" actually refers to: Would corosync ever work
reasonably on a single-CPU system? Yes, that's purely hypothetical, but
there actually exists software that deadlocks with only one CPU...

>
> With the buggy behaviour, I mean, the logs from https://clbin.com/9kOUM
> and the https://github.com/ClusterLabs/libqb/commit/2a06ffecd bug fix
> from the past seem to have something in common, like high load
> as a surrounding circumstance, and the missed event/job (on,
> presumably, a socket, fd=15 in the log ... since that never gets
> handled even when there's no other input event).
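To illustrate how such a missed, never-serviced event can pin a realtime
process at 100% CPU: with level-triggered poll()/epoll, a descriptor that
stays readable because nobody ever drains it makes every wait return
immediately, so the loop spins instead of sleeping. A deliberately
simplified, hypothetical loop (this is not libqb's code, just the failure
mode):

#include <stdio.h>
#include <unistd.h>
#include <sys/epoll.h>

int main(void)
{
    int pfd[2];
    struct epoll_event ev = { .events = EPOLLIN };
    int ep = epoll_create1(0);

    if (ep < 0 || pipe(pfd) != 0)
        return 1;
    ev.data.fd = pfd[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, pfd[0], &ev) != 0)
        return 1;

    if (write(pfd[1], "x", 1) != 1)     /* pfd[0] is now readable */
        return 1;

    for (;;) {
        struct epoll_event out;
        int n = epoll_wait(ep, &out, 1, 1000 /* ms */);

        /* Bug on purpose: the event is reported but never consumed
         * (no read() on out.data.fd), so the next epoll_wait() returns
         * again immediately -- the loop spins instead of sleeping.
         * Run this under SCHED_RR and it will monopolize its CPU. */
        fprintf(stderr, "woke up, %d event(s) still pending\n", n);
    }
}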
> Guess that another look is needed at the _poll_and_add_to_jobs_ function
> (not sure why it appears without the leading/trailing underscore in the
> provided gdb backtrace [snipped]:
>>>> Thread 1 (Thread 0x7f6fd43c7b80 (LWP 16242)):
>>>> #0  0x00007f6fd31c5183 in epoll_wait () from /lib64/libc.so.6
>>>> #1  0x00007f6fd3b3dea8 in poll_and_add_to_jobs () from /lib64/libqb.so.0
>>>> #2  0x00007f6fd3b2ed93 in qb_loop_run () from /lib64/libqb.so.0
>>>> #3  0x000055592d62ff78 in main ()
> ) and its use.
>
>>> So if corosync is using 100% CPU in real-time, this says something
>>> about the code quality in corosync IMHO.
>
> ... or in any other library involved (primary suspect: libqb), down to
> the kernel level -- and keep in mind that no piece of nontrivial SW is
> bug-free, especially if the reproducer requires a rather specific
> environment that is not prioritized by anyone, incl. those tasked
> with quality assurance.

Yes, but busy-waiting (e.g. a poll() loop that never actually blocks) in a
real-time task is always dangerous, especially if you do not have control
over the events that are supposed to arrive.

>
>>> Also, SCHED_RR is even more cooperative than SCHED_FIFO, and another
>>> interesting topic is which of the 100 real-time priorities to
>>> assign to which process. (I've written some C code that allows
>>> selecting the scheduling mechanism and the priority via a
>>> command-line argument, so the user and not the program is
>>> responsible if the system locks up.) Maybe corosync should think
>>> about something similar.
>>
>> And this is exactly why the corosync options -p (-P) exist (in 3.x these
>> were moved to corosync.conf as sched_rr/priority).
>>
>>> Personally I also think that a program that sends megabytes of XML
>>> as a realtime-priority task through the network is broken by design:
>>> If you care about response time, minimize the data and processing
>>> required before reaching for real-time priorities.
>
> This is partially done already (big XML chunks are compressed before
> sending) on the pacemaker side. The next reasonable step there would
> be to move towards one of the nicely wrapped binary formats (e.g.
> Protocol Buffers or FlatBuffers[4]), but that is a speculative long-term
> direction, and the core XML data interchange will surely be retained for
> a long, long time for compatibility reasons. Other than that, corosync
> doesn't interpret the transferred data, and conversely, the pacemaker
> daemons do not run with realtime priorities.

Maybe an overall picture with tasks, priorities, responsibilities and
communication paths would help understanding.

>
>>>>> Edwin Török <[email protected]> 14.02.19 18:34:
>>>> [...]
>>>>
>>>> This appears to be a priority inversion problem: if corosync runs
>>>> as realtime then everything it needs (timers...) should be
>>>> realtime as well, otherwise running as realtime guarantees we'll
>>>> miss the watchdog deadline, instead of guaranteeing that we
>>>> process the data before the deadline.
>
> This may not be an immediate priority inversion problem per se, but
> (seemingly) a rare bug (presumably in libqb, see the other similar
> one above) accentuated by the fixed-priority (only very lightly
> upper-bounded) realtime scheduling, and by the fact that this all
> somehow manages to collide with processes as vital as those required
> for actual network packet delivery, IIUIC (yielding some conclusions
> about putting VPNs etc. into the mix).
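To make the scheduling options above concrete, here is a rough,
hypothetical sketch of the kind of small helper mentioned earlier, where
the policy and the priority (or runtime budget) are chosen on the command
line, so the administrator rather than the program is responsible for the
consequences. It also shows how the SCHED_DEADLINE policy from [1,2] would
be requested (needing privileges such as CAP_SYS_NICE either way). None of
this is corosync's code, and the numbers are placeholders, not
recommendations:

#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6        /* from linux/sched.h, Linux >= 3.14 */
#endif

/* Mirrors the kernel's struct sched_attr from sched_setattr(2); declared
 * locally because glibc provides no wrapper for that syscall. */
struct dl_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    uint64_t sched_runtime;     /* ns of CPU budget per period          */
    uint64_t sched_deadline;    /* ns by which the budget must be spent */
    uint64_t sched_period;      /* ns between activations               */
};

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s fifo|rr <prio>  |  %s deadline <budget_ms>\n",
                argv[0], argv[0]);
        return 1;
    }

    if (strcmp(argv[1], "deadline") == 0) {
        struct dl_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.sched_policy   = SCHED_DEADLINE;
        attr.sched_runtime  = (uint64_t)atoi(argv[2]) * 1000000ULL; /* ms -> ns,
                                                  must not exceed the deadline */
        attr.sched_deadline = 100000000ULL;  /* 100 ms, placeholder */
        attr.sched_period   = 100000000ULL;  /* 100 ms, placeholder */
        if (syscall(SYS_sched_setattr, 0, &attr, 0) != 0) {
            perror("sched_setattr");
            return 1;
        }
    } else {
        struct sched_param sp = { .sched_priority = atoi(argv[2]) }; /* 1..99 */
        int policy = (strcmp(argv[1], "rr") == 0) ? SCHED_RR : SCHED_FIFO;

        if (sched_setscheduler(0, policy, &sp) != 0) {
            perror("sched_setscheduler");
            return 1;
        }
    }

    /* From here on the chosen policy applies; with SCHED_DEADLINE the kernel
     * throttles the process once its budget per period is used up, instead of
     * letting it monopolize the CPU indefinitely. */
    pause();
    return 0;
}

A real helper would of course exec the actual workload after setting the
policy rather than just pausing, but that is beside the point here.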
>
> Not sure if this class of problems in general would be at least
> partially self-solved by deadline scheduling (a word that, out of
> curiosity, appears twice in the above excerpt) with some reasonable
> parameters.
>
>>>> [...]
>>>>
>>>> Also, would it be possible for corosync to avoid hogging the CPU in
>>>> libqb?
>
> ... or possibly (I have no proof) for either side not to end up with
> inconsistent event tracking, which may slow any further progress down
> (if not prevent it); see the similar libqb issue referenced above.
>
>>>> (Our hypothesis is that if softirqs are not processed, then timers
>>>> wouldn't work for processes on that CPU either.)
>
> Interesting.

Anyway, thanks for sharing your observations.

Regards,
Ulrich

>
>>>> [...]
>
> [1] https://lwn.net/Articles/743740/
> [2] https://lwn.net/Articles/743946/
> [3] https://github.com/systemd/systemd/issues/10034
> [4] https://bugs.clusterlabs.org/show_bug.cgi?id=5376#c3
>
> --
> Jan (Poki)

_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
