Hi Alexander, We had a bunch of issues to fix in the master-worker and I also took the opportunity to take a look at your patch this morning.
I started to write a small regression test suite for the master-worker because it's kind of difficult to test it with our current vtest suite. I still need to finish it before pushing but that will help us with that kind of issues. I also agree that having an ascending order would be more logical, we broke that by accident with the 3.1 rework. I pushed your patch in master and will backport it in previous versions. On Thu, Mar 19, 2026 at 12:34:38PM +0000, Stephan, Alexander wrote: > Subject: RE: BUG/MINOR: mworker/cli: avoid duplicates in 'show proc' > Hi, > > Did you have a chance to take a look yet? Thanks! > > Best, > > Alexander > > -----Original Message----- > From: Stephan, Alexander > Sent: Friday, February 20, 2026 10:11 AM > To: [email protected] > Subject: FW: BUG/MINOR: mworker/cli: avoid duplicates in 'show proc' > > Now with list in CC, sry! > > -----Original Message----- > From: Stephan, Alexander > Sent: Thursday, February 19, 2026 2:22 PM > To: 'William Lallemand' <[email protected]> > Cc: Willy Tarreau <[email protected]> > Subject: RE: BUG/MINOR: mworker/cli: avoid duplicates in 'show proc' > > Hi William and Willy, > > First, thanks for the backports! However, I noticed an issue with the > backport on stable branches <= 3.0 where the entries get re-emitted after 198 > processes. > > The final proc_list ordering differs between the versions: > > <= 3.0: In haproxy.c main(), the new worker is appended to proc_list first > (before mworker_env_to_proc_list() is called), so deserialized old workers > end up after the current worker. > This would mean old (LEAVING) workers appear in the list in descending reload > order (newest old worker first). > > >= 3.1 (current master): mworker_env_to_proc_list() runs early in init > >(haproxy.c:3311), appending all deserialized entries first. > Then mworker_prepare_master() (haproxy.c:3350) appends the new worker last. > Old workers are therefore in ascending reload order (oldest first). > > The original pagination fix (commit 594408c) used child->reloads >= > ctx->next_reload to skip already-printed entries on resume. > This comparison is direction-dependent — it only works correctly when the > list is in one specific order. > On branches where the order is reversed (i.e. <= 3.0, so effectively 2.9, > 3.0, since the other patch was only backported to this version), it loops > back to the beginning after ~198 processes. > > A version-specific fix, applicable to 2.9 and 3.0 would be trivial (flip >= > to <), created from 3.0.16 in this case: > > diff --git a/src/mworker.c b/src/mworker.c index 355e015..fb830fc 100644 > --- a/src/mworker.c > +++ b/src/mworker.c > @@ -638,8 +638,8 @@ static int cli_io_handler_show_proc(struct appctx *appctx) > if (up <= 0) /* must never be negative because of > clock drift */ > up = 0; > > - /* If we're resuming, skip entries that were already > printed (reload >= ctx->next_reload) */ > - if (ctx->next_reload && child->reloads >= > ctx->next_reload) > + /* If we're resuming, skip entries that were already > printed (reload < ctx->next_reload) */ > + if (ctx->next_reload && child->reloads < > ctx->next_reload) > continue; > > if (!(child->options & PROC_O_TYPE_WORKER)) > -- > 2.43.7 > > But as we've seen, this approach is fragile and will depend on the specific > HAProxy version. > From what I understood, the backported commits should be apply to all target > versions as is. > > So, instead of a directional comparison, the new approach: > > I renamed ctx->next_reload to ctx->resume_reload and changed its semantics: > it now stores the reload count of the last successfully flushed row, not the > failed one. > On flush failure, the value is left unchanged, so the failed row is replayed > on the next call. > On resume, walk the list and skip entries until we find the one whose > child->reloads == skip, then resume from the next entry. > This works identically whether the list is ascending or descending. > > For correctness, we also need to handle the deletion of the marker entry > (e.g., a worker process exits and SIGCHLD removes it from proc_list between > handler calls). > Here, track the previous LEAVING entry's reload count during the skip phase. > If two consecutive LEAVING entries straddle the skip value (one has reloads > > skip, the other has reloads < skip, or vice versa), the deleted entry's > former position has been crossed, so skipping stops and the current entry is > emitted. > > This makes the pagination completely direction-agnostic — it doesn't matter > whether old workers are in ascending or descending order in proc_list. > > IMO, it should be backported to all stable branches. On branches where > proc_list happens to be in descending order (2.9, 3.0), the fix works the > same way since the skip logic no longer depends on list direction. > > What do you think of this? In general, I would find a stable ascending order > in >= 3.1 that is consistent to 3.0 ideal. This would also prevent tools that > use the output from breaking. > But I am not sure about the implications of changing this again. > As usual, I attach my patch to make sure my email client doesn't mess with it. > > Best regards, > > Alexander > > > -----Original Message----- > From: William Lallemand <[email protected]> > Sent: Wednesday, January 28, 2026 3:42 PM > To: Stephan, Alexander <[email protected]> > Cc: [email protected] > Subject: Re: BUG/MINOR: mworker/cli: avoid duplicates in 'show proc' > > Hello Alexander, > > On Mon, Jan 26, 2026 at 04:16:56PM +0000, Stephan, Alexander wrote: > > Subject: RE: BUG/MINOR: mworker/cli: avoid duplicates in 'show proc' > > Hi William, > > > > I have been investigating the backport situation a bit and I wanted to > > quickly share some findings that might be useful. > > > > I backported the patches to version 3.2, but as you already observed, there's > a lot of changes before this version. The "program" section still exists in > these versions and the code was not written for it. > > I backported a few patches to do the backport as far as 3.0 but I don't think > it's worth it going further given the impact of the problem. We probably > don't want to mess with a feature which works for 99% of usecases in an older > branch. > > > So, I noticed a difference regarding the process list before version 3.0, > > which would need to be accounted for. > > "show proc" still uses the default parser here, which leaves > > appctx->svcctx uninitialized: > > https://gith/ > > ub.com%2Fhaproxy%2Fhaproxy%2Fblob%2Ff76e73511addd075f556449b0ebf33c9f7 > > a5184b%2Fsrc%2Fmworker.c%23L803C100-L803C117&data=05%7C02%7Calexander. > > stephan%40sap.com%7C44fb7a032eec4a725fec08de5e7b6d62%7C42f7676cf455423 > > c82f6dc2d99791af7%7C0%7C0%7C639052081403606857%7CUnknown%7CTWFpbGZsb3d > > 8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiT > > WFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=P%2B%2Foi%2BYVifCU82eQOCUcC > > VoPQ5SGh0d8wd74M%2BeBBvI%3D&reserved=0 > > Therefore, additional context would be needed like in the patch below where > > also the custom parser method needs to be copied from master. > > > > Furthermore, from what I can see the process list works a bit > > differently when it comes to reloads, using a min counter: > > https://gith/ > > ub.com%2Fhaproxy%2Fhaproxy%2Fblob%2Ff76e73511addd075f556449b0ebf33c9f7 > > a5184b%2Fsrc%2Fmworker.c%23L174&data=05%7C02%7Calexander.stephan%40sap > > .com%7C44fb7a032eec4a725fec08de5e7b6d62%7C42f7676cf455423c82f6dc2d9979 > > 1af7%7C0%7C0%7C639052081403631160%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1h > > cGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIj > > oyfQ%3D%3D%7C0%7C%7C%7C&sdata=xitlobHSOmFfjcC6uNTAvExXTMuptzybtemtOcqi > > vD4%3D&reserved=0 With the 1:1 logic from master, I again run into > > duplicates and messed up ordering in the the "show proc" output. > > > > I had success fixing this by adjusting the iteration, replacing > > ctx->next_reload = child->reloads with ctx->next_reload = > > child->reloads + 1; and replace > > > > if (ctx->next_reload && child->reloads >= ctx->next_reload) > > continue; > > with > > > > if (ctx->next_reload && child->reloads < ctx->next_reload) > > continue; > > > > in the old worker part. I appended the resulting patch which directly > > applies to 3.0-dev13 (just as an example, not fully ready for a commit yet). > > > > With this, it works very well for me for the backport targets 2.8 and 3.0 > > (deferred from the issue https://github.com/haproxy/haproxy/issues/3204). > > > > I hope this helps, I am also curious whether you agree with my approach > > here. > > > > Thanks in advance, and best regards, > > Thank you for the investigating the problem. Regarding your patch, we try to > backport the patches instead of writing new ones in previous branches. So for > the 3.0 branch I backported these ones: > > e41829dda6 BUG/MINOR: mworker/cli: fix show proc pagination using reload > counter > 9cae4b8f28 BUG/MINOR: mworker/cli: 'show proc' is limited by buffer size > 48d20c79e7 MINOR: mworker/cli: 'show proc debug' for old workers 5b851f94af > MINOR: mworker/cli: remove comment line for program when useless 948a4c372b > MINOR: mworker/cli: add 'debug' to 'show proc' > d55d3c6225 CLEANUP: mworker/cli: remove useless variable > > These are minor changes that are not that intrusive for a backport, > backporting without them would change too much the original patches, so I > took them in 3.0. > > Regards, > > -- > William Lallemand -- William Lallemand

