On Wed, Apr 08, 2026 at 11:45:23PM +0200, Mark Kettenis wrote:
> > Date: Wed, 8 Apr 2026 19:24:56 +0200
> > From: Jeremie Courreges-Anglas <[email protected]>
> > We have proof that the system doesn't necessarily crash after that
> > message is printed. kmos tested the db_enter removal yesterday
> > and confirmed that he got the message on the console without the
> > system crashing. Using the diff below, I got this today on my LDOM's
> > console:
> > Apr 8 11:37:26 ports /bsd: ctx_free: context 1641 still active in dmmu
> > Apr 8 12:21:12 ports /bsd: ctx_free: context 7896 still active in dmmu
> > Apr 8 12:24:29 ports /bsd: ctx_free: context 3150 still active in dmmu
> > Apr 8 13:43:56 ports /bsd: ctx_free: context 4221 still active in dmmu
> > Apr 8 15:55:50 ports /bsd: ctx_free: context 1264 still active in dmmu
> > Apr 8 18:55:48 ports /bsd: ctx_free: context 5664 still active in dmmu
> Sorry, but this is really bad. It means stale TSB entries have been
> left behind and may be re-used when the context is re-used. And that
> could lead to some serious memory corruption.
> If we want to paper over this issue, we should at least invalidate the
> stale TSB entry. So something like:
> 	for (i = 0; i < TSBENTS; i++) {
> 		tag = READ_ONCE(&tsb_dmmu[i].tag);
> 		if (TSB_TAG_CTX(tag) == oldctx) {
> 			atomic_cas_ulong(&tsb_dmmu[i].tag, tag,
> 			    TSB_TAG_INVALID);
> 			printf("ctx_free: context %d still active in dmmu\n",
> 			    oldctx);
> 		}
> 		tag = READ_ONCE(&tsb_immu[i].tag);
> 		if (TSB_TAG_CTX(tag) == oldctx) {
> 			atomic_cas_ulong(&tsb_immu[i].tag, tag,
> 			    TSB_TAG_INVALID);
> 			printf("ctx_free: context %d still active in immu\n",
> 			    oldctx);
> 		}
> 	}
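For anyone following along, here is a rough user-space model of the sweep quoted above. It is only a sketch: the table size, the tag layout and every MODEL_* / model_* name are made up for illustration and are not the sparc64 definitions, and C11 atomics stand in for the kernel's atomic_cas_ulong. The point it tries to show is that the compare-and-swap only invalidates an entry whose tag is still the one that was read, so a slot refilled in between is left alone.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* All values below are assumptions for this model, not the kernel's. */
#define MODEL_TSBENTS		8		/* tiny table for illustration */
#define MODEL_CTX_SHIFT		48		/* assumed position of the context field */
#define MODEL_CTX_MASK		0x1fffULL	/* assumed width of the context field */
#define MODEL_VA_MASK		0x3ffffffffffULL
#define MODEL_TAG_INVALID	(1ULL << 63)	/* assumed "invalid" marker */

#define MODEL_TAG_CTX(tag)	(((tag) >> MODEL_CTX_SHIFT) & MODEL_CTX_MASK)
#define MODEL_MAKE_TAG(ctx, va)						\
	((((uint64_t)(ctx) & MODEL_CTX_MASK) << MODEL_CTX_SHIFT) |	\
	    ((uint64_t)(va) & MODEL_VA_MASK))

struct model_tsb_entry {
	_Atomic uint64_t tag;
	uint64_t data;
};

/*
 * Invalidate every entry still tagged with ctx and return how many
 * entries were invalidated by this call.
 */
static int
model_sweep(struct model_tsb_entry *tsb, size_t nents, unsigned int ctx)
{
	int stale = 0;

	for (size_t i = 0; i < nents; i++) {
		uint64_t tag = atomic_load(&tsb[i].tag);

		/* Already-invalid slots carry no context in this model. */
		if (tag & MODEL_TAG_INVALID)
			continue;
		if (MODEL_TAG_CTX(tag) != ctx)
			continue;

		/*
		 * Succeeds only if the slot still holds the tag we read;
		 * a slot that changed in the meantime is not touched.
		 */
		if (atomic_compare_exchange_strong(&tsb[i].tag, &tag,
		    MODEL_TAG_INVALID))
			stale++;
	}
	return stale;
}
```

Filling a table with a couple of entries for one context and one entry for another, then calling model_sweep() for the first context, should invalidate only the matching entries, and a second call for the same context should find nothing left.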
I'd definitely prefer something other than the existing db_enter
"solution". During my last full build I had to restart LDOMs at least
8 times over the course of the 5-day build. Every few restarts, I have
to drop to single-user mode and run a manual fsck to repair filesystems.
The current build is being run with jca's proposed patch and within 4
hours of starting the build, one of the LDOMs already had this on its
console:
ctx_free: context 4575 still active in dmmu
That's before getting to the really memory-intensive parts of the
package builds, as the heavy C++ and Rust builds have to wait for
ports-gcc to finish building, some 8-10 hours in.
--Kurt