On Mon, 21 Jul 2025, Warner Losh wrote:

Hi Warner,

On Mon, Jul 21, 2025 at 11:38 AM Warner Losh <[email protected]> wrote:

So, We're getting through the 'reset' and 'identify' states in the
state machine and entering the 'power off' state. And then progressing
no further. So we get into the 'done' routine, but don't progress to
sending the get ocr command. That's why we still have the probe
routine. We're not setting the probe into INVALID state, so we should
be calling xpt_schedule() to do the next single step of this process
at the end of mmc_done(), but that doesn't trigger a new call to
mmc_start().

I did some CAM cleanups that shouldn't have broken this, but might
have (low probability, but with cam you never know, especially in the
single step phase we do to do the probing). I'll check those out.
Maybe it's easy.

OK. It turned out that I had messed things up, unbeknownst to me. I've
added a KASSERT that should catch problems like this in the future
(and caught a second one that I also fixed). It should be safe to go
back into the MMCCAM water after b4b166b8c46b8. It broke a few days
ago in c6dc5d367681 (I just realised I forgot to add the fixes: tag to
my commit in my rush to get a fix in).

When we're running the single stepping engine to probe the device, all
those commands are queued. And we'll not run through them if we queue
something at CAM_PRIORITY_NONE. c6dc5d367681 made the ccb that ran for
the xpt_path_inq() at the NONE priority. Normally, this is fine. All
the instances in the tree but two use that the stack to pass in a CCB
for this use. It doesn't matter the priority. But, xpt_start() did
xpt_path_inq on the start_ccb passed in that was allocated in
xpt_schedule which fills in the proper priority and other fields. It
didn't matter for the xpt_path_inq, it completed just fine. However,
for the XPT_MMC_IO commands that followed, the priority was wrong, so
they'd never run. It turns out that we don't need or use the
xpt_path_inq() results at all, so it was just having the side effect
of initializing the CCB (bogusly and redundantly, it turns out. The
other location that doesn't use the stack for xpt_path_inq() doesn't
really reuse the CCB, but saves the results of the path_inq away (it
could allocate less memory if it did a similar thing to the stack
trick in its soft state structure, but doesn't).

So tl;dr: Thanks for the heads up, this broke only a few days ago, and
should be fixed now. My sdhci system with eMMC in it works with these
changes. And I added a KASSERT to catch this problem in the future.

I can confirm that your fixes are working.

I had a look through the cam diff but wouldn't have spotted this.

I can also confirm that the KASSERT works:

panic: xpt_action: queued ccb and CAM_PRIORITY_NONE illegal.

Now to fix my own code ;-)


Thanks a lot for the investigation and quick fix!  Very much appreciated!
/bz

--
Bjoern A. Zeeb                                                     r15:7

Reply via email to