On 8/29/25 1:16 PM, Alexandre Courbot wrote:
On Thu Aug 28, 2025 at 8:27 PM JST, Danilo Krummrich wrote:
On 8/26/25 6:07 AM, Alexandre Courbot wrote:
   /// Structure encapsulating the firmware blobs required for the GPU to 
operate.
   #[expect(dead_code)]
   pub(crate) struct Firmware {
@@ -36,7 +123,10 @@ pub(crate) struct Firmware {
       booter_unloader: BooterFirmware,
       /// GSP bootloader, verifies the GSP firmware before loading and running 
it.
       gsp_bootloader: RiscvFirmware,
-    gsp: firmware::Firmware,
+    /// GSP firmware.
+    gsp: Pin<KBox<GspFirmware>>,

Is there a reason why we don't just propagate it through struct Gpu, which uses
pin-init already?

You can make Firmware pin_data too and then everything is within the single
allocation of struct Gpu.

Thanks a lot for the write-up below!

I tried doing that at first, and hit the problem that the `impl PinInit`
returned by `GspFirmware::new` borrows a reference to the GSP firmware
binary loaded by `Firmware::new` - when `Firmware::new` returns, the
firmware gets freed, and the borrow checker complains.

We could move the GSP firmware loading code into the `pin_init!` of
`Firmware::new`, but now we hit another problem: in `Gpu::new` the
following code is executed:

     FbLayout::new(chipset, bar, &fw.gsp_bootloader, &fw.gsp)?

which requires the `Firmware` instance, which doesn't exist yet as the
`Gpu` object isn't initialized until the end of the method.

So we could move `FbLayout`, and everything else created by `Gpu::new`
to become members of the `Gpu` instance. It does make sense actually:
this `new` method is doing a lot of stuff, such as running FRTS, and
with Alistair's series it even runs Booter, the sequencer and so on.
Maybe we should move all firmware execution to a separate method that is
called by `probe` after the `Gpu` is constructed, as right now the `Gpu`
constructor looks like it does a bit more than it should.

Absolutely, executing the firmware should be a separate method. Having it in the
constructor makes things more difficult.
... but even when doing that, `Firmware::new` and `FbLayout::new` still
require a reference to the `Bar`, and... you get the idea. :)

Lifetime wise this should be fine, the &Bar out-lives the constructor, since
it's lifetime is bound to the &Device<Bound> which lives for the entire duration
of probe().
So I don't think the current design allows us to do that easily or at
all, and even if it does, it will be at a significant cost in code
clarity. There is also the fact that I am considering making the
firmware member of `Gpu` a trait object: the boot sequence is so
different between pre and post-Hopper that I don't think it makes sense
to share the same `Firmware` structure between the two. I would rather
see `Firmware` as an opaque trait object, which provides high-level
methods such as "start GSP" behind which the specifics of each GPU
family are hidden. If we go with this design, `Firmware` will become a
trait object and so cannot be pinned into `Gpu`.

This doesn't change my observation that `Gpu::new` should not IMHO do
steps like booting the GSP - it should just acquire the resources it
needs, return the pinned GPU object, and then `probe` can continue the
boot sequence. Having the GPU object pinned and constructed early
simplifies things quite a bit as the more we progress with boot, the
harder it becomes to construct everything in place (and the `PinInit`
closure also becomes more and more complex).

I'm still laying down the general design, but I'm pretty convinced that
having `Firmware` as a trait object is the right way to abstract the
differences between GPU families.

Makes sense, it's fine with me to keep this in its separate allocation for the
purpose of making Firmware an opaque trait object, which sounds reasonable.

But we should really properly separate construction of the GPU structure from
firmware boot code execution as you say. And actually move the construction of
the GPU object into try_pin_init!().

Reply via email to