Hi,

On Friday, 29 May 2026, 17:58:21 Central European Summer Time, MidG971 wrote:
> From: Midgy BALON <[email protected]>
>
> The RK3568 has a single NVDLA-derived NPU core (0.8 TOPS), the same IP
> family as the three-core RK3588 NPU already supported by the Rocket
> driver. To accommodate both SoCs:
>
> - Introduce a per-SoC rocket_soc_data structure carrying dma_bits and
>   an optional noc_init callback, plumbed through of_device_get_match_data().
> - rocket_device_init() now scans for both rk3568 and rk3588 RKNN cores
>   and picks the narrower DMA width (32-bit) when an RK3568 core is present.
> - Add rk3568_soc_data and rk3568_noc_init() handling the three RK3568-
>   specific initialisation steps that must run after the power domain is
>   on and clocks are enabled:

If you need bullet points to describe your patch, that strongly indicates
it should be multiple patches. For example, moving the relevant parts into
per-SoC data is one patch (with only the RK3588 soc-data in that one).

>
>   1. PVTPLL initialisation: The NPU uses a PVTPLL ring oscillator
>      managed by TF-A via SCMI for rates above 400 MHz. A two-step
>      clk_set_rate() sequence (600 MHz then 1 GHz) forces two SCMI calls
>      to TF-A even if the kernel clock framework would skip an unchanged
>      rate. The PVTPLL must be running before the NPU NOC bus will
>      acknowledge a de-idle request.
>
>   2. Explicit NPU power-on (PWR_GATE_SFTCON): The RK3568_PD_NPU power
>      domain is marked always_on in pm-domains.c, so the generic power
>      domain framework power_on() callback is a no-op. The NPU hardware
>      can remain power-gated at boot. Writing bit 1 = 0 to PWR_GATE_SFTCON
>      (PMU offset 0xa0) explicitly powers on the NPU hardware before the
>      de-idle request is issued.
>
>   3. NOC bus de-idle: Disable NPU NOC auto-idle (NOC_AUTO_CON0 bit 2),
>      request de-idle (BUS_IDLE_SFTCON0 bit 2 = 0), then poll
>      BUS_IDLE_ST (PMU offset 0x60) until bit 2 clears (bus active).
>
> The RK3568 DMA address space is limited to 32 bits, as the NPU AXI bus
> and IOMMU page walker cannot address memory above 4 GB.
>
> All PMU accesses follow the RK3568 write-mask protocol: upper 16 bits are
> the write-enable mask for the lower 16 bits.
>
> Signed-off-by: Midgy BALON <[email protected]>

[...]
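
To illustrate the write-mask protocol described in the quoted commit message, here is a minimal userspace sketch in C. It is not taken from the patch: the helper names hiword_update() and apply_masked_write() are hypothetical, and the second function only models how such a register is assumed to react to a write.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build a write-masked register value: the upper 16 bits select which of
 * the lower 16 bits the hardware will update. Hypothetical helper name.
 */
static inline uint32_t hiword_update(uint16_t val, uint16_t mask)
{
	/* Setting a value bit without enabling it would be silently ignored. */
	assert((val & ~mask) == 0);
	return ((uint32_t)mask << 16) | val;
}

/*
 * Userspace model (an assumption, not driver code) of how a write-masked
 * 16-bit register changes when the 32-bit word 'written' is stored to it.
 */
static inline uint16_t apply_masked_write(uint16_t old, uint32_t written)
{
	uint16_t mask = (uint16_t)(written >> 16);
	uint16_t val = (uint16_t)(written & 0xffff);

	/* Bits outside the mask keep their old value. */
	return (uint16_t)((old & ~mask) | (val & mask));
}
```

Under this model, clearing bit 1 (as the commit message describes for PWR_GATE_SFTCON) means writing hiword_update(0, 1 << 1), i.e. 0x00020000, which leaves every other bit of the register untouched without a read-modify-write cycle.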
> diff --git a/drivers/accel/rocket/rocket_device.c
> b/drivers/accel/rocket/rocket_device.c
> index 46e6ee1e7..0ed8251c8 100644
> --- a/drivers/accel/rocket/rocket_device.c
> +++ b/drivers/accel/rocket/rocket_device.c
> @@ -27,6 +27,9 @@ struct rocket_device *rocket_device_init(struct
> platform_device *pdev,
>  	ddev = &rdev->ddev;
>  	dev_set_drvdata(dev, rdev);
>
> +	for_each_compatible_node(core_node, NULL, "rockchip,rk3568-rknn-core")
> +		if (of_device_is_available(core_node))
> +			num_cores++;
>  	for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core")
>  		if (of_device_is_available(core_node))
>  			num_cores++;
> @@ -37,9 +40,25 @@ struct rocket_device *rocket_device_init(struct
> platform_device *pdev,
>
>  	dma_set_max_seg_size(dev, UINT_MAX);
>
> -	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
> -	if (err)
> -		return ERR_PTR(err);

For both changes in rocket_device_init(): rocket_device_init() is called
from the main probe function, so before calling it you can already access
the SoC-specific data from the compatible and derive both the for_each
above and the DMA width directly from that. There is no need for the loop
below.

> +	/* Use the DMA width of the first available RKNN core. RK3568 cores
> +	 * are 32-bit; RK3588 are 40-bit. If both are present we pick the
> +	 * narrower mask.
> +	 */
> +	{
> +		struct device_node *n;
> +		unsigned int dma_bits = 40;
> +
> +		for_each_compatible_node(n, NULL, "rockchip,rk3568-rknn-core")
> +			if (of_device_is_available(n)) {
> +				dma_bits = 32;
> +				of_node_put(n);
> +				break;
> +			}
> +
> +		err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(dma_bits));
> +		if (err)
> +			return ERR_PTR(err);
> +	}

Heiko
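
A rough userspace sketch of the idea in the review comment, i.e. deriving the DMA width from per-SoC data keyed by the compatible instead of scanning the device tree again. This is only a model under stated assumptions: lookup_soc_data() stands in for what of_device_get_match_data() would provide in the probe function, the struct layout is guessed from the commit message (dma_bits plus an optional noc_init), and the void * context argument, rk3568_noc_init_stub() and dma_mask() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout, based on the fields named in the commit message. */
struct rocket_soc_data {
	unsigned int dma_bits;
	int (*noc_init)(void *ctx);	/* optional, may be NULL */
};

/* Placeholder for the RK3568-specific init steps; does nothing here. */
static inline int rk3568_noc_init_stub(void *ctx)
{
	(void)ctx;
	return 0;
}

static const struct rocket_soc_data rk3568_soc_data = {
	.dma_bits = 32,
	.noc_init = rk3568_noc_init_stub,
};

static const struct rocket_soc_data rk3588_soc_data = {
	.dma_bits = 40,
	.noc_init = NULL,
};

/* Userspace stand-in for an of_device_id match table. */
static const struct {
	const char *compatible;
	const struct rocket_soc_data *data;
} soc_match[] = {
	{ "rockchip,rk3568-rknn-core", &rk3568_soc_data },
	{ "rockchip,rk3588-rknn-core", &rk3588_soc_data },
};

/* Return the per-SoC data for a compatible string, or NULL if unknown. */
static inline const struct rocket_soc_data *lookup_soc_data(const char *compatible)
{
	for (size_t i = 0; i < sizeof(soc_match) / sizeof(soc_match[0]); i++)
		if (strcmp(soc_match[i].compatible, compatible) == 0)
			return soc_match[i].data;
	return NULL;
}

/* Address mask with the low 'bits' bits set, valid for 1..64 bits. */
static inline uint64_t dma_mask(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return bits == 64 ? ~0ULL : (1ULL << bits) - 1;
}
```

With something along these lines, the probe function would look up the data once and pass it (or just dma_bits) into rocket_device_init(), so the DMA mask follows from the matched compatible rather than from a second for_each_compatible_node() pass.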
