On 20-05-2026 21:17, Tomeu Vizoso wrote:
> On Wed, May 20, 2026 at 4:12 PM Dmitry Baryshkov
> <[email protected]> wrote:
>>
>> On Tue, May 19, 2026 at 11:45:52AM +0530, Ekansh Gupta via B4 Relay wrote:
>>> From: Ekansh Gupta <[email protected]>
>>>
>>> Add documentation for the Qualcomm DSP Accelerator (QDA) driver under
>>> Documentation/accel/qda/. The documentation covers the driver
>>> architecture, GEM-based buffer management, IOMMU context bank
>>> isolation, and the RPMsg transport layer.
>>>
>>> The user-space API section describes the DRM IOCTLs for session
>>> management, GEM buffer allocation, and remote procedure invocation via
>>> the FastRPC protocol, along with a typical application lifecycle
>>> example. Sections for dynamic debug and basic testing are also
>>> included.
>>>
>>> Wire the new documentation into the Compute Accelerators index at
>>> Documentation/accel/index.rst.
>>>
>>> Assisted-by: Claude:claude-4-6-sonnet
>>> Signed-off-by: Ekansh Gupta <[email protected]>
>>> ---
>>>  Documentation/accel/index.rst     |   1 +
>>>  Documentation/accel/qda/index.rst |  13 ++++
>>>  Documentation/accel/qda/qda.rst   | 146 ++++++++++++++++++++++++++++++++++++++
>>>  3 files changed, 160 insertions(+)
>>>
>>> diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
>>> index cbc7d4c3876a..5901ea7f784c 100644
>>> --- a/Documentation/accel/index.rst
>>> +++ b/Documentation/accel/index.rst
>>> @@ -10,4 +10,5 @@ Compute Accelerators
>>>     introduction
>>>     amdxdna/index
>>>     qaic/index
>>> +   qda/index
>>>     rocket/index
>>> diff --git a/Documentation/accel/qda/index.rst b/Documentation/accel/qda/index.rst
>>> new file mode 100644
>>> index 000000000000..013400cf9c25
>>> --- /dev/null
>>> +++ b/Documentation/accel/qda/index.rst
>>> @@ -0,0 +1,13 @@
>>> +.. SPDX-License-Identifier: GPL-2.0-only
>>> +
>>> +==================================
>>> +accel/qda Qualcomm DSP Accelerator
>>> +==================================
>>> +
>>> +The QDA driver provides a DRM accel based interface for Qualcomm DSP offload.
>>> +It uses the FastRPC protocol and integrates with DRM and GEM infrastructure
>>> +for device and buffer management.
>>> +
>>> +.. toctree::
>>> +
>>> +   qda
>>> diff --git a/Documentation/accel/qda/qda.rst b/Documentation/accel/qda/qda.rst
>>> new file mode 100644
>>> index 000000000000..9f49af6e6acc
>>> --- /dev/null
>>> +++ b/Documentation/accel/qda/qda.rst
>>> @@ -0,0 +1,146 @@
>>> +.. SPDX-License-Identifier: GPL-2.0-only
>>> +
>>> +=====================================
>>> +Qualcomm DSP Accelerator (QDA) Driver
>>> +=====================================
>>> +
>>> +Introduction
>>> +============
>>> +
>>> +The QDA driver is a DRM accel driver for Qualcomm's DSPs. It provides a
>>> +DRM accel based interface for Qualcomm DSP offload, supporting workloads
>>> +such as AI inference, computer vision, audio processing, and sensor offload
>>> +on Qualcomm SoCs. It uses the FastRPC protocol and integrates with DRM and
>>> +GEM infrastructure for device and buffer management.
>>> +
>>> +Key Features
>>> +============
>>> +
>>> +*   **DRM accel Interface**: Exposes a standard character device node
>>> +    (e.g., ``/dev/accel/accel0``) via the DRM accel subsystem.
>>> +*   **FastRPC Protocol**: Implements the FastRPC protocol for communication
>>> +    between the application processor and the DSP.
>>> +*   **GEM Buffer Management**: Uses the DRM GEM interface for buffer
>>> +    allocation, lifecycle management, and DMA-BUF import/export.
>>> +*   **IOMMU Isolation**: Uses IOMMU context banks to enforce memory isolation
>>> +    between different DSP user sessions.
>>> +*   **Modular Design**: Clean separation between the core DRM logic, the
>>> +    memory manager, and the RPMsg-based transport layer.
>>> +
>>> +Architecture
>>> +============
>>> +
>>> +The QDA driver consists of several functional blocks:
>>> +
>>> +1.  **Core Driver (``qda_drv``)**: Manages device registration, file operations,
>>> +    and DRM accel integration.
>>> +2.  **Memory Manager (``qda_memory_manager``)**: A flexible memory management
>>> +    layer that handles IOMMU context banks. It supports pluggable backends
>>> +    (such as DMA-coherent) to adapt to different SoC memory architectures.
>>> +3.  **GEM Subsystem**: Implements the DRM GEM interface for buffer management:
>>> +
>>> +    * **``qda_gem``**: Core GEM object management, including allocation, mmap
>>> +      operations, and buffer lifecycle management.
>>> +    * **``qda_prime``**: PRIME import functionality for DMA-BUF interoperability
>>> +      with other kernel subsystems.
>>> +
>>> +4.  **Transport Layer (``qda_rpmsg``)**: Abstraction over the RPMsg framework
>>> +    to handle low-level message passing with the DSP firmware.
>>> +5.  **Compute Bus (``qda_compute_bus``)**: A custom virtual bus used to
>>> +    enumerate and manage the specific compute context banks defined in the
>>> +    device tree. The bus was introduced because IOMMU context banks (CBs) are
>>> +    synthetic constructs — not real platform devices — making a platform 
>>> driver
>>> +    an incorrect abstraction for them. The earlier platform-driver approach also
>>> +    had a race condition: device nodes were created before the RPMsg channel
>>> +    resources were fully initialized, and because ``probe`` runs asynchronously,
>>> +    applications could open a CB device and attempt to start a session before
>>> +    the underlying transport was ready. The compute bus makes CB lifetime
>>> +    explicitly subordinate to the parent QDA device, closing that window.
>>> +6.  **FastRPC Core (``qda_fastrpc``)**: Implements the protocol logic for
>>> +    marshalling arguments and handling remote invocations.
>>> +
>>> +User-Space API
>>> +==============
>>> +
>>> +The driver exposes a set of DRM-compliant IOCTLs:
>>> +
>>> +*   ``DRM_IOCTL_QDA_QUERY``: Query DSP type (e.g., "cdsp", "adsp")
>>> +    and capabilities.
>>> +*   ``DRM_IOCTL_QDA_REMOTE_SESSION_CREATE``: Initialize a new process context
>>> +    on the DSP.
>>> +*   ``DRM_IOCTL_QDA_REMOTE_INVOKE``: Submit a remote method invocation (the
>>> +    primary execution unit).
>>> +*   ``DRM_IOCTL_QDA_GEM_CREATE``: Allocate a GEM buffer object for DSP usage.
>>> +*   ``DRM_IOCTL_QDA_GEM_MMAP_OFFSET``: Retrieve mmap offsets for memory mapping.
>>> +*   ``DRM_IOCTL_QDA_REMOTE_MAP`` / ``DRM_IOCTL_QDA_REMOTE_MUNMAP``: Map or unmap
>>> +    buffers into the DSP's virtual address space. Each accepts a ``request``
>>> +    field selecting between a legacy operation (``QDA_MAP_REQUEST_LEGACY`` /
>>> +    ``QDA_MUNMAP_REQUEST_LEGACY``) and an attribute-based operation
>>> +    (``QDA_MAP_REQUEST_ATTR`` / ``QDA_MUNMAP_REQUEST_ATTR``).
>>
>> Explain what happens if users don't map the buffers into the DSP
>> space. Will DRM_IOCTL_QDA_REMOTE_INVOKE handle the mapping or not? What
>> is the difference between those two modes?
>>
>> Would the driver benefit from using GPUVM?
>>
>>> +
>>> +Usage Example
>>> +=============
>>> +
>>> +A typical lifecycle for a user-space application:
>>> +
>>> +1.  **Discovery**: Open ``/dev/accel/accel*`` and use
>>> +    ``DRM_IOCTL_QDA_QUERY`` to identify the DSP domain served by that
>>> +    device node.
>>> +2.  **Initialization**: Call ``DRM_IOCTL_QDA_REMOTE_SESSION_CREATE`` to
>>> +    establish a session and create a process context on the DSP.
>>> +3.  **Memory**: Allocate buffers via ``DRM_IOCTL_QDA_GEM_CREATE`` or import
>>> +    DMA-BUFs (PRIME fd) from other drivers using ``DRM_IOCTL_PRIME_FD_TO_HANDLE``.
>>> +4.  **Execution**: Use ``DRM_IOCTL_QDA_REMOTE_INVOKE`` to pass arguments and
>>> +    execute functions on the DSP.
>>> +5.  **Cleanup**: Close file descriptors to automatically release resources and
>>> +    detach the session.
>>
>> I'd have expected the description of the actual example. I.e. clone the
>> app from https://the.addr, prepare clang >= NN.MM, QAIC (https://foo),
>> run make, run the app, check the results. I'd remind you that DRM Accel
>> has a very specific requirement of having a working toolchain available
>> as open source.
> 
> We have been getting submissions lately that don't fulfill that
> requirement so I will point to the precise part of the documentation
> that explains it:
> 
> https://www.kernel.org/doc/html/latest/gpu/drm-uapi.html#open-source-userspace-requirements
> 
> For an example of a submission that complies, see:
> 
> https://lore.kernel.org/dri-devel/[email protected]/
> 
> Most importantly, notice how the proposed Thames Mesa driver generates
> machine code for all the hardware units, and doesn't use any blob for
> that.
> 
I believe QDA checks all the boxes for accel: there is an open-source
userspace, an open-source QAIC compiler for IDL compilation, and LLVM
supports the Hexagon architecture.

I'll try adding these details as well.
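
For the discovery step in the quoted usage example, a quick pre-check from
the shell could look like the sketch below. It only lists accel nodes and
the driver bound to each one through sysfs; it does not issue
``DRM_IOCTL_QDA_QUERY``. The function name ``list_accel_nodes`` and the
``SYSFS_ACCEL`` override are illustrative, and the driver name reported for
QDA is an assumption (it is not stated in the patch).

```shell
#!/bin/sh
# List accel device nodes and the kernel driver bound to each one.
# SYSFS_ACCEL is a hypothetical override for testing; the DRM accel class
# normally appears under /sys/class/accel.
SYSFS_ACCEL="${SYSFS_ACCEL:-/sys/class/accel}"

list_accel_nodes() {
    for dev in "$SYSFS_ACCEL"/accel*; do
        # If the glob did not match, there are no accel devices.
        [ -e "$dev" ] || continue
        name=$(basename "$dev")
        # The parent device's "driver" symlink names the bound driver.
        if [ -L "$dev/device/driver" ]; then
            drv=$(basename "$(readlink "$dev/device/driver")")
        else
            drv="unknown"
        fi
        echo "/dev/accel/$name $drv"
    done
}
```

Calling ``list_accel_nodes`` prints one line per node with the device path
and the driver name, which helps pick the node to open before querying the
DSP domain.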

Thanks!

> Regards,
> 
> Tomeu
> 
>>> +
>>> +Internal Implementation
>>> +=======================
>>> +
>>> +Memory Management
>>> +-----------------
>>> +The driver's memory manager creates virtual "IOMMU devices" that map to
>>> +hardware context banks. This allows the driver to manage multiple isolated
>>> +address spaces. The implementation uses a DMA-coherent backend to ensure data consistency
>>> +between the CPU and DSP without manual cache maintenance in most cases.
>>
>> GEM usage?
>>
>>> +
>>> +Debugging
>>> +=========
>>> +The driver includes extensive dynamic debug support. Enable it via the
>>> +kernel's dynamic debug control:
>>> +
>>> +.. code-block:: bash
>>> +
>>> +    echo "file drivers/accel/qda/* +p" > /sys/kernel/debug/dynamic_debug/control
>>> +
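
The command above fails without a useful message when debugfs is not mounted
or the kernel lacks dynamic debug. A guarded variant, as a minimal sketch:
the helper name ``qda_dyndbg`` and the ``CONTROL`` override are hypothetical,
and the match string is the one from the quoted patch.

```shell
#!/bin/sh
# Toggle pr_debug() output for the QDA driver sources.
# Assumes debugfs is mounted at /sys/kernel/debug; CONTROL may be overridden.
CONTROL="${CONTROL:-/sys/kernel/debug/dynamic_debug/control}"

qda_dyndbg() {
    # $1 is the flag change: "+p" enables messages, "-p" disables them.
    if [ ! -w "$CONTROL" ]; then
        echo "dynamic debug control not writable: $CONTROL" >&2
        return 1
    fi
    # The query is quoted so the shell does not expand the "*" itself.
    echo "file drivers/accel/qda/* $1" > "$CONTROL"
}
```

Run ``qda_dyndbg +p`` as root before reproducing an issue and
``qda_dyndbg -p`` afterwards to turn the messages off again.
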
>>> +Testing
>>> +=======
>>> +The QDA driver can be exercised using the ``fastrpc_test`` utility from the
>>> +FastRPC userspace library. Run the test application:
>>
>> pointer
>>
>>> +
>>> +.. code-block:: bash
>>> +
>>> +    fastrpc_test -d 3 -U 1 -t linux -a v68
>>> +
>>> +**Options**
>>> +
>>> +``-d domain``
>>> +    Select the DSP domain to run on:
>>> +
>>> +    * ``0`` — ADSP
>>> +    * ``1`` — MDSP
>>> +    * ``2`` — SDSP
>>> +    * ``3`` — CDSP *(default on targets with CDSP)*
>>> +
>>> +``-U unsigned_PD``
>>> +    Select signed or unsigned protection domain:
>>> +
>>> +    * ``0`` — signed PD
>>> +    * ``1`` — unsigned PD *(default)*
>>> +
>>> +``-t target``
>>> +    Target platform: ``android`` or ``linux`` *(default: linux)*
>>> +
>>> +``-a arch_version``
>>> +    DSP architecture version, e.g. ``v68``, ``v75`` *(default: v68)*
>>>
>>> --
>>> 2.34.1
>>>
>>>
>>
>> --
>> With best wishes
>> Dmitry
