On Thu, Jun 04, 2026 at 10:39:14AM +0530, Ekansh Gupta wrote:
> On 20-05-2026 19:26, Dmitry Baryshkov wrote:
> > On Tue, May 19, 2026 at 11:46:02AM +0530, Ekansh Gupta via B4 Relay wrote:
> >> From: Ekansh Gupta <[email protected]>
> >>
> >> Implement the FastRPC remote procedure call path, allowing user-space
> >> to invoke methods on the DSP via DRM_IOCTL_QDA_REMOTE_INVOKE.
> >>
> >> qda_fastrpc.c / qda_fastrpc.h
> >> Implements the FastRPC protocol layer: argument marshalling
> >> (qda_fastrpc_invoke_pack), response unmarshalling
> >> (qda_fastrpc_invoke_unpack), and invocation context lifecycle
> >> management. Each invocation allocates a fastrpc_invoke_context
> >> which tracks buffer descriptors, GEM objects, and the completion
> >> used to synchronise with the DSP response.
> >>
> >> Buffer arguments are handled in three ways:
> >> - DMA-BUF fd: imported via PRIME, IOMMU-mapped dma_addr used
> >> - Direct (inline): copied into the GEM-backed message buffer
> >> - DMA handle: fd forwarded to DSP, physical page descriptor computed
> >
> > No. This needs to go away. The QDA should support only one way to pass
> > data - via the GEM buffers. Everything else should be handled by the
> > shim layer, etc.
> each FD passed here is a GEM buffer. The reason to pass fd is that there
> are some APIs on DSP side which takes fd as an argument and the user
> might use the same on their skel implementation. So in this case the
> remote call will take fd to DSP and the skel implementation will use the
> FD.>
Then handle it all on the userspace side. In the end, bad library API is
not a reason to complicate kernel API and kernel driver.
> >> +#define FASTRPC_SCALARS(method, in, out) \
> >> + FASTRPC_BUILD_SCALARS(0, method, in, out, 0, 0)
> >> +
> >> +/**
> >> + * struct fastrpc_buf_overlap - Buffer overlap tracking structure
> >> + *
> >> + * Tracks overlapping buffer regions to optimise memory mapping and avoid
> >> + * redundant mappings of the same physical memory.
> >
> > WHat for? Even if this is a valid optimization, implement it as a
> > subsequent patch. The first goal should be very simple - get GEM buffers
> > from the app, pass them to the DSP, read the results.
> yes, this implementation is mimicking the existing fastrpc design where
> non-FD buffers are also supported. I am currently evaluating the
> maintainance of such buffers from userspace side and trying to
> understand the impacts of the same. I am planning to bring it as a
> future enhancement if there is no regression.>
Other way around. Drop it for now and bring it back if it has any
positive impact.
> >> + */
> >> +struct fastrpc_buf_overlap {
> >
> > Stop clashing the names with the existing fastrpc driver.
> ack.>
> >> + /** @start: Start address of the buffer in user virtual address space */
> >> + u64 start;
> >> + /** @end: End address of the buffer in user virtual address space */
> >> + u64 end;
> >> + /** @raix: Remote argument index associated with this overlap */
> >> + int raix;
> >> + /** @mstart: Start address of the mapped region */
> >> + u64 mstart;
> >> + /** @mend: End address of the mapped region */
> >> + u64 mend;
> >> + /** @offset: Offset within the mapped region */
> >> + u64 offset;
> >> +};
> >> +
> >> +/**
> >> + * struct fastrpc_remote_dmahandle - Remote DMA handle descriptor
> >> + */
> >> +struct fastrpc_remote_dmahandle {
> >> + /** @fd: DMA-BUF file descriptor */
> >> + s32 fd;
> >> + /** @offset: Byte offset within the DMA-BUF */
> >> + u32 offset;
> >> + /** @len: Length of the region in bytes */
> >> + u32 len;
> >> +};
> >> +
> >> +/**
> >> + * struct fastrpc_remote_buf - Remote buffer descriptor
> >> + */
> >> +struct fastrpc_remote_buf {
> >> + /** @pv: Buffer pointer (user virtual address) */
> >> + u64 pv;
> >> + /** @len: Length of the buffer in bytes */
> >> + u64 len;
> >> +};
> >> +
> >> +/**
> >> + * union fastrpc_remote_arg - Remote argument (buffer or DMA handle)
> >> + */
> >> +union fastrpc_remote_arg {
> >> + /** @buf: Inline buffer descriptor */
> >> + struct fastrpc_remote_buf buf;
> >> + /** @dma: DMA-BUF handle descriptor */
> >> + struct fastrpc_remote_dmahandle dma;
> >> +};
> >> +
> >> +/**
> >> + * struct fastrpc_phy_page - Physical page descriptor
> >> + */
> >> +struct fastrpc_phy_page {
> >> + /** @addr: Physical (IOMMU) address of the page */
> >> + u64 addr;
> >> + /** @size: Size of the contiguous region in bytes */
> >> + u64 size;
> >> +};
> >> +
> >> +/**
> >> + * struct fastrpc_invoke_buf - Invoke buffer descriptor
> >> + */
> >> +struct fastrpc_invoke_buf {
> >> + /** @num: Number of contiguous physical regions */
> >> + u32 num;
> >> + /** @pgidx: Index into the physical page array */
> >> + u32 pgidx;
> >> +};
> >> +
> >> +/**
> >> + * struct fastrpc_msg - FastRPC wire message for remote invocations
> >> + *
> >> + * Sent to the remote processor via RPMsg. This is the exact layout
> >> + * the DSP expects; do not reorder or add fields without DSP firmware
> >> + * coordination.
> >> + */
> >> +struct fastrpc_msg {
> >> + /** @remote_session_id: Session identifier on the remote processor */
> >> + int remote_session_id;
> >> + /** @tid: Thread ID of the invoking thread */
> >> + int tid;
> >> + /** @ctx: Context identifier for matching request/response */
> >> + u64 ctx;
> >> + /** @handle: Handle of the remote method to invoke */
> >> + u32 handle;
> >> + /** @sc: Scalars value encoding in/out buffer counts */
> >> + u32 sc;
> >> + /** @addr: Physical address of the message payload buffer */
> >> + u64 addr;
> >> + /** @size: Size of the message payload in bytes */
> >> + u64 size;
> >> +};
> >> +
> >> +/**
> >> + * struct qda_msg - FastRPC message with kernel-internal bookkeeping
> >> + *
> >> + * The wire-format portion is kept in the embedded @fastrpc member (must
> >> + * be first) so that &qda_msg->fastrpc can be passed directly to
> >> + * rpmsg_send() without a copy.
> >> + */
> >> +struct qda_msg {
> >> + /**
> >> + * @fastrpc: Wire-format message sent to the DSP via RPMsg.
> >> + * Must be the first member.
> >> + */
> >> + struct fastrpc_msg fastrpc;
> >> + /** @buf: Kernel virtual address of the payload buffer */
> >> + void *buf;
> >> + /** @phys: Physical/DMA address of the payload buffer */
> >> + u64 phys;
> >> + /** @ret: Return value from the remote processor */
> >> + int ret;
> >> + /** @fastrpc_ctx: Back-pointer to the owning invocation context */
> >> + struct fastrpc_invoke_context *fastrpc_ctx;
> >> + /** @file_priv: DRM file private data for GEM object lookup */
> >> + struct drm_file *file_priv;
> >> +};
> >> +
> >> +/**
> >> + * struct fastrpc_invoke_context - Remote procedure call invocation
> >> context
> >> + *
> >> + * Maintains all state for a single remote procedure call, including
> >> buffer
> >> + * management, synchronisation, and result handling.
> >> + */
> >> +struct fastrpc_invoke_context {
> >> + /** @node: List node for linking contexts in a queue */
> >> + struct list_head node;
> >> + /** @ctxid: Unique context identifier (XArray key shifted left by 4) */
> >> + u64 ctxid;
> >> + /** @inbufs: Number of input buffers */
> >> + int inbufs;
> >> + /** @outbufs: Number of output buffers */
> >> + int outbufs;
> >> + /** @handles: Number of DMA-BUF handle arguments */
> >> + int handles;
> >> + /** @nscalars: Total number of scalar arguments */
> >> + int nscalars;
> >> + /** @nbufs: Total number of buffer arguments (inbufs + outbufs) */
> >> + int nbufs;
> >
> > If it is inbufs + outbufs, why do you need it here?
> >
> >> + /** @pid: Process ID of the calling process */
> >> + int pid;
> >> + /** @retval: Return value from the remote invocation */
> >> + int retval;
> >> + /** @metalen: Length of the FastRPC metadata header in bytes */
> >> + int metalen;
> >
> > size_t, also why do you need it?
> >
> >> + /** @remote_session_id: Session identifier on the remote processor */
> >> + int remote_session_id;
> >> + /** @pd: Protection domain identifier encoded into the context ID */
> >> + int pd;
> >> + /** @type: Invocation type (e.g. FASTRPC_RMID_INVOKE_DYNAMIC) */
> >> + int type;
> >> + /** @sc: Scalars value encoding in/out buffer counts */
> >> + u32 sc;
> >
> > How is this different from the counts above?
> sc carries the method id and handle counts. The reason to maintain count
> separately is to avoid calculating it again and again.>
Is it just a sum of several values or something more complicated?
> >> + /** @handle: Handle of the remote method being invoked */
> >> + u32 handle;
> >> + /** @crc: Pointer to CRC values for data integrity checking */
> >> + u32 *crc;
> >
> > Add it later. It's unused. Drop all unused fields.
> ack.>
> >> + /** @fdlist: Pointer to array of DMA-BUF file descriptors */
> >> + u64 *fdlist;
> >
> > Why do you need DMA-BUFs in the invocation context? They all should be
> > GEM buffers.
> the reason is that the users are dependent on FDs as they can import
> buffers allocated from anywhere and there are DSP APIs which takes fd as
> an argument, so they might end up using the same in there skel
> implementation.>
No, DSP API can't take FD, they don't quite cross the OS and IOMMU
boundary. It's the userspace library API. Which might be improved,
rewritten, implemented underneath, etc. For the kernel side please,
pass _only_ GEM handles + offsets.
> >> + /** @pkt_size: Total payload size in bytes */
> >> + u64 pkt_size;
> >> + /** @aligned_pkt_size: Page-aligned payload size for GEM allocation */
> >> + u64 aligned_pkt_size;
> >> + /** @list: Array of invoke buffer descriptors */
> >> + struct fastrpc_invoke_buf *list;
> >> + /** @pages: Array of physical page descriptors for all arguments */
> >> + struct fastrpc_phy_page *pages;
> >> + /** @input_pages: Array of physical page descriptors for input buffers
> >> */
> >> + struct fastrpc_phy_page *input_pages;
> >
> > I think you are trying to bring all the complexity from the old driver
> > with no added benefit. Please don't. Use the existing memory manager.
> > Let it handle all the gory details. If someting is not there, we should
> > consider extending GEM instead.
> I'm not changing the metadata format as the DSP might not understand the
> messages if we modify it.
Well, it's up to you to know if DSP will understand the message or not.
The probability ("might not") is not suitable here. Anyway, let's get
rid of the various data formats first, then maybe some of the items will
go away on their own.
> Also, the fd is still being used because of
> the client dependency on it. I'll check if there is any other logic that
> needs alteration here.>
If the client keeps on passing FD to the library calls, you can map
FD to GEM handles in the library code.
> >> +
> >> +static int fastrpc_context_get_id(struct fastrpc_invoke_context *ctx,
> >> struct qda_dev *qdev)
> >> +{
> >> + int ret;
> >> + u32 id;
> >> +
> >> + if (!qdev)
> >> + return -EINVAL;
> >> +
> >> + ret = xa_alloc(&qdev->ctx_xa, &id, ctx, xa_limit_32b, GFP_KERNEL);
> >> + if (ret)
> >> + return ret;
> >> +
> >> + ctx->ctxid = id << 4;
> >
> > Why is it being shifted?
> this is to accomodate PD type>
Not really an answer.
> >> + return 0;
> >> +}
> >> +
> >
>
--
With best wishes
Dmitry