Yeah, it depends on what you want to do with the data. If the ARM is further processing the data, then DMA usually makes sense, because the ARM can access its own memory more quickly and use the L1/L2 cache for data in PS memory. It also avoids the ARM spending clock cycles reading data from the PL and copying it into PS memory (which is likely what will happen in some processing, say if you were executing an FFT in software or generating an Ethernet packet in software to send the data to a user display).
One copy isn’t too bad, but it can get ridiculous if you make more than one copy. But there’s also how the copy is done. What you almost never want is a memory copy done via a software for loop or similar, because if you aren’t extremely careful coding that up in software it won’t even use the ARM DMA blocks buried in the processor to do it. So if your software application ends up reading from the PL memory and writing to another block of memory in the PS (e.g. a software-defined array variable), that’s probably a sign that DMA is the right thing to do, so the ARM isn’t spending processor cycles just copying data.
But whenever you do something with DMA, your complexity just shot through the roof, even if it can technically net performance gains. And there are certainly cases where there is no appreciable performance gain and the complexity is not warranted, e.g. if you aren’t heavily using the ARM processor for processing and just desperately want to read some I/Q data into a file on disk. You might not care that the processor is spinning reading data in that case. Especially since using DMA effectively in Linux for “direct storage” to an NVMe drive or SD card is really only something the industry is just starting to do on x86 machines (e.g. Direct Storage with GPUs), so on ARM you are probably going to be forced to let software handle that anyway rather than writing very complicated driver-level code to do it.
Gah, complicated software stuff galore in that discussion, because properly handling data movement in embedded (software) systems for highest performance is, well, complicated. And I probably shouldn’t talk much about it: I’m an FPGA engineer, not a kernel-level embedded software engineer.
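To make the "how the copy is done" point concrete, here is a minimal sketch (not from the thread) contrasting a per-word software loop with a single bulk copy out of a memory-mapped window. Everything here is an assumption for illustration: the helper names (`make_fake_pl_file`, `map_window`, `copy_word_by_word`, `copy_bulk`) are hypothetical, and an ordinary temp file stands in for the PL region so the code can run anywhere. Neither path uses a DMA engine; the only difference is who runs the loop.

```python
import mmap
import os
import struct
import tempfile

WORD = 4  # bytes per 32-bit word


def make_fake_pl_file(n_words):
    """Create a temp file of counting 32-bit words; a stand-in for a PL memory window."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(struct.pack(f"<{n_words}I", *range(n_words)))
        return f.name


def map_window(path, length, offset=0):
    """Map `length` bytes of `path` read-only, starting at a page-aligned `offset`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return mmap.mmap(fd, length, access=mmap.ACCESS_READ, offset=offset)
    finally:
        os.close(fd)  # the mapping keeps its own reference to the file


def copy_word_by_word(window, n_words):
    """Copy with an explicit software loop: one 32-bit read and one write per iteration."""
    out = bytearray(n_words * WORD)
    for i in range(n_words):
        (value,) = struct.unpack_from("<I", window, i * WORD)
        struct.pack_into("<I", out, i * WORD, value)
    return bytes(out)


def copy_bulk(window, n_words):
    """Copy the same region with one slice, leaving the inner loop to the runtime."""
    return bytes(window[: n_words * WORD])
```

On a real board, `path` would presumably be something like /dev/mem and `offset` the physical base address of the PL region; both are assumptions here, and device memory mapped that way may not tolerate every access pattern a bulk copy generates, so treat this as a sketch of the idea rather than a recipe.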
Matthew Schiller
ngVLA Digital Backend Lead
NRAO
[email protected]
315-316-2032

From: 'Ross Martin' via [email protected] <[email protected]>
Sent: Thursday, October 5, 2023 10:46 AM
To: casper <[email protected]>
Subject: Re: [casper] PL data to PS DDR4 (AXI) {External}

DMA isn't always the best answer. It's sometimes best to just leave the data in the PL and have the processor access it directly. If the processor reads the data directly, it's just accessed once, and only the data you need is accessed. If you transfer via DMA, it's read once by the DMA from the PL, written to PS memory once by the DMA, and then read again by the processor to do its processing. Also, the DMA must potentially transfer more data than the current processing actually needs, since it may need to account for contingencies. So although DMA access *might* be faster access, it's definitely accessing the data more times. It won't always be worth it. DMA also adds an additional layer of software complexity.

As an example of the non-DMA solution, the demo I released for the RFSoC4x2 pulls the data directly from the PL into the ARM without doing any DMA.

Regards,
Ross

On Thu, Oct 5, 2023, 3:06 PM Jack Hickish <[email protected]> wrote:

This seems like a fun "discuss at the workshop" topic! I have a couple of applications where I think this functionality would be useful, so I'd definitely be interested in helping out.

From a toolflow side I think getting the automated instantiation of the DMA IP should be relatively straightforward. Handling what the CPU does to interact with the core, and/or how you might interact with the core remotely over a network, I'm less sure about.
Cheers,
Jack

On Thu, 5 Oct 2023 at 12:08, Matthew Schiller <[email protected]> wrote:

The right way to do what you describe is with the AXI DMA block, but as you point out that has a software interface to configure the transfer. The main data would flow over an AXI4 “full” interface that supports burst transactions (the Xilinx-provided DMA block already does that), and the configuration of the DMA block comes from software over AXI4-Lite.

There are two approaches (which should be supported by either using the correct DMA block or the correct settings on the DMA block).

A standard DMA block can be used if fixed addresses in memory can be allocated. This would mean that the Linux kernel is told to only use ½ of the PS memory, for example. Software can still access the upper half, for example via /dev/mem reads, but the upper memory disappears from Linux for normal applications.

Alternatively, though more complicated, a “scatter-gather” DMA is implemented. A scatter-gather DMA uses a software driver/server that will “malloc” memory in a normal software way, and then provide pointers to that memory to the scatter-gather DMA. Because of the way virtual memory works, this is not as trivial as it sounds and requires several steps, as the FPGA needs the physical, not virtual, address and must respect the fact that memory is allocated in virtual memory in “pages” and not necessarily contiguously. sgDMA is better in many systems, though, because Linux can still access all the memory, so if you aren’t recording data, for example, more complicated software applications can run.

I don’t believe this has been done yet in CASPER, but it is possible since these are standard Xilinx-provided blocks. We just need to get the block instantiated properly in sysgen to accept an AXI streaming data stream from your DSP algorithm or the ADCs.
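As a rough illustration of the scatter-gather point above (that a buffer which looks contiguous to software is really a set of physical pages that need not be adjacent), here is a minimal sketch, not from the thread. It assumes 4 KiB pages, a buffer that starts on a page boundary, and a hypothetical `Descriptor` type holding only an address and a length. It is not the Xilinx AXI DMA descriptor layout, and the list of physical page addresses is simply given as input; obtaining those addresses is the driver-level work described above.

```python
from dataclasses import dataclass
from typing import List

PAGE_SIZE = 4096  # assumption: 4 KiB pages


@dataclass
class Descriptor:
    phys_addr: int  # physical start address of one contiguous run
    length: int     # number of bytes in that run


def build_descriptors(page_addrs: List[int], total_bytes: int) -> List[Descriptor]:
    """Turn a buffer's per-page physical addresses into a descriptor list.

    Pages that happen to be physically adjacent are merged into one
    descriptor; every discontinuity starts a new one.
    """
    descs: List[Descriptor] = []
    remaining = total_bytes
    for addr in page_addrs:
        if remaining <= 0:
            break
        chunk = min(PAGE_SIZE, remaining)  # the last page may be partial
        last = descs[-1] if descs else None
        if last is not None and last.phys_addr + last.length == addr:
            last.length += chunk  # physically contiguous with the previous run
        else:
            descs.append(Descriptor(addr, chunk))
        remaining -= chunk
    if remaining > 0:
        raise ValueError("not enough pages for the requested length")
    return descs
```

For example, `build_descriptors([0x10000, 0x11000, 0x40000, 0x20000], 3 * PAGE_SIZE + 100)` yields three descriptors: the first two pages merge into one 8192-byte run, and the remaining two pages each get their own entry.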
Then, on the ARM processor, we need appropriate software/drivers to allocate memory and configure the DMA. I think I heard a rumor that it was planned, but it hasn’t been tackled yet.

With AXI DMA, you can probably get to around 20 Gbit/s (in theory probably as high as 40, depending on what speed the DDR4 trains to) or better transfer performance to the PL. Not that the little ARM on these FPGAs can do much with that speed of data, but for recording a snippet of data or something like that, it can allow some fairly significant sample rates of I/Q data, for example (at 8-bit I/Q that’s >1 GSPS!). If instead you did the register approach you mentioned, I would expect rates around 100 Mbit/s to be possible, and to achieve that the processor in the ARM will be going nuts, because AXI4-Lite tends to require the processor to spin (DMA frees the processor for other stuff, while polling registers takes time to accomplish).

FWIW: ngVLA plans to create functionality like this in “pure” HDL, and given the current effort to use more VHDL/Verilog blocks in CASPER, ngVLA’s work may be useful in the future. I hope to make progress on ngVLA’s approach later this calendar year. But ngVLA is on Intel FPGAs, so a porting process would still be required to get that into CASPER.

From: [email protected] <[email protected]> On Behalf Of Ken Semanov
Sent: Thursday, October 5, 2023 4:11 AM
To: [email protected]
Subject: [casper] PL data to PS DDR4 (AXI) {External}

Is there an obvious way to migrate data from the PL into memory that is mapped into the address space of the PS? Ideally I would use axi_interconnect as shown https://casper-toolflow.readthedocs.io/en/latest/axi4lite_documentation.html

A possible approach is to instantiate axi_dma within the PL, and the PL acts as the master during transfers. But the axi_dma exposes an AXI4-Lite slave port to the PS so that the PS configures and starts the transfers.
The receiving raw device would be the memory controller of the PS DDR4. (Presumably the data is accessed later by software via the DMA engine.)

Another approach would be to expose a single register, and perform this slowly word-by-word (without streaming or bursting).

Is this plausible in CASPER, or are steep changes required?

--
You received this message because you are subscribed to the Google Groups "[email protected]" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/0424800a-035f-447f-92ed-07402b9d0239n%40lists.berkeley.edu.

