Hi Jack, Jonathon, Mitch, Francois, Dan, and the ever-supportive CASPER 
community,

Thank you all for the very thoughtful and helpful input. It’s all filed away 
for study, in particular for a comparative look at the alternative COTS 
solutions suggested.

To provide some bulleted responses to specific inputs:

- We need a 16.384 GS/s ADC conversion rate. The RFSoC is very attractive in 
  many ways, but so far doesn’t provide ADCs at anywhere near that rate. 
  Interleaving might be an option, and that’s interesting, but we’re also a 
  little allergic to interleaving artifacts.
- We’re not actually sure whether or not we need HBM, and we understand its 
  expense in both cost and power dissipation. We may need fast RAM storage for 
  various functions (FIR coefficient storage, transposes, packet buffering, 
  etc.), but the on-chip block RAM resources might be sufficient; that is a 
  subject of current study. The retail price comparison of the VU37P chip to 
  the VCU128 (which carries the same part) seemed more apples-to-apples.
- That said, 16.384 GS/s demands very high demux factors; we think a factor of 
  64 is almost certainly required. As shown in the SWARM paper 
  <https://www.worldscientific.com/doi/epdf/10.1142/S2251171716410063> 
  (section 4, equations 1 and 2, and fig. 5), the demux factor D is the primary 
  driver of PFB utilization, while the number of spectral points N has far less 
  impact, at least as far as DSP slices are concerned. (Memory is driven more 
  by N, though that calculation isn’t quite as closed-form.)**
- We do think of instruments, even real-time ones, in terms of pure COTS 
  high-performance computing wherever possible (Alveo, GPUs, CPUs, whatever), 
  with packetized inputs and outputs. However, we go back to FPGAs as the only 
  economical way (we think) to access the highest-performance SERDES for ADC 
  interfacing, and the precise timing of samples for VLBI. The current 
  instrument is a VLBI Digital Back End (DBE), not a correlator-beamformer, and 
  it seems natural to bundle channelization on the FPGA as well, especially 
  given that the DBE typically quantizes to 2 bits before spitting out its data 
  product (so it is very attractive to equalize across the wide band before 
  quantizing). And the utilization considerations in the prior bullet drive us 
  to big, expensive FPGAs.
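
A rough way to see why D dominates DSP usage while N mostly drives memory is the first-order count below. It is a hypothetical model, not equations 1 and 2 of the SWARM paper: it assumes a PFB whose FIR front end needs D x T real multipliers (T taps, taken here as 4), a pipelined FFT with about D/2 complex multipliers in each of its log2(N) stages (taken here as 4 DSP slices each, ignoring trivial twiddles and bit widths), and FIR delay-line plus coefficient storage of roughly 2 x N x T words. It also computes the fabric clock implied by demultiplexing a 16.384 GS/s stream. All function names and parameter values are illustrative assumptions.

```python
import math


def fabric_clock_hz(sample_rate_hz, demux):
    """Clock rate the FPGA fabric must run at when `demux` samples
    arrive in parallel on each clock cycle."""
    return sample_rate_hz / demux


def pfb_dsp_estimate(demux, n_fft, taps=4, dsp_per_cmult=4):
    """Hypothetical first-order DSP-slice count for a PFB that processes
    `demux` samples per clock: FIR multipliers plus pipelined-FFT twiddle
    multipliers. Ignores trivial twiddles and bit-width effects."""
    fir = demux * taps                      # one multiplier per tap per parallel lane
    stages = int(math.log2(n_fft))          # radix-2 FFT stages
    fft = (demux // 2) * stages * dsp_per_cmult
    return fir + fft


def pfb_memory_words(n_fft, taps=4):
    """Hypothetical first-order memory estimate: FIR delay lines plus
    coefficients, each roughly n_fft * taps words; independent of demux."""
    return 2 * n_fft * taps


if __name__ == "__main__":
    print(fabric_clock_hz(16.384e9, 64) / 1e6, "MHz")   # 256.0 MHz
    print("D=64,  N=2^15:", pfb_dsp_estimate(64, 2**15))   # 2176
    print("D=64,  N=2^16:", pfb_dsp_estimate(64, 2**16))   # 2304
    print("D=128, N=2^15:", pfb_dsp_estimate(128, 2**15))  # 4352
```

Under these assumptions, a demux factor of 64 implies a 256 MHz fabric clock; doubling N from 2^15 to 2^16 adds only about 6% more DSP slices (2176 to 2304), doubling D doubles them (2176 to 4352), and the memory estimate grows linearly with N regardless of D.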

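Why equalization matters before 2-bit quantization can be illustrated with a small stdlib-only sketch. It assumes a hypothetical 2-bit quantizer with thresholds at 0 and +/- k x RMS (k = 1.0 here, an illustrative choice), treats each spectral channel’s output as real Gaussian samples for simplicity, and compares one band-wide threshold against per-channel thresholds set from each channel’s own RMS. The function names are illustrative and not part of any CASPER library.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of real samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def quantize_2bit(x, threshold):
    """Map a real sample to a 2-bit code 0..3 using thresholds at
    -threshold, 0 and +threshold (codes 0 and 3 are the outer states)."""
    if x < -threshold:
        return 0
    if x < 0.0:
        return 1
    if x < threshold:
        return 2
    return 3


def quantize_channels(channels, k=1.0, per_channel=True):
    """Quantize each channel to 2 bits. With per_channel=True each channel is
    equalized (threshold = k * its own RMS); otherwise one band-wide RMS is used."""
    if per_channel:
        thresholds = [k * rms(ch) for ch in channels]
    else:
        band = [x for ch in channels for x in ch]
        thresholds = [k * rms(band)] * len(channels)
    return [[quantize_2bit(x, t) for x in ch] for ch, t in zip(channels, thresholds)]


def outer_fraction(codes):
    """Fraction of samples landing in the outer (high-magnitude) states."""
    return sum(1 for c in codes if c in (0, 3)) / len(codes)


if __name__ == "__main__":
    random.seed(1)
    # Two hypothetical channels whose power differs by 20 dB (amplitude ratio 10).
    strong = [random.gauss(0.0, 10.0) for _ in range(20000)]
    weak = [random.gauss(0.0, 1.0) for _ in range(20000)]
    for per_channel in (False, True):
        q = quantize_channels([strong, weak], per_channel=per_channel)
        print("per_channel =", per_channel, [round(outer_fraction(c), 3) for c in q])
```

For Gaussian noise, a threshold at 1 sigma puts about 32% of samples in the outer states. With a single band-wide threshold (about 7.1 times the weak channel’s RMS in this example), the weak channel essentially never reaches them and degenerates to a sign-only (1-bit) quantizer; with per-channel equalization both channels land near 32%.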

I’m on an airplane about to take off, heading to this (CASPER-driven) EHT 
press event:
<https://eventhorizontelescope.org/blog/event-horizon-telescope-collaboration-announce-groundbreaking-milky-way-results-may-12th>
I’ll sign off now.

Best wishes, thanks again.

Jonathan and colleagues


** There is a useful memo of which section 4 of the SWARM paper is just a 
summary; it turns out not to be on the CASPER memo GitHub 
<https://github.com/casper-astro/publications/tree/master/Memos>. I’ll look 
into posting it. Can I do that myself, or should I send it to an admin?



> On May 11, 2022, at 6:02 AM, Francois Kapp <[email protected]> wrote:
> 
> Hi Jonathan et al,
> 
> To close an open question: At SARAO we do not intend to pursue a SKARAB2, for 
> the same cost inconsistency you mentioned.  Instead, we are CASPER-ising 
> Xilinx Alveo boards, which are intended for production, albeit in a data 
> centre environment.  Our intention is to further develop hybrid FPGA/GPU 
> correlators around these boards.  At the moment, as one would expect, the GPU 
> development leads the FPGA development.
> 
> Others, notably CSIRO for the SKA Low design, are also proposing Alveo in a 
> very CASPER-like architecture: 
> https://www.spiedigitallibrary.org/journals/Journal-of-Astronomical-Telescopes-Instruments-and-Systems/volume-8/issue-01/011018/Square-Kilometre-Array-Low-Atomic-commercial-off-the-shelf-correlator/10.1117/1.JATIS.8.1.011018.full
> Perhaps our CSIRO colleagues can chime in there, but packing 20 Alveo 
> U55Cs into a server looks viable, and it certainly reduces the host-server 
> overhead per FPGA.
> 
> +F.  
> 
> On Tue, 10 May 2022 at 00:35, Mitchell Burnett <[email protected]> wrote:
> Hi Jonathan,
> 
> To chime in under the “other” category…
> 
> We have recently added six RFSoC platforms to CASPER. (Three Xilinx eval 
> boards: ZCU111, ZCU216, ZCU208. The Xilinx education platform PYNQ RFSoC 2x2. 
> Two boards from HiTech Global: HTG-ZRF16-29DR, and the 49DR version.)
> 
> For ALPACA, we have used a couple of ZCU111s, with the current plan to field 
> 12 ZCU216s in the final instrument. These are operating, and will operate, in 
> a server room, so again, nothing extreme. Beyond our ALPACA project, I am 
> aware of several folks who have had success bringing up RFSoCs on Xilinx eval 
> boards with CASPER tools (and others who are not immediately using CASPER 
> tools, but are still using eval boards). So far, I have not experienced or 
> heard of performance issues or failures with the ZCU111/216/208. But I am 
> sure they exist, and perhaps this thread brings them out, because with other 
> UltraScale+ parts I have heard anecdotes similar to yours, where a 
> significant quantity of eval boards was purchased for a wideband system and 
> saw a ~20% failure rate.
> 
> Bringing up some boards with folks has been bumpy, but nothing attributable 
> to the board itself. Those cases have mostly involved working through the 
> documentation, plus some strange outliers (like having to replace the SD card 
> that came with the board).
> 
> At least until now, I have had more issues with non-Xilinx RFSoC boards. But 
> that speaks more generally to the relationship with a vendor and what they 
> support. Certainly, as pointed out, eval boards come with no guarantee, and in 
> our case we have simply decided to knowingly assume that risk.
> 
> In the end, though, I just parrot much of what Jack said: I would try to avoid 
> eval boards, but using them is viable, and in scenarios like mine, if the 
> project can take on and justify the risk, then OK. I believe SOMs are very 
> promising and worth looking at first, with more vendors providing options. 
> When possible, choose vendors with whom you have had a good dialog about the 
> requirements and who set clear support expectations. Negotiating prices will 
> be tough (certainly given how supply is right now).
> 
> Don’t think I really added much to the conversation, just another data point 
> for you.
> 
> Best,
> 
> Mitch
> 
>> On May 9, 2022, at 12:56 PM, Jonathon Kocz <[email protected]> wrote:
>> 
>> Hi Jonathan,
>> 
>> A couple of follow up questions (sorry for getting into nitty gritty you 
>> wanted to avoid!):
>> 
>> 1) Are you actually using the HBM? You can get much cheaper FPGAs with 
>> similar DSP/BRAM resources without HBM (if you are using HBM, are you doing 
>> this via CASPER?!)
>> 2) I've been using the VCU128 a bit - I'm working on a couple of projects 
>> with your ADC board now. I've not found any issues (yet!) with loading of 
>> code on power up, or with the 1Gb coming up - though I note that the 1Gb 
>> CASPER core for the VCU128 doesn't work properly (an init issue, it's on my 
>> list to fix that in the next couple of weeks). - Which set of libraries are 
>> you using, or are you working outside CASPER?
>> 3) On the CASPER conference/busy week front: With the 100Gb, is that also a 
>> CASPER core? We currently have at least two in the CASPER libraries, and 
>> part of the busy week I want to try to either integrate or find a use case 
>> where one might use one or the other to reduce confusion for users - if you 
>> have a third (and it's open source) it would be good to merge that in as 
>> well! 
>> 
>> In terms of eval boards in general:
>> 
>> I've fielded a few VCU128s and they're fine, but we're not running them in 
>> an extreme environment - just in labs / server rooms. I've previously had 
>> issues with other eval boards when trying to use them to maximum capacity - 
>> as Dan said, they're not really designed for it.
>> 
>> In terms of other boards - which should be merged into the main branch after 
>> the busy week:
>> 
>> We've put the HiTech Global HTG940 and HTG9200 boards into the CASPER 
>> library, in case either of those is useful.
>> 
>> I would definitely recommend looking at Alpha Data as well - I've had good 
>> experiences with them so far, and I've recently put the ADA-SDEV-3 into 
>> CASPER [it has an FMC+ connector, but the FPGA might be too small for all 4 
>> ADCs depending on what you want to do with the input/channel resolution 
>> required].
>> 
>> Cheers,
>> Jonathon
>> 
>> On Mon, 9 May 2022 at 10:41, Jack Hickish <[email protected]> wrote:
>> Hi Jonathan,
>> 
>> On Mon, 9 May 2022 at 17:26, 'Jonathan Weintroub' via 
>> [email protected] <[email protected]> wrote:
>> Hi CASPERites,
>> 
>> At SAO we’ve been developing high-performance instruments based on the 
>> Xilinx VCU128 evaluation board 
>> <https://www.xilinx.com/products/boards-and-kits/vcu128.html>
>> and the Adsantec ASNT7123A 
>> <https://www.adsantec.com/product/asnt7122-kma-2-2/>
>> 16 GS/s 4-bit direct flash ADC. The VCU128 is currently priced at 
>> $10,794 each (it’s gone up a bit, like everything, but this is still good or 
>> even incredible value) and the lead time per the prior page is 2 weeks, 
>> essentially ex-stock. Some time ago I had discussions with Xilinx and 
>> various distributors, and there was no obstruction to buying multiple-piece 
>> quantities of the eval board. Though it’s not the last word on acquisition, 
>> I’ll note that the XCVU37P-L2FSVH2892E, a very large and high-performance 
>> DSP-oriented UltraScale+ FPGA with 8GB of HBM and 9,024 DSP slices, is 
>> listed at $90,786 with no lead time quoted (though I hear 12 months is not 
>> uncommon for FPGAs these days) at Digikey 
>> <https://www.digikey.com/en/products/detail/xilinx-inc/XCVU37P-L2FSVH2892E/10445689>
>> and other distributors.
>> 
>> While our development with VCU128 has been very successful in terms of 
>> validating ADC performance, developing 100 Gbps Ethernet and various 
>> application firmware codes, we are hitting a variety of reliability and 
>> related issues with the VCU128. There is a bit of a laundry list of these; 
>> the following are examples. The 1 Gbps Ethernet control port doesn’t always 
>> come up reliably, and loading of FPGA codes can be intermittent on power-up. 
>> Looking more at electromechanical issues, we have concerns that the FMC+ 
>> high-speed connector (which we use to connect a “mezzanine” ADC board to the 
>> FPGA) is not mechanically robust and has no built-in positive locking or 
>> similar mechanism. Also, the eval board has quite a lot of circuitry we 
>> don’t need, notably the PCIe bus and connector. That is not in and of itself 
>> a showstopper, but it does interact with concerns we have about the thermal 
>> design of the PCB and its impact on the overall system.
>> 
>> With all of the above as background, and noting that at one time the VCU128 
>> (and its predecessor the VCU118) was suggested as a possibly viable 
>> CASPER-supported solution (I vaguely recall there was at one time a working 
>> group focused on eval boards generally), we are wondering what the broader 
>> experience of the collaboration has been. Has any group been successful at 
>> fielding a viable production instrument based on the VCU128 or another eval 
>> board? Do the experiences reported in the second paragraph sound familiar to 
>> anyone?
>> 
>> We are going down the path of custom hardware, but the addition of supply 
>> chain delays and the impressive pricing noted above cause us to pause and 
>> poll for broader CASPER experiences. It would also be relevant to hear if 
>> there are any hardware developments (SKARAB2?) available to us and 
>> potentially helpful?
>> 
>> The VCU128 is certainly well-priced, but I would note that I have been using 
>> the same vu37p FPGA in PCIe form-factor via the Alpha Data ADM-PCIe-9H7 
>> board, which was (when I ordered 18 months ago) cheaper than the VCU128. I'm 
>> also quite enthusiastic about system-on-modules, which can deliver very good 
>> low-quantity prices without too much NRE to implement custom packaging. I've 
>> been playing with the iWave ZU11 SoM board with success so far.
>> 
>> Without knowing your requirements regarding NRE, lead-time, etc, my first 
>> suggestion is to get in touch with a Xilinx-partnered vendor and start a 
>> conversation with them. Maybe someone has what you need already, or maybe 
>> they would consider it a small job to adapt one of their existing boards to 
>> your needs. But what is almost certainly true is that the costs you will be 
>> quoted will be nothing like the digikey chip prices.
>> 
>> My personal opinion is that dev boards should be avoided in deployments 
>> (particularly in harsh environments) since Xilinx will no doubt turn their 
>> back on you in the event you have issues. With a commercial vendor, you can 
>> at least lay out expectations for performance and tech support going into 
>> the project. I also find it hard to believe that it makes economic sense to 
>> design anything completely in-house and pay retail FPGA prices, unless you 
>> are big enough to have your own deal with Xilinx.
>> 
>> Just my subjective and biased $0.02 though :)
>> 
>> Jack
>>  
>> 
>> I am working with various colleagues on this, notably Ranjani Srinivasan and 
>> Rick Raffanti, and they may weigh in here to fill in details, or correct 
>> anything I got wrong.   But I hope the intent here is clear, not to get into 
>> the nitty gritty right away, but rather to discuss the bigger picture of 
>> viability of use of eval boards.
>> 
>> Thanks for reading and thinking about this, and best wishes,
>> 
>> Jonathan Weintroub
>> 
>> 

-- 
You received this message because you are subscribed to the Google Groups 
"[email protected]" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/677A250B-6A6C-4221-B6BC-4F1EC8EAB6AF%40cfa.harvard.edu.
