@joyliu37 Thanks for looking into this. There are currently two VTA design
sources. The first is the initial design, which was used in the TVM and VTA
papers and was generated with HLS. This is the design you can test and deploy
on the Pynq/Ultra96 boards to run workloads like ResNet-18. We've also run
tuning on this design to get close to compute-bound performance on the device
(as shown by the roofline plots in the TVM paper). It is not 100% compute
bound because the GEMM and ALU share the same task-level pipeline stage.
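To make the "close to compute bound" point concrete, here is a minimal sketch of the standard roofline bound plus a toy model of a shared pipeline stage. It assumes that GEMM and ALU work placed in the same stage runs back to back, while separate stages could overlap. All function names and numbers are hypothetical and chosen for illustration; they are not measured VTA parameters.

```python
def roofline_gops(peak_gops, bandwidth_gbs, arithmetic_intensity):
    """Attainable throughput under the roofline model.

    arithmetic_intensity is in ops per byte, so intensity * bandwidth (GB/s)
    gives the memory-bound ceiling in GOPS.
    """
    return min(peak_gops, arithmetic_intensity * bandwidth_gbs)


def stage_time(t_gemm, t_alu, shared_stage):
    """Per-task time: serialized if GEMM and ALU share a stage, overlapped otherwise."""
    return t_gemm + t_alu if shared_stage else max(t_gemm, t_alu)


def gemm_utilization(t_gemm, t_alu, shared_stage):
    """Fraction of the stage time during which the GEMM core is busy."""
    return t_gemm / stage_time(t_gemm, t_alu, shared_stage)


if __name__ == "__main__":
    # Hypothetical device: 100 GOPS peak, 4 GB/s DRAM bandwidth.
    print(roofline_gops(100.0, 4.0, 50.0))  # high intensity: capped by compute
    print(roofline_gops(100.0, 4.0, 10.0))  # low intensity: capped by bandwidth
    # Hypothetical task: 90 cycles of GEMM work, 10 cycles of ALU work.
    print(gemm_utilization(90, 10, shared_stage=True))   # ALU time eats into GEMM time
    print(gemm_utilization(90, 10, shared_stage=False))  # ALU fully hidden
```

Under these assumptions, even a workload well above the roofline ridge point tops out below peak, because every cycle the shared stage spends on ALU instructions is a cycle the GEMM core sits idle.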

The second design, which is written in Chisel and supports cycle-accurate
simulation, is a newer addition and is still under development and refinement.

Finally, regarding TSIM not modeling DRAM bandwidth: TSIM will still be
bandwidth limited by the port width. It may not include a DRAM latency model,
but it should throttle DRAM accesses according to the width of the memory
interface.
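A rough sketch of what "throttled by the interface width, without a latency model" means in terms of cycle counts. The function names, the 64-bit port width, and the 100 MHz clock below are hypothetical values for illustration, not TSIM's actual configuration; `latency_cycles` defaults to 0 to mirror a simulator that does not model DRAM latency.

```python
def dram_access_cycles(num_bytes, port_width_bits, latency_cycles=0):
    """Cycles to move num_bytes over a port carrying port_width_bits per cycle.

    latency_cycles is a hypothetical fixed per-request latency; the default of 0
    models a simulator that only throttles by interface width.
    """
    if port_width_bits <= 0 or port_width_bits % 8 != 0:
        raise ValueError("port width must be a positive multiple of 8 bits")
    bytes_per_cycle = port_width_bits // 8
    # Integer ceiling division: a partial beat still costs a full cycle.
    beats = -(-num_bytes // bytes_per_cycle)
    return latency_cycles + beats


def peak_bandwidth_gbs(port_width_bits, clock_mhz):
    """Upper bound on bandwidth (GB/s) implied by port width and clock alone."""
    return port_width_bits / 8 * clock_mhz * 1e6 / 1e9


if __name__ == "__main__":
    # Hypothetical 64-bit port at 100 MHz.
    print(dram_access_cycles(1024, 64))                      # width-limited only
    print(dram_access_cycles(1024, 64, latency_cycles=40))   # with an assumed latency
    print(peak_bandwidth_gbs(64, 100))
```

Even with `latency_cycles=0`, a wider tensor load still takes proportionally more cycles, which is why the simulation remains bandwidth limited by port width.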