If you want one or two specific configurations to work with, they would be:
- batch size = 12 (though `batch_matmul_schedule` didn't require a constant batch size, so this may not need to be constant)
- embedding size = 768
- sequence length = 4,096
- window size = 512
- dilation = 0 and 3 (many of the locality assumptions for caching will break once we work with non-zero dilation, which is why we need to study both cases: 0 because it is the most common, and 3 because it is representative of the cases where locality breaks)
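To make the dilation point concrete, here is a small sketch (plain Python, not TVM code, with a hypothetical helper name) of which key positions a query attends to under Longformer-style sliding-window attention. With dilation 0 the window is contiguous, so neighbouring queries reuse the same cache lines; with dilation 3 the attended keys are strided every 4 positions, spreading the memory footprint and breaking that reuse.

```python
def window_indices(i, seq_len, window, dilation):
    """Key indices attended by query position i.

    window   : total window size (number of keys per query)
    dilation : gap between attended keys; 0 means a contiguous window.
    """
    stride = dilation + 1
    half = window // 2
    idx = [i + k * stride for k in range(-half, half)]
    # clip to valid sequence positions
    return [j for j in idx if 0 <= j < seq_len]

# dilation = 0: a contiguous block of 512 keys around the query,
# so adjacent queries' loads overlap almost entirely
contiguous = window_indices(2048, 4096, 512, 0)

# dilation = 3: the same 512 keys are now spread over a 4x larger
# range of memory, so adjacent queries share far fewer cache lines
strided = window_indices(2048, 4096, 512, 3)
```

This is only meant to illustrate the access pattern the schedule has to cope with, not how the kernel itself should be written.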





---
[Visit Topic](https://discuss.tvm.ai/t/developing-a-faster-schedule-for-longformers-kernel/6367/4) to respond.
