Sorry for the complicated code. I didn't know how to compile multiple kernels into one `.so` file, so I ended up cramming three functions (one for the forward pass and two for the backward pass) into a single kernel, with flags to switch between them. Here are the constants for the forward pass:

```
b = 1             # batch size
n = 4096          # sequence length
h = 12            # number of heads (this dimension can be merged with the batch size if needed)
m = 768           # hidden dimension
w = 256           # window size on one side
w_upper = 256     # window size to the right of the word; should be `w` for the non-autoregressive case
padding = 0       # padding value; any constant works
transpose_t1 = 0  # `0` for one of the backward functions and `1` for the other; doesn't matter for the forward pass
t1d3 = 768        # last dimension of t1: `m` for the forward function and `2w+1` (number of diagonals) for the backward
t3d3 = 513        # last dimension of t3: `2w+1` for the forward pass
```
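To make the relationships between these constants explicit, here is a minimal pure-Python sketch that derives the dependent sizes and gives a naive reference for the forward sliding-window ("diagonaled") matmul, which could be used to check a faster schedule's output on tiny shapes. The names `LongformerDims` and `diagonaled_mm_reference` are hypothetical, not part of the posted code. Two things are assumptions rather than facts from the post: that diagonal `k` of row `i` pairs position `i` with position `i + k - w`, and that the diagonal count generalizes to `w + w_upper + 1` (which reduces to the `2w+1 = 513` above when `w_upper = w`).

```python
from dataclasses import dataclass
from typing import List

# 4-D nested list indexed as [batch][position][head][feature]
Tensor4 = List[List[List[List[float]]]]


@dataclass(frozen=True)
class LongformerDims:
    """Hypothetical container for the constants listed in the post."""
    b: int = 1            # batch size
    n: int = 4096         # sequence length
    h: int = 12           # number of heads
    m: int = 768          # hidden dimension
    w: int = 256          # window size on one side
    w_upper: int = 256    # window size to the right; equals w when non-autoregressive

    @property
    def num_diagonals(self) -> int:
        # Assumption: w diagonals on one side, w_upper on the other, plus the main one.
        # With w_upper == w this is 2w + 1 (513 for w = 256).
        return self.w + self.w_upper + 1

    def t1d3(self, backward: bool) -> int:
        # Per the post: m for the forward function, number of diagonals for the backward.
        return self.num_diagonals if backward else self.m

    def forward_shapes(self):
        # Forward pass: t1 ends in m, t3 ends in the number of diagonals (t3d3).
        t1 = (self.b, self.n, self.h, self.t1d3(backward=False))
        t3 = (self.b, self.n, self.h, self.num_diagonals)
        return t1, t3


def diagonaled_mm_reference(t1: Tensor4, t2: Tensor4, w: int, w_upper: int,
                            padding: float = 0.0) -> Tensor4:
    """Naive forward sliding-window matmul; only practical for tiny inputs.

    Assumed semantics: out[b][i][h][k] = dot(t1[b][i][h], t2[b][i + k - w][h]),
    and slots whose partner position falls outside [0, n) hold `padding`.
    """
    b, n, h, m = len(t1), len(t1[0]), len(t1[0][0]), len(t1[0][0][0])
    c = w + w_upper + 1  # number of diagonals
    out = [[[[padding] * c for _ in range(h)] for _ in range(n)] for _ in range(b)]
    for bi in range(b):
        for i in range(n):
            for hi in range(h):
                for k in range(c):
                    j = i + k - w  # partner position for diagonal k
                    if 0 <= j < n:
                        out[bi][i][hi][k] = sum(
                            t1[bi][i][hi][d] * t2[bi][j][hi][d] for d in range(m)
                        )
    return out
```

For example, with `n = 4`, one head and `w = w_upper = 1`, each output row holds three dot products, and the first row's left slot and the last row's right slot hold `padding` because their partner positions fall outside the sequence. Setting `w_upper = 0` gives the autoregressive case with `w + 1` diagonals per row.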
---
[Visit Topic](https://discuss.tvm.ai/t/developing-a-faster-schedule-for-longformers-kernel/6367/6)