[quote="LeiWang1999, post:1, topic:18685"]
improve both readability and performance
[/quote]
I tested CUDA code like the kernels below and did get different instruction
sequences and register usage counts. It is surprising that the backend compiler
does not optimize them to the same binary code :joy:.
```C++
// Grid-stride loop written with an explicit step.
__global__ void vecAdd(const float *A, const float *B, float *C, int n) {
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  int stride = blockDim.x * gridDim.x;
  for (int i = tid; i < n; i += stride) {
    C[i] = A[i] + B[i];
  }
}

// Same loop normalized to start at 0 with step 1.
// Note: the trip count is the same for every thread, so i can run past
// n - 1 in the last iteration unless n is a multiple of stride.
__global__ void vecAdd2(const float *A, const float *B, float *C, int n) {
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  int stride = blockDim.x * gridDim.x;
  for (int j = 0; j < (n + stride - 1) / stride; ++j) {
    int i = tid + j * stride;
    C[i] = A[i] + B[i];
  }
}
```
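One caveat when comparing the two kernels: as written they only touch the same elements when `n` is a multiple of `stride`, so the compiler cannot treat them as the same program in general. Below is a minimal host-side sketch (plain C++, no GPU needed) that lists the indices each loop form visits for one thread. The helper names are my own for illustration and are not part of CUDA or TVM, and the per-thread trip count `ceil((n - tid) / stride)` is just one way to normalize the stepped loop.

```cpp
#include <cassert>
#include <vector>

// Indices visited by the stepped form: for (i = tid; i < n; i += stride).
std::vector<int> stepped_indices(int tid, int stride, int n) {
  std::vector<int> out;
  for (int i = tid; i < n; i += stride) out.push_back(i);
  return out;
}

// Indices visited by a normalized loop (start 0, step 1) whose trip count
// is computed per thread, so the last iteration never reaches n.
std::vector<int> normalized_indices(int tid, int stride, int n) {
  std::vector<int> out;
  int trips = tid < n ? (n - tid + stride - 1) / stride : 0;
  for (int j = 0; j < trips; ++j) out.push_back(tid + j * stride);
  return out;
}

// Indices visited by the uniform trip count used in vecAdd2 (no guard).
std::vector<int> uniform_indices(int tid, int stride, int n) {
  std::vector<int> out;
  int trips = (n + stride - 1) / stride;
  for (int j = 0; j < trips; ++j) out.push_back(tid + j * stride);
  return out;
}

// Asserts that the stepped and per-thread normalized forms agree for
// every thread id in [0, stride).
void check_equivalence(int stride, int n) {
  for (int tid = 0; tid < stride; ++tid) {
    assert(stepped_indices(tid, stride, n) == normalized_indices(tid, stride, n));
  }
}
```

For example, with `stride = 4` and `n = 10`, thread 3 visits `{3, 7}` in the stepped form but `{3, 7, 11}` with the uniform trip count, so a fair comparison would need either a guard `if (i < n)` or the per-thread trip count.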
So it seems worthwhile to support a stepped loop node. Are there already any
(pre-)RFCs on this topic? cc @LeiWang1999 @tqchen