Welcome to the TVM community :)
Mali doesn't really have an equivalent to Nvidia's shared memory; it uses system RAM backed by an unconfigurable cache. "Local" is just OpenCL's term for what CUDA calls shared memory. This means that explicit cache reads/writes to shared/local memory aren't advised when optimising for Mali, because local memory there is no faster than global memory. As to explicitly generating vectorized instructions, that depends on the architecture in question. Post-Midgard GPUs should not require it (other than perhaps vectorizing loads/stores).
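
To make those rules of thumb concrete, here is a minimal sketch of how a scheduler might encode them as per-architecture hints. Everything in it is hypothetical and for illustration only: the names `ScheduleHints`, `mali_schedule_hints`, `MIDGARD` and `POST_MIDGARD` are not TVM or Arm identifiers, and treating Midgard as wanting explicit vectorization is my reading of the point above rather than a measured result.

```python
from dataclasses import dataclass

# Hypothetical architecture labels for this sketch (not TVM or Arm identifiers).
MIDGARD = "midgard"
POST_MIDGARD = "post-midgard"


@dataclass(frozen=True)
class ScheduleHints:
    # Stage tiles through shared/local memory with explicit cache reads/writes.
    stage_in_local_memory: bool
    # Explicitly generate vectorized arithmetic instructions.
    vectorize_compute: bool
    # Explicitly generate vectorized loads/stores.
    vectorize_memory_access: bool


def mali_schedule_hints(arch: str) -> ScheduleHints:
    """Return rule-of-thumb schedule hints for a Mali architecture family."""
    if arch == MIDGARD:
        # Assumption: Midgard is the case where explicit vectorization helps.
        return ScheduleHints(
            stage_in_local_memory=False,
            vectorize_compute=True,
            vectorize_memory_access=True,
        )
    if arch == POST_MIDGARD:
        # Explicit vector arithmetic should not be required here;
        # vectorized loads/stores are "perhaps" still worth trying.
        return ScheduleHints(
            stage_in_local_memory=False,
            vectorize_compute=False,
            vectorize_memory_access=True,
        )
    raise ValueError(f"unknown Mali architecture family: {arch!r}")


if __name__ == "__main__":
    for arch in (MIDGARD, POST_MIDGARD):
        print(arch, mali_schedule_hints(arch))
```

Note that `stage_in_local_memory` is `False` in both cases, since local memory on Mali is backed by the same system RAM. The vectorization flags are starting points to confirm by benchmarking on the actual device, not guarantees.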