Thanks!
I'm not familiar with the BitBLAS project, so please correct me if I'm wrong: in the code you showed, the IRModule pass that retrieves the threadblock dimensions is [get_annotated_device_mod](https://github.com/microsoft/BitBLAS/blob/2f6d316be9f9d70f2845c2f319ac2f348d0cd6a6/bitblas/utils/rtmod_analysis.py#L11). I'm confused by how the CUDA source wrapper is initialized: an IRModule plus a source string is passed? Don't you typically get the source *after* building the module? Also, do you initialize the `TileDevice` class with `remote.cl()` or `remote.cuda()`, as the TVM examples do?

Here's a Python script that prints the generated source for a single conv2d (I omitted tuning for brevity). I still don't know how to get the work group sizing, though. Do you have any advice on how to apply your method from BitBLAS here?

```python
import numpy as np
import tvm
from tvm import relay

# Compile for an Android OpenCL device with an AArch64 host.
target = tvm.target.Target("opencl", host="llvm -mtriple=aarch64-linux-android")
dtype = "float16"

input_shape = (1, 25, 25, 64)   # NHWC
filter_shape = (3, 3, 64, 96)   # HWIO
filter_data = np.random.rand(*filter_shape).astype(dtype)

data = relay.var("input", shape=input_shape, dtype=dtype)
weight = relay.var("weight", shape=filter_shape, dtype=dtype)
conv = relay.nn.conv2d(
    data,
    weight,
    padding=(0, 0),
    data_layout="NHWC",
    kernel_layout="HWIO",
    out_dtype=dtype,
)
mod = relay.Function([data, weight], conv)
params = {"weight": tvm.nd.array(filter_data)}

with tvm.transform.PassContext(opt_level=3):
    graph, lib, params = relay.build_module.build(mod, target, params=params)

# Print the generated OpenCL kernel source.
print(lib.imported_modules[0].get_source())
```

---

[Visit Topic](https://discuss.tvm.apache.org/t/phasing-out-legacy-components/17703/9) to respond.