Hello!
Currently I am trying to run VGG-16 inference on an Arm CPU:

```python
import numpy as np
import tvm
import tvm.relay as relay
import tvm.relay.testing  # provides relay.testing.vgg
from tvm.contrib import graph_runtime

# Target: 64-bit Arm CPU
target_arm_cpu = tvm.target.create('llvm -device=arm_cpu -target=aarch64-linux-gnu')
ctx_arm_cpu = tvm.runtime.cpu()

dtype = 'float32'
batch_size = 1
num_class = 1000
image_shape = (3, 224, 224)
data_shape = (batch_size,) + image_shape
out_shape = (batch_size, num_class)

# VGG-16 workload from the Relay testing utilities
mod, paramsO = relay.testing.vgg.get_workload(
    num_layers=16, batch_size=batch_size, image_shape=image_shape)

opt_level = 3

# Build for the arm_cpu target
with relay.build_config(opt_level=opt_level):
    graph, lib, params = relay.build_module.build(
        mod, target_arm_cpu, params=paramsO)

data = tvm.nd.array(
    np.random.uniform(-1, 1, size=data_shape).astype(dtype), ctx_arm_cpu)

module = graph_runtime.create(graph, lib, ctx_arm_cpu)
module.set_input("data", data)
module.set_input(**params)

print("RUNNING")
timer = module.module.time_evaluator('run', ctx_arm_cpu, number=1, repeat=2)
prof_res = np.array(timer().results) * 1000  # convert seconds to milliseconds
print("arm CPU -> Mean inference time (std dev): %.2f ms (%.2f ms)"
      % (np.mean(prof_res), np.std(prof_res)))
```

When I run the above code, the result is:

`arm CPU -> Mean inference time (std dev): 1954.49 ms (0.57 ms)`

I remember that with an older version of TVM, VGG-16 inference was measured at about 1000 ms, so the performance seems to have dropped by roughly a factor of two. Is there anything I have misunderstood, or am I implementing the code incorrectly?
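
To see whether the `arm_cpu`-specific schedules are what changed, I am also comparing against a build with a generic `llvm` target on the same board. This is only a sketch of that comparison; the target triple is an assumption about my setup and may need adjusting:

```python
# Build the same workload without -device=arm_cpu so the generic schedules
# are used; the target triple below is an assumption about my board.
target_generic = tvm.target.create('llvm -target=aarch64-linux-gnu')
with relay.build_config(opt_level=opt_level):
    graph_g, lib_g, params_g = relay.build_module.build(
        mod, target_generic, params=paramsO)

module_g = graph_runtime.create(graph_g, lib_g, ctx_arm_cpu)
module_g.set_input("data", data)
module_g.set_input(**params_g)

timer_g = module_g.module.time_evaluator('run', ctx_arm_cpu, number=1, repeat=2)
prof_res_g = np.array(timer_g().results) * 1000
print("generic llvm -> Mean inference time (std dev): %.2f ms (%.2f ms)"
      % (np.mean(prof_res_g), np.std(prof_res_g)))
```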