Hello!
I am currently building a network that uses the Winograd algorithm, and the problem is that its performance is worse than direct conv2d. According to the paper [Fast Algorithms for Convolutional Neural Networks](https://arxiv.org/pdf/1509.09308.pdf) (Table 4, page 8), Winograd should outperform the direct implementation, but my measurements show otherwise.

I tested performance with the code below. The layer is the first conv layer of VGG16.

```python
import numpy as np
import tvm
from tvm import topi

target = "cuda"
ctx = tvm.gpu(0)

data_shape = (1, 3, 224, 224)
w_shape = (64, 3, 3, 3)
out_shape = (1, 64, 222, 222)  # 3x3 kernel, stride 1, no padding

sample_data = np.random.uniform(-1, 1, size=data_shape).astype("float32")
sample_p1 = np.random.uniform(-1, 1, size=w_shape).astype("float32")

input_data = tvm.te.placeholder(shape=data_shape, dtype="float32", name="Input")
p1 = tvm.te.placeholder(shape=w_shape, dtype="float32", name="p1")

## Winograd conv2d
with tvm.target.create("cuda"):
    conv = topi.cuda.conv2d_nchw_winograd(input_data, p1, (1, 1), (0, 0), (1, 1), "float32")
    sch = topi.cuda.schedule_conv2d_nchw_winograd([conv])
    # DoNotIR is a custom lower pass I defined elsewhere (not shown here)
    with tvm.target.build_config(add_lower_pass=[(0, DoNotIR)], dump_pass_ir=False) as cfg:
        winoMod = tvm.build(sch, [input_data, p1, conv], target, name="wino")

## Direct conv2d
with tvm.target.create("cuda"):
    conv = topi.cuda.conv2d_nchw(input_data, p1, [1, 1], [0, 0], [1, 1])
    sch = topi.cuda.schedule_conv2d_nchw([conv])
    with tvm.target.build_config(add_lower_pass=[(0, DoNotIR)], dump_pass_ir=False) as cfg:
        simpleMod = tvm.build(sch, [input_data, p1, conv], target, name="direct")

## Allocate data to device
tvm_input = tvm.nd.array(sample_data, ctx)
tvm_p1 = tvm.nd.array(sample_p1, ctx)
tvm_o1 = tvm.nd.array(np.zeros(out_shape, dtype="float32"), ctx)  # Winograd output
tvm_o2 = tvm.nd.array(np.zeros(out_shape, dtype="float32"), ctx)  # direct output

## Performance testing
ev_wino = winoMod.time_evaluator(winoMod.entry_name, ctx, number=1, repeat=100)
ev_conv = simpleMod.time_evaluator(simpleMod.entry_name, ctx, number=1, repeat=100)

print("TESTING Result")
timer = ev_conv(tvm_input, tvm_p1, tvm_o2).mean * 1e3
print("Conv with Direct algo -> ", timer)
timer = ev_wino(tvm_input, tvm_p1, tvm_o1).mean * 1e3
print("Conv with Winograd Strassen algo -> ", timer)
```

And the performance result is as below.

```
TESTING Result
Conv with Direct algo ->  0.1153ms
Conv with Winograd Strassen algo ->  4.8305ms
```

For 3x3 filters, the convolution using the Winograd algorithm should be faster, yet here the gap goes the other way and is severe. Is there anything I'm misunderstanding?
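
For completeness, here is a minimal sketch of how I cross-check that the two compiled kernels actually compute the same result on the same inputs before comparing timings (it reuses `tvm_o1` / `tvm_o2` from above; the tolerances are just a rough guess for fp32):

```python
# Run each compiled module once on the same input/weight tensors
winoMod(tvm_input, tvm_p1, tvm_o1)
simpleMod(tvm_input, tvm_p1, tvm_o2)

# Both algorithms should compute the same convolution up to floating-point error
np.testing.assert_allclose(tvm_o1.asnumpy(), tvm_o2.asnumpy(), rtol=1e-3, atol=1e-3)
print("Winograd and direct conv2d outputs match")
```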
