Hello!

I am currently building a network using the Winograd algorithm, and the problem is that its performance is worse than direct conv2d.

According to the paper [Fast Algorithms for Convolutional Neural 
Networks](https://arxiv.org/pdf/1509.09308.pdf), Winograd convolution should outperform the direct implementation, as shown in Table 4 on page 8, but my measurements show otherwise.
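
Just to spell out where that expectation comes from: for F(2x2, 3x3) the paper counts (m + r - 1)^2 = 16 multiplications per output tile instead of the m^2 * r^2 = 36 needed by the direct method, a 2.25x reduction. A tiny back-of-the-envelope check (plain Python, operation counts only, nothing TVM-specific):

    # Multiplications per 2x2 output tile of a 3x3 filter (F(2x2, 3x3) from the paper)
    m, r = 2, 3                          # output tile size, filter size
    direct_muls = (m * m) * (r * r)      # 4 outputs x 9 multiplications = 36
    winograd_muls = (m + r - 1) ** 2     # 4x4 element-wise product = 16
    print(direct_muls / winograd_muls)   # 2.25, i.e. 2.25x fewer multiplications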

I tested performance with the code below.
This is the first conv layer of VGG16.

    import numpy as np
    import tvm
    from tvm import topi   # on older TVM builds this may need to be `import topi`

    data_shape = (1, 3, 224, 224)
    w_shape = (64, 3, 3, 3)
    out_shape = (1, 64, 222, 222)   # 3x3 kernel, stride 1, no padding: 224 - 3 + 1 = 222

    target = "cuda"
    ctx = tvm.gpu(0)

    sample_data = np.random.uniform(-1, 1, size=data_shape).astype("float32")
    sample_p1 = np.random.uniform(-1, 1, size=w_shape).astype("float32")

    input_data = tvm.te.placeholder(shape=data_shape, dtype="float32", name="Input")
    p1 = tvm.te.placeholder(shape=w_shape, dtype="float32", name="p1")

    ## No-op lower pass (placeholder for the custom pass I use when dumping IR)
    def DoNotIR(stmt):
        return stmt

    ## Winograd conv2d
    with tvm.target.create("cuda"):
        conv = topi.cuda.conv2d_nchw_winograd(input_data, p1, (1, 1), (0, 0), (1, 1), "float32")
        sch = topi.cuda.schedule_conv2d_nchw_winograd([conv])
        with tvm.target.build_config(add_lower_pass=[(0, DoNotIR)], dump_pass_ir=False):
            winoMod = tvm.build(sch, [input_data, p1, conv], target, name="wino")

    ## Direct conv2d
    with tvm.target.create("cuda"):
        conv = topi.cuda.conv2d_nchw(input_data, p1, [1, 1], [0, 0], [1, 1])
        sch = topi.cuda.schedule_conv2d_nchw([conv])
        with tvm.target.build_config(add_lower_pass=[(0, DoNotIR)], dump_pass_ir=False):
            simpleMod = tvm.build(sch, [input_data, p1, conv], target, name="direct")

    ## Allocate input, weight, and output buffers on the device
    tvm_input = tvm.nd.array(sample_data, ctx)
    tvm_p1 = tvm.nd.array(sample_p1, ctx)
    tvm_o1 = tvm.nd.array(np.zeros(out_shape, dtype="float32"), ctx)   # Winograd output
    tvm_o2 = tvm.nd.array(np.zeros(out_shape, dtype="float32"), ctx)   # direct output

    ## Performance testing (mean of 100 runs, reported in milliseconds)
    ev_wino = winoMod.time_evaluator(winoMod.entry_name, ctx, number=1, repeat=100)
    ev_conv = simpleMod.time_evaluator(simpleMod.entry_name, ctx, number=1, repeat=100)

    print("TESTING Result")
    timer = ev_conv(tvm_input, tvm_p1, tvm_o2).mean * 1e3
    print("Conv with Direct algo -> ", timer)
    timer = ev_wino(tvm_input, tvm_p1, tvm_o1).mean * 1e3
    print("Conv with Winograd algo -> ", timer)


And the performance results are as follows.

    TESTING Result
    Conv with Direct algo ->  0.1153 ms
    Conv with Winograd algo ->  4.8305 ms

For 3x3 filters, convolution using the Winograd algorithm is supposed to be the faster of the two, yet here the gap goes the other way, and it is severe.

Is there anything I'm misunderstanding?




