It’s great to hear that works. Thanks for your effort on this, Marco. 👍
-Tao

Sent from my iPhone

> On Nov 23, 2018, at 7:31 PM, Marco de Abreu <[email protected]> wrote:
>
> Great news, it was the debug flag:
>
> [success] 16.54% test_gluon_model_zoo.test_models: 320.0898s
> [success] 6.64% test_random.test_shuffle: 128.5430s
> [success] 5.67% test_sparse_operator.test_elemwise_binary_ops: 109.6650s
> [success] 4.41% test_metric_perf.test_metric_performance: 85.3215s
> [success] 4.28% test_random.test_negative_binomial_generator: 82.8046s
> [success] 3.91% test_operator.test_pick: 75.7241s
> [success] 3.34% test_operator.test_psroipooling: 64.7008s
> [success] 3.30% test_random.test_poisson_generator: 63.9218s
> [success] 3.24% test_operator.test_broadcast_binary_op: 62.7417s
> [success] 2.95% test_random.test_random: 57.0268s
> [success] 2.95% test_gluon.test_slice_pooling2d_slice_pooling2d: 57.0118s
> [success] 2.03% test_random.test_normal_generator: 39.3641s
> [success] 1.86% test_io.test_Cifar10Rec: 36.0722s
> [success] 1.76% test_random.test_gamma_generator: 34.0995s
> [success] 1.65% test_gluon.test_slice_batchnorm: 31.9859s
> [success] 1.63% test_gluon.test_slice_pooling2d: 31.5945s
>
> [...]
>
> Ran 703 tests in 1941.053s
>
> http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13379/runs/1/nodes/155/steps/409/log/?start=0
>
> We are all good and set then. Thanks, Tao and Patric, for your support!
>
> Best regards,
> Marco
>
> On Fri, Nov 23, 2018 at 3:44 AM Marco de Abreu <[email protected]> wrote:
>
>> Sure, good idea! https://github.com/apache/incubator-mxnet/pull/13379
>>
>> -Marco
>>
>> On Fri, Nov 23, 2018 at 3:38 AM Zhao, Patric <[email protected]> wrote:
>>
>>> Thanks, that should be the most time-consuming part.
>>>
>>> @Marco, could you try to disable this env variable and see the
>>> performance again?
>>>
>>>> -----Original Message-----
>>>> From: Lv, Tao A [mailto:[email protected]]
>>>> Sent: Friday, November 23, 2018 10:26 AM
>>>> To: [email protected]
>>>> Subject: RE: MKLDNN performance in CI
>>>>
>>>> I think yes, except the cpp test.
>>>>
>>>> -----Original Message-----
>>>> From: Zhao, Patric [mailto:[email protected]]
>>>> Sent: Friday, November 23, 2018 10:06 AM
>>>> To: [email protected]
>>>> Subject: RE: MKLDNN performance in CI
>>>>
>>>> Good point, Tao!
>>>> Is this env variable enabled in all MKL-DNN CI jobs?
>>>>
>>>>> -----Original Message-----
>>>>> From: Lv, Tao A [mailto:[email protected]]
>>>>> Sent: Friday, November 23, 2018 9:53 AM
>>>>> To: [email protected]
>>>>> Subject: RE: MKLDNN performance in CI
>>>>>
>>>>> Thanks for bringing this up, Marco. It's really weird, since most of
>>>>> those tests listed in "worth noting" are not related to the mkldnn
>>>>> backend.
>>>>>
>>>>> I can understand that some tests for mkldnn operators may be slower
>>>>> because MXNET_MKLDNN_DEBUG is enabled in the CI:
>>>>> https://github.com/apache/incubator-mxnet/blob/master/ci/docker/runtime_functions.sh#L713
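[To check the impact of that flag locally, a minimal sketch, assuming an MKL-DNN build of MXNet and a checkout of the source tree (the test module path is illustrative). The variable has to be set in the environment of a fresh Python process, before mxnet is loaded:

    import os
    import subprocess

    # Run the same test module with the MKL-DNN debug checks on and off.
    # MXNET_MKLDNN_DEBUG enables extra result verification in the backend,
    # so it is passed through the environment of a child process each time.
    for debug in ("1", "0"):
        env = dict(os.environ, MXNET_MKLDNN_DEBUG=debug)
        subprocess.run(
            ["nosetests", "-v", "tests/python/unittest/test_operator.py"],
            env=env,
            check=True,
        )

Comparing the two wall-clock totals should show roughly the gap discussed below.]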
>>>>> -----Original Message-----
>>>>> From: Marco de Abreu [mailto:[email protected]]
>>>>> Sent: Friday, November 23, 2018 9:22 AM
>>>>> To: [email protected]
>>>>> Subject: MKLDNN performance in CI
>>>>>
>>>>> Hello,
>>>>>
>>>>> I have noticed that our Python tests have been increasing in duration
>>>>> recently. In order to analyse this further, I created the PR [1],
>>>>> which makes it possible to record test durations. Please note that I
>>>>> did not dive deep into these numbers and that they have to be taken
>>>>> with a grain of salt, since slaves have varying resource utilization.
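[The per-test breakdowns quoted in this thread match the report style of the nose-timer plugin; whether or not that is what the PR actually uses, the idea reduces to wall-clocking each test callable. A rough sketch, where the helper and the sample test are illustrative names, not something the PR adds:

    import time

    def time_test(test_fn):
        # Wall-clock a single test callable, mirroring the
        # "[success] <share> <test>: <seconds>s" lines in the logs quoted here.
        start = time.perf_counter()
        test_fn()
        return time.perf_counter() - start

    def test_example():  # illustrative stand-in for a real unit test
        sum(range(10 ** 6))

    print("test_example: %.4fs" % time_test(test_example))
]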
>>>>> Please have a look at the two following logs:
>>>>> Python3 CPU MKLDNN:
>>>>> http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/155/steps/409/log/?start=0
>>>>> Python3 CPU Openblas:
>>>>> http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/152/steps/398/log/?start=0
>>>>>
>>>>> If you scroll to the end (note that there are multiple test stages and
>>>>> summaries being printed in these logs), you will find the following
>>>>> statements:
>>>>>
>>>>> Python3 CPU MKLDNN: "Ran 702 tests in 3042.102s"
>>>>> Python3 CPU Openblas: "Ran 702 tests in 2158.458s"
>>>>>
>>>>> This shows that the MKLDNN backend is generally about 40% slower than
>>>>> the Openblas backend. If we go into the details, we can see that some
>>>>> tests are significantly slower:
>>>>>
>>>>> Python3 CPU MKLDNN:
>>>>>
>>>>>> [success] 20.78% test_random.test_shuffle: 630.7165s
>>>>>> [success] 17.79% test_sparse_operator.test_elemwise_binary_ops: 540.0487s
>>>>>> [success] 10.91% test_gluon_model_zoo.test_models: 331.1503s
>>>>>> [success] 2.62% test_operator.test_broadcast_binary_op: 79.4556s
>>>>>> [success] 2.45% test_operator.test_pick: 74.4041s
>>>>>> [success] 2.39% test_metric_perf.test_metric_performance: 72.5445s
>>>>>> [success] 2.38% test_random.test_negative_binomial_generator: 72.1751s
>>>>>> [success] 1.84% test_operator.test_psroipooling: 55.9432s
>>>>>> [success] 1.78% test_random.test_poisson_generator: 54.0104s
>>>>>> [success] 1.72% test_gluon.test_slice_pooling2d_slice_pooling2d: 52.3447s
>>>>>> [success] 1.60% test_contrib_control_flow.test_cond: 48.6977s
>>>>>> [success] 1.41% test_random.test_random: 42.8712s
>>>>>> [success] 1.03% test_operator.test_layer_norm: 31.1242s
>>>>>
>>>>> Python3 CPU Openblas:
>>>>>
>>>>>> [success] 26.20% test_gluon_model_zoo.test_models: 563.3366s
>>>>>> [success] 4.34% test_random.test_shuffle: 93.3157s
>>>>>> [success] 4.31% test_random.test_negative_binomial_generator: 92.6899s
>>>>>> [success] 3.78% test_sparse_operator.test_elemwise_binary_ops: 81.2048s
>>>>>> [success] 3.30% test_operator.test_psroipooling: 70.9090s
>>>>>> [success] 3.20% test_random.test_poisson_generator: 68.7500s
>>>>>> [success] 3.10% test_metric_perf.test_metric_performance: 66.6085s
>>>>>> [success] 2.79% test_operator.test_layer_norm: 59.9566s
>>>>>> [success] 2.66% test_gluon.test_slice_pooling2d_slice_pooling2d: 57.1887s
>>>>>> [success] 2.62% test_operator.test_pick: 56.2312s
>>>>>> [success] 2.60% test_random.test_random: 55.8920s
>>>>>> [success] 2.19% test_operator.test_broadcast_binary_op: 47.1879s
>>>>>> [success] 0.96% test_contrib_control_flow.test_cond: 20.6908s
>>>>>
>>>>> Tests worth noting:
>>>>> - test_random.test_shuffle: 700% increase - but I don't know how this
>>>>>   may be related to MKLDNN. Are we doing random number generation in
>>>>>   either of those backends?
>>>>> - test_sparse_operator.test_elemwise_binary_ops: 700% increase
>>>>> - test_gluon_model_zoo.test_models: 40% decrease - that's awesome and
>>>>>   to be expected :)
>>>>> - test_operator.test_broadcast_binary_op: 80% increase
>>>>> - test_contrib_control_flow.test_cond: 250% increase
>>>>> - test_operator.test_layer_norm: 50% decrease - nice!
>>>>>
>>>>> As I have stated previously, these numbers might not mean anything,
>>>>> since the CI is not a benchmarking environment (sorry if these are
>>>>> false negatives), but I thought it might be worth mentioning so Intel
>>>>> could follow up and dive deeper.
>>>>>
>>>>> Does anybody here run 1:1 operator comparisons (e.g. running
>>>>> layer_norm on the different backends to compare performance) and
>>>>> could provide us with those numbers?
>>>>>
>>>>> Best regards,
>>>>> Marco
>>>>>
>>>>> [1]: https://github.com/apache/incubator-mxnet/pull/13377
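[On Marco's closing question, a 1:1 comparison of a single operator can be sketched as below. This is a hedged example rather than an existing benchmark suite: it assumes an MKL-DNN build of MXNet, uses MXNET_MKLDNN_ENABLED as the runtime switch between the MKL-DNN path and the fallback implementation, and the shape and run count are arbitrary:

    import os
    # Assumption: in an MKL-DNN build, MXNET_MKLDNN_ENABLED=0 falls back to
    # the default CPU implementation. Must be set before importing mxnet.
    os.environ.setdefault("MXNET_MKLDNN_ENABLED", "1")  # flip to "0" to compare

    import time
    import mxnet as mx

    def bench_layer_norm(shape=(128, 1024), runs=200):
        x = mx.nd.random.uniform(shape=shape)
        gamma = mx.nd.ones((shape[-1],))
        beta = mx.nd.zeros((shape[-1],))
        mx.nd.LayerNorm(x, gamma, beta).wait_to_read()  # warm-up
        start = time.perf_counter()
        for _ in range(runs):
            mx.nd.LayerNorm(x, gamma, beta)
        mx.nd.waitall()  # drain MXNet's async engine before stopping the clock
        return (time.perf_counter() - start) / runs

    print("LayerNorm: %.6fs per call" % bench_layer_norm())

Running the script twice, once per MXNET_MKLDNN_ENABLED value, gives the kind of per-operator numbers asked for above.]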
