Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Anton Chernov Wed, 07 Nov 2018 01:50:48 -0800

Hi Sheng,

Thanks for your suggestions. Personally, I would not rush a new major
release, as this breaks the pace and creates unnecessary pressure, in my
opinion.


If the changes suggested by Haibin are really important then I think we can
consider them for the minor release, even if they are not strictly speaking
*bugfixes*. Do you think that might be an option?

And did I understand correctly, you are suggesting:

[MXNET-1179] Enforce deterministic algorithms in convolution layers
https://github.com/apache/incubator-mxnet/pull/12992

for the 1.3.1 release?

Best
Anton


On Wed, Nov 7, 2018 at 0:59, Sheng Zha <[email protected]> wrote:

> Similar to the two PRs that Haibin suggested, 12992 introduces new
> interface for controlling determinism, which is better suited for minor
> release.
>
> I think other than lack of release manager to drive 1.4.0 release, there’s
> no reason we cannot do two releases (1.4.0 & 1.3.1) at the same time. I’m
> willing to help with the 1.4.0 release to make these new features available
> one month sooner, if there’s no other concern.
>
> -sz
>
> > On Nov 6, 2018, at 3:30 PM, Lin Yuan <[email protected]> wrote:
> >
> > Hi Anton,
> >
> > Thanks for helping the release.
> > The following PRs are needed by customers who want to use deterministic
> > CUDNN convolution algorithms:
> >
> > https://github.com/apache/incubator-mxnet/pull/12992
> > https://github.com/apache/incubator-mxnet/pull/13049
> >
> > Thanks!
> >
> > Lin
> >
> >
> > On Tue, Nov 6, 2018 at 1:51 PM Aaron Markham <[email protected]>
> > wrote:
> >
> >> Hi Anton,
> >> I have the following suggestions for fixes to include in 1.3.1. These each
> >> have updates to files that will impact docs generation for the 1.3.x
> >> version of the website's Python API docs:
> >>
> >> https://github.com/apache/incubator-mxnet/pull/12879
> >> https://github.com/apache/incubator-mxnet/pull/12871
> >> https://github.com/apache/incubator-mxnet/pull/12856
> >>
> >> Thanks,
> >> Aaron
> >>
> >>> On Tue, Nov 6, 2018 at 1:29 PM Lai Wei <[email protected]> wrote:
> >>>
> >>> Hi Anton,
> >>>
> >>> Thanks for driving this, I would like to include the following fix in
> >>> 1.3.1:
> >>> Allow infer shape partial on foreach operator:
> >>> https://github.com/apache/incubator-mxnet/pull/12471
> >>>
> >>> Keras-MXNet needs this functionality to infer shape partially
> >>> on foreach operator. (Used in RNN operators)
> >>>
> >>> Thanks a lot!
> >>>
> >>>
> >>> Best Regards
> >>> Lai Wei
> >>>
> >>>
> >>>
> >>> On Tue, Nov 6, 2018 at 10:44 AM Haibin Lin <[email protected]>
> >>> wrote:
> >>>
> >>>> Hi Naveen and Anton,
> >>>>
> >>>> Thanks for pointing that out. You are right that these are not critical
> >>>> fixes. Putting them in 1.4.0 is more appropriate. PRs are closed.
> >>>>
> >>>> Best,
> >>>> Haibin
> >>>>
> >>>> On Tue, Nov 6, 2018 at 7:35 AM Naveen Swamy <[email protected]> wrote:
> >>>>
> >>>>> Please note that this is a patch release (1.3.1) to address critical
> >>>>> bugs! For everything else, please wait for 1.4.0, which is planned very
> >>>>> shortly after 1.3.1.
> >>>>>
> >>>>>> On Nov 6, 2018, at 7:17 AM, Anton Chernov <[email protected]> wrote:
> >>>>>>
> >>>>>> The following PR's have been created so far:
> >>>>>>
> >>>>>> Infer dtype in SymbolBlock import from input symbol (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13117
> >>>>>>
> >>>>>> [MXNET-953] Fix oob memory read (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13118
> >>>>>>
> >>>>>> [MXNET-969] Fix buffer overflow in RNNOp (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13119
> >>>>>>
> >>>>>> [MXNET-922] Fix memleak in profiler (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13120
> >>>>>>
> >>>>>> Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13121
> >>>>>>
> >>>>>> update mshadow (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13122
> >>>>>>
> >>>>>> CudnnFind() usage improvements (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13123
> >>>>>>
> >>>>>> Fix lazy record io when used with dataloader and multi_worker > 0 (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13124
> >>>>>>
> >>>>>>
> >>>>>> As stated previously, I would be rather opposed to having the following
> >>>>>> PRs in the patch release:
> >>>>>>
> >>>>>> Gluon LSTM Projection and Clipping Support (#13055) v1.3.x
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13129
> >>>>>>
> >>>>>> sample_like operators (#13034) v1.3.x
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13130
> >>>>>>
> >>>>>>
> >>>>>> Best
> >>>>>> Anton
> >>>>>>
> >>>>>> On Tue, Nov 6, 2018 at 16:06, Anton Chernov <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi Haibin,
> >>>>>>>
> >>>>>>> I have a few comments regarding the proposed performance improvement
> >>>>>>> changes.
> >>>>>>>
> >>>>>>> CUDNN support for LSTM with projection & clipping
> >>>>>>> https://github.com/apache/incubator-mxnet/pull/13056
> >>>>>>>
> >>>>>>> There is no doubt that this change brings value, but I don't see it as a
> >>>>>>> critical bug fix. I would rather leave it for the next major release.
> >>>>>>>
> >>>>>>> sample_like operators
> >>>>>>> https://github.com/apache/incubator-mxnet/pull/13034
> >>>>>>>
> >>>>>>> Even if it's related to performance, this is an addition of functionality
> >>>>>>> and I would also push this to be in the next major release only.
> >>>>>>>
> >>>>>>>
> >>>>>>> Best
> >>>>>>> Anton
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Nov 6, 2018 at 15:55, Anton Chernov <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Hi Patric,
> >>>>>>>>
> >>>>>>>> This change was listed in the 'PR candidates suggested for consideration
> >>>>>>>> for v1.3.1 patch release' section [1].
> >>>>>>>>
> >>>>>>>> You are right, I also think that this is not a critical hotfix change
> >>>>>>>> that should be included in the 1.3.1 patch release.
> >>>>>>>>
> >>>>>>>> Thus I'm not making any further efforts to bring it in.
> >>>>>>>>
> >>>>>>>> Best
> >>>>>>>> Anton
> >>>>>>>>
> >>>>>>>> [1] https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release#PR_candidates
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, Nov 6, 2018 at 1:14, Zhao, Patric <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Anton,
> >>>>>>>>>
> >>>>>>>>> Thanks for looking into the MKL-DNN PR.
> >>>>>>>>>
> >>>>>>>>> As I understand the cwiki
> >>>>>>>>> (https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release),
> >>>>>>>>> these features will go into 1.4 rather than the 1.3.1 patch release.
> >>>>>>>>>
> >>>>>>>>> Feel free to correct me :)
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> --Patric
> >>>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: Anton Chernov [mailto:[email protected]]
> >>>>>>>>>> Sent: Tuesday, November 6, 2018 3:11 AM
> >>>>>>>>>> To: [email protected]
> >>>>>>>>>> Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch
> >>>>>>>>>> release
> >>>>>>>>>>
> >>>>>>>>>> It seems that there is a problem porting the following changes to the
> >>>>>>>>>> v1.3.x release branch:
> >>>>>>>>>>
> >>>>>>>>>> Implement mkldnn convolution fusion and quantization
> >>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12530
> >>>>>>>>>>
> >>>>>>>>>> MKL-DNN Quantization Examples and README
> >>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12808
> >>>>>>>>>>
> >>>>>>>>>> The bases are different.
> >>>>>>>>>>
> >>>>>>>>>> I would need help from the authors of these changes to make a backport PR.
> >>>>>>>>>>
> >>>>>>>>>> @ZhennanQin, @xinyu-intel would you be able to assist me and create the
> >>>>>>>>>> corresponding PRs?
> >>>>>>>>>>
> >>>>>>>>>> Without proper history and domain knowledge I would not be able to create
> >>>>>>>>>> them on my own in a reasonable amount of time, I'm afraid.
> >>>>>>>>>>
> >>>>>>>>>> Best regards,
> >>>>>>>>>> Anton
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Nov 5, 2018 at 19:45, Anton Chernov <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> As part of:
> >>>>>>>>>>>
> >>>>>>>>>>> Implement mkldnn convolution fusion and quantization
> >>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12530
> >>>>>>>>>>>
> >>>>>>>>>>> I propose to add the examples and documentation PR as well:
> >>>>>>>>>>>
> >>>>>>>>>>> MKL-DNN Quantization Examples and README
> >>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12808
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Best regards,
> >>>>>>>>>>> Anton
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Nov 5, 2018 at 19:02, Anton Chernov <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Dear MXNet community,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I will be the release manager for the upcoming 1.3.1 patch release.
> >>>>>>>>>>>> Naveen will be co-managing the release and providing help from the
> >>>>>>>>>>>> committers' side.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The following dates have been set:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Code Freeze: 31st October 2018
> >>>>>>>>>>>> Release published: 13th November 2018
> >>>>>>>>>>>>
> >>>>>>>>>>>> Release notes have been drafted here [1].
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> * Known issues
> >>>>>>>>>>>>
> >>>>>>>>>>>> Update MKL-DNN dependency
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12953
> >>>>>>>>>>>>
> >>>>>>>>>>>> This PR hasn't been merged even to master yet. Requires additional
> >>>>>>>>>>>> discussion and merge.
> >>>>>>>>>>>>
> >>>>>>>>>>>> distributed kvstore bug in MXNet
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/issues/12713
> >>>>>>>>>>>>
> >>>>>>>>>>>>> When distributed kvstore is used, by default gluon.Trainer doesn't
> >>>>>>>>>>>>> work with mx.optimizer.LRScheduler if a worker has more than 1 GPU.
> >>>>>>>>>>>>> To be more specific, the trainer updates once per GPU, the
> >>>>>>>>>>>>> LRScheduler object is shared across GPUs and get a wrong update
> >>>>>>>>>>>>> count.
> >>>>>>>>>>>>
> >>>>>>>>>>>> This needs to be fixed. [6]
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> * Changes
> >>>>>>>>>>>>
> >>>>>>>>>>>> The following changes will be ported to the release branch, per [2]:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Infer dtype in SymbolBlock import from input symbol [3]
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12412
> >>>>>>>>>>>>
> >>>>>>>>>>>> [MXNET-953] Fix oob memory read
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12631
> >>>>>>>>>>>>
> >>>>>>>>>>>> [MXNET-969] Fix buffer overflow in RNNOp
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12603
> >>>>>>>>>>>>
> >>>>>>>>>>>> [MXNET-922] Fix memleak in profiler
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12499
> >>>>>>>>>>>>
> >>>>>>>>>>>> Implement mkldnn convolution fusion and quantization (MXNet Graph
> >>>>>>>>>>>> Optimization and Quantization based on subgraph and MKL-DNN proposal
> >>>>>>>>>>>> [4])
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12530
> >>>>>>>>>>>>
> >>>>>>>>>>>> Following items (test cases) should be already part of 1.3.0:
> >>>>>>>>>>>>
> >>>>>>>>>>>> [MXNET-486] Create CPP test for concat MKLDNN operator
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11371
> >>>>>>>>>>>>
> >>>>>>>>>>>> [MXNET-489] MKLDNN Pool test
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11608
> >>>>>>>>>>>>
> >>>>>>>>>>>> [MXNET-484] MKLDNN C++ test for LRN operator
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11831
> >>>>>>>>>>>>
> >>>>>>>>>>>> [MXNET-546] Add unit test for MKLDNNSum
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11272
> >>>>>>>>>>>>
> >>>>>>>>>>>> [MXNET-498] Test MKLDNN backward operators
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11232
> >>>>>>>>>>>>
> >>>>>>>>>>>> [MXNET-500] Test cases improvement for MKLDNN on Gluon
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/10921
> >>>>>>>>>>>>
> >>>>>>>>>>>> Set correct update on kvstore flag in dist_device_sync mode (as part
> >>>>>>>>>>>> of fixing [5])
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12786
> >>>>>>>>>>>>
> >>>>>>>>>>>> upgrade mshadow version
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12692
> >>>>>>>>>>>> But another PR will be used instead:
> >>>>>>>>>>>> update mshadow
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12674
> >>>>>>>>>>>>
> >>>>>>>>>>>> CudnnFind() usage improvements
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12804
> >>>>>>>>>>>> A critical CUDNN fix that reduces GPU memory consumption and
> >>>>>>>>>>>> addresses this memory leak issue. This is an important fix to include
> >>>>>>>>>>>> in 1.3.1.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> From discussion about gluon toolkits:
> >>>>>>>>>>>>
> >>>>>>>>>>>> disable opencv threading for forked process
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12025
> >>>>>>>>>>>>
> >>>>>>>>>>>> Fix lazy record io when used with dataloader and multi_worker > 0
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12554
> >>>>>>>>>>>>
> >>>>>>>>>>>> fix potential floating number overflow, enable float16
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12118
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> * Resolved issues
> >>>>>>>>>>>>
> >>>>>>>>>>>> MxNet 1.2.1–module get_outputs()
> >>>>>>>>>>>> https://discuss.mxnet.io/t/mxnet-1-2-1-module-get-outputs/1882
> >>>>>>>>>>>>
> >>>>>>>>>>>> As far as I can see from the comments, the issue has been resolved,
> >>>>>>>>>>>> so no action needs to be taken for this release. [7] is mentioned in
> >>>>>>>>>>>> this regard, but I don't see any action points here either.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> With Naveen's help, I will start porting the mentioned PRs to the
> >>>>>>>>>>>> 1.3.x branch.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best regards,
> >>>>>>>>>>>> Anton
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/x/eZGzBQ
> >>>>>>>>>>>> [2] https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
> >>>>>>>>>>>> [3] https://github.com/apache/incubator-mxnet/issues/11849
> >>>>>>>>>>>> [4] https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN
> >>>>>>>>>>>> [5] https://github.com/apache/incubator-mxnet/issues/12713
> >>>>>>>>>>>> [6] https://github.com/apache/incubator-mxnet/issues/12713#issuecomment-435773777
> >>>>>>>>>>>> [7] https://github.com/apache/incubator-mxnet/pull/11005
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>
> >>>>
> >>>
> >>
>
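
For readers following the distributed kvstore issue quoted in the original announcement above (the shared LRScheduler getting a wrong update count when a worker has more than one GPU), here is a minimal sketch of the effect. It is plain Python and does not use MXNet: ToyStepScheduler and final_lr are hypothetical names made up for illustration, and the assumption that the count advances once per GPU instead of once per batch is only my reading of the issue description, not the actual trainer code.

```python
class ToyStepScheduler:
    """Hypothetical stand-in for a learning-rate scheduler (not the MXNet class).

    The rate is multiplied by `factor` once every `step` updates.
    """

    def __init__(self, base_lr, step, factor):
        self.base_lr = base_lr
        self.step = step
        self.factor = factor

    def __call__(self, num_update):
        return self.base_lr * self.factor ** (num_update // self.step)


def final_lr(num_batches, num_gpus, count_per_gpu):
    """Learning rate seen after `num_batches` batches on one worker.

    count_per_gpu=True models the assumed bug: the shared update count
    advances once per GPU for every batch instead of once per batch.
    """
    scheduler = ToyStepScheduler(base_lr=0.1, step=10, factor=0.5)
    num_update = 0
    for _ in range(num_batches):
        num_update += num_gpus if count_per_gpu else 1
    return scheduler(num_update)


intended = final_lr(num_batches=20, num_gpus=4, count_per_gpu=False)
inflated = final_lr(num_batches=20, num_gpus=4, count_per_gpu=True)
print("one count per batch:", intended)
print("one count per GPU:  ", inflated)
```

With a single GPU both variants agree; with several GPUs the inflated count makes the rate decay several times faster than the schedule intends, which is the kind of mismatch the issue describes.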
