Hi Anton,

Thanks for helping with the release. The following PRs are needed by customers who want to use deterministic CUDNN convolution algorithms:
https://github.com/apache/incubator-mxnet/pull/12992
https://github.com/apache/incubator-mxnet/pull/13049

Thanks!
Lin

On Tue, Nov 6, 2018 at 1:51 PM Aaron Markham <[email protected]> wrote:
> Hi Anton,
> I have the following suggestions for fixes to include in 1.3.1. These each
> have updates to files that will impact docs generation for the 1.3.x
> version of the website's Python API docs:
>
> https://github.com/apache/incubator-mxnet/pull/12879
> https://github.com/apache/incubator-mxnet/pull/12871
> https://github.com/apache/incubator-mxnet/pull/12856
>
> Thanks,
> Aaron
>
> On Tue, Nov 6, 2018 at 1:29 PM Lai Wei <[email protected]> wrote:
>
> > Hi Anton,
> >
> > Thanks for driving this, I would like to include the following fix in
> > 1.3.1:
> > Allow infer shape partial on foreach operator:
> > https://github.com/apache/incubator-mxnet/pull/12471
> >
> > Keras-MXNet needs this functionality to infer shape partially
> > on foreach operator. (Used in RNN operators)
> >
> > Thanks a lot!
> >
> > Best Regards
> > Lai Wei
> >
> > On Tue, Nov 6, 2018 at 10:44 AM Haibin Lin <[email protected]> wrote:
> >
> > > Hi Naveen and Anton,
> > >
> > > Thanks for pointing that out. You are right that these are not critical
> > > fixes. Putting them in 1.4.0 is more appropriate. PRs are closed.
> > >
> > > Best,
> > > Haibin
> > >
> > > On Tue, Nov 6, 2018 at 7:35 AM Naveen Swamy <[email protected]> wrote:
> > >
> > > > Please note that this is a patch release (1.3.1) to address critical
> > > > bugs! For everything else, please wait for 1.4.0, which is planned
> > > > very shortly after 1.3.1.
> > > >
> > > > > On Nov 6, 2018, at 7:17 AM, Anton Chernov <[email protected]> wrote:
> > > > >
> > > > > The following PRs have been created so far:
> > > > >
> > > > > Infer dtype in SymbolBlock import from input symbol (v1.3.x)
> > > > > https://github.com/apache/incubator-mxnet/pull/13117
> > > > >
> > > > > [MXNET-953] Fix oob memory read (v1.3.x)
> > > > > https://github.com/apache/incubator-mxnet/pull/13118
> > > > >
> > > > > [MXNET-969] Fix buffer overflow in RNNOp (v1.3.x)
> > > > > https://github.com/apache/incubator-mxnet/pull/13119
> > > > >
> > > > > [MXNET-922] Fix memleak in profiler (v1.3.x)
> > > > > https://github.com/apache/incubator-mxnet/pull/13120
> > > > >
> > > > > Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
> > > > > https://github.com/apache/incubator-mxnet/pull/13121
> > > > >
> > > > > update mshadow (v1.3.x)
> > > > > https://github.com/apache/incubator-mxnet/pull/13122
> > > > >
> > > > > CudnnFind() usage improvements (v1.3.x)
> > > > > https://github.com/apache/incubator-mxnet/pull/13123
> > > > >
> > > > > Fix lazy record io when used with dataloader and multi_worker > 0 (v1.3.x)
> > > > > https://github.com/apache/incubator-mxnet/pull/13124
> > > > >
> > > > >
> > > > > As stated previously, I would be rather opposed to having the
> > > > > following PRs in the patch release:
> > > > >
> > > > > Gluon LSTM Projection and Clipping Support (#13055) v1.3.x
> > > > > https://github.com/apache/incubator-mxnet/pull/13129
> > > > >
> > > > > sample_like operators (#13034) v1.3.x
> > > > > https://github.com/apache/incubator-mxnet/pull/13130
> > > > >
> > > > >
> > > > > Best
> > > > > Anton
> > > > >
> > > > > On Tue, Nov 6, 2018 at 16:06, Anton Chernov <[email protected]> wrote:
> > > > >
> > > > >> Hi Haibin,
> > > > >>
> > > > >> I have a few comments regarding the proposed performance improvement
> > > > >> changes.
> > > > >>
> > > > >> CUDNN support for LSTM with projection & clipping
> > > > >> https://github.com/apache/incubator-mxnet/pull/13056
> > > > >>
> > > > >> There is no doubt that this change brings value, but I don't see it as a
> > > > >> critical bug fix. I would rather leave it for the next major release.
> > > > >>
> > > > >> sample_like operators
> > > > >> https://github.com/apache/incubator-mxnet/pull/13034
> > > > >>
> > > > >> Even if it's related to performance, this is an addition of functionality
> > > > >> and I would also push this to be in the next major release only.
> > > > >>
> > > > >>
> > > > >> Best
> > > > >> Anton
> > > > >>
> > > > >>
> > > > >> On Tue, Nov 6, 2018 at 15:55, Anton Chernov <[email protected]> wrote:
> > > > >>
> > > > >>> Hi Patric,
> > > > >>>
> > > > >>> This change was listed in the 'PR candidates suggested for consideration
> > > > >>> for v1.3.1 patch release' section [1].
> > > > >>>
> > > > >>> You are right, I also think that this is not a critical hotfix change
> > > > >>> that should be included in the 1.3.1 patch release.
> > > > >>>
> > > > >>> Thus I'm not making any further efforts to bring it in.
> > > > >>>
> > > > >>> Best
> > > > >>> Anton
> > > > >>>
> > > > >>> [1]
> > > > >>> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release#PR_candidates
> > > > >>>
> > > > >>>
> > > > >>> On Tue, Nov 6, 2018 at 1:14, Zhao, Patric <[email protected]> wrote:
> > > > >>>
> > > > >>>> Hi Anton,
> > > > >>>>
> > > > >>>> Thanks for looking into the MKL-DNN PR.
> > > > >>>>
> > > > >>>> As I understand from the cwiki (
> > > > >>>> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
> > > > >>>> ),
> > > > >>>> these features will go into 1.4 rather than the 1.3.1 patch release.
> > > > >>>>
> > > > >>>> Feel free to correct me :)
> > > > >>>>
> > > > >>>> Thanks,
> > > > >>>>
> > > > >>>> --Patric
> > > > >>>>
> > > > >>>>> -----Original Message-----
> > > > >>>>> From: Anton Chernov [mailto:[email protected]]
> > > > >>>>> Sent: Tuesday, November 6, 2018 3:11 AM
> > > > >>>>> To: [email protected]
> > > > >>>>> Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release
> > > > >>>>>
> > > > >>>>> It seems that there is a problem porting the following changes to the
> > > > >>>>> v1.3.x release branch:
> > > > >>>>>
> > > > >>>>> Implement mkldnn convolution fusion and quantization
> > > > >>>>> https://github.com/apache/incubator-mxnet/pull/12530
> > > > >>>>>
> > > > >>>>> MKL-DNN Quantization Examples and README
> > > > >>>>> https://github.com/apache/incubator-mxnet/pull/12808
> > > > >>>>>
> > > > >>>>> The bases are different.
> > > > >>>>>
> > > > >>>>> I would need help from the authors of these changes to make a backport PR.
> > > > >>>>>
> > > > >>>>> @ZhennanQin, @xinyu-intel would you be able to assist me and create the
> > > > >>>>> corresponding PRs?
> > > > >>>>>
> > > > >>>>> Without proper history and domain knowledge I would not be able to create
> > > > >>>>> them on my own in a reasonable amount of time, I'm afraid.
> > > > >>>>>
> > > > >>>>> Best regards,
> > > > >>>>> Anton
> > > > >>>>>
> > > > >>>>> On Mon, Nov 5, 2018 at 19:45, Anton Chernov <[email protected]> wrote:
> > > > >>>>>
> > > > >>>>>>
> > > > >>>>>> As part of:
> > > > >>>>>>
> > > > >>>>>> Implement mkldnn convolution fusion and quantization
> > > > >>>>>> https://github.com/apache/incubator-mxnet/pull/12530
> > > > >>>>>>
> > > > >>>>>> I propose to add the examples and documentation PR as well:
> > > > >>>>>>
> > > > >>>>>> MKL-DNN Quantization Examples and README
> > > > >>>>>> https://github.com/apache/incubator-mxnet/pull/12808
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> Best regards,
> > > > >>>>>> Anton
> > > > >>>>>>
> > > > >>>>>> On Mon, Nov 5, 2018 at 19:02, Anton Chernov <[email protected]> wrote:
> > > > >>>>>>
> > > > >>>>>>> Dear MXNet community,
> > > > >>>>>>>
> > > > >>>>>>> I will be the release manager for the upcoming 1.3.1 patch release.
> > > > >>>>>>> Naveen will be co-managing the release and providing help from the
> > > > >>>>>>> committers side.
> > > > >>>>>>>
> > > > >>>>>>> The following dates have been set:
> > > > >>>>>>>
> > > > >>>>>>> Code Freeze: 31st October 2018
> > > > >>>>>>> Release published: 13th November 2018
> > > > >>>>>>>
> > > > >>>>>>> Release notes have been drafted here [1].
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> * Known issues
> > > > >>>>>>>
> > > > >>>>>>> Update MKL-DNN dependency
> > > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12953
> > > > >>>>>>>
> > > > >>>>>>> This PR hasn't been merged even to master yet. Requires additional
> > > > >>>>>>> discussion and merge.
> > > > >>>>>>>
> > > > >>>>>>> distributed kvstore bug in MXNet
> > > > >>>>>>> https://github.com/apache/incubator-mxnet/issues/12713
> > > > >>>>>>>
> > > > >>>>>>>> When distributed kvstore is used, by default gluon.Trainer doesn't
> > > > >>>>>>>> work with mx.optimizer.LRScheduler if a worker has more than 1 GPU.
> > To > > > > be > > > > >>>>>>> more specific, the trainer updates once per GPU, the > > LRScheduler > > > > >>>>>>> object is shared across GPUs and get a wrong update count. > > > > >>>>>>> > > > > >>>>>>> This needs to be fixed. [6] > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> * Changes > > > > >>>>>>> > > > > >>>>>>> The following changes will be ported to the release branch, > per > > > > [2]: > > > > >>>>>>> > > > > >>>>>>> Infer dtype in SymbolBlock import from input symbol [3] > > > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12412 > > > > >>>>>>> > > > > >>>>>>> [MXNET-953] Fix oob memory read > > > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12631 > > > > >>>>>>> > > > > >>>>>>> [MXNET-969] Fix buffer overflow in RNNOp > > > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12603 > > > > >>>>>>> > > > > >>>>>>> [MXNET-922] Fix memleak in profiler > > > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12499 > > > > >>>>>>> > > > > >>>>>>> Implement mkldnn convolution fusion and quantization (MXNet > > Graph > > > > >>>>>>> Optimization and Quantization based on subgraph and MKL-DNN > > > > >>>>> proposal > > > > >>>>>>> [4]) > > > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12530 > > > > >>>>>>> > > > > >>>>>>> Following items (test cases) should be already part of 1.3.0: > > > > >>>>>>> > > > > >>>>>>> [MXNET-486] Create CPP test for concat MKLDNN operator > > > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/11371 > > > > >>>>>>> > > > > >>>>>>> [MXNET-489] MKLDNN Pool test > > > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/11608 > > > > >>>>>>> > > > > >>>>>>> [MXNET-484] MKLDNN C++ test for LRN operator > > > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/11831 > > > > >>>>>>> > > > > >>>>>>> [MXNET-546] Add unit test for MKLDNNSum > > > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/11272 > > > > >>>>>>> > > > > >>>>>>> [MXNET-498] Test MKLDNN backward 
operators > > > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/11232 > > > > >>>>>>> > > > > >>>>>>> [MXNET-500] Test cases improvement for MKLDNN on Gluon > > > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/10921 > > > > >>>>>>> > > > > >>>>>>> Set correct update on kvstore flag in dist_device_sync mode > (as > > > > part > > > > >>>>>>> of fixing [5]) > > > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12786 > > > > >>>>>>> > > > > >>>>>>> upgrade mshadow version > > > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12692 > > > > >>>>>>> But another PR will be used instead: > > > > >>>>>>> update mshadow > > > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12674 > > > > >>>>>>> > > > > >>>>>>> CudnnFind() usage improvements > > > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12804 > > > > >>>>>>> A critical CUDNN fix that reduces GPU memory consumption and > > > > >>>>>>> addresses this memory leak issue. This is an important fix to > > > > >>>> include > > > > >>>>>>> in 1.3.1 > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> From discussion about gluon toolkits: > > > > >>>>>>> > > > > >>>>>>> disable opencv threading for forked process > > > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12025 > > > > >>>>>>> > > > > >>>>>>> Fix lazy record io when used with dataloader and multi_worker > > > 0 > > > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12554 > > > > >>>>>>> > > > > >>>>>>> fix potential floating number overflow, enable float16 > > > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12118 > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> * Resolved issues > > > > >>>>>>> > > > > >>>>>>> MxNet 1.2.1–module get_outputs() > > > > >>>>>>> > https://discuss.mxnet.io/t/mxnet-1-2-1-module-get-outputs/1882 > > > > >>>>>>> > > > > >>>>>>> As far as I can see from the comments the issue has been > > > resolved, > > > > >>>> no > > > > >>>>>>> actions need to be 
taken for this release. [7] is mentioned > in > > > this > > > > >>>>>>> regards, but I don't see any action points here either. > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> I will start with help of Naveen port the mentioned PR's to > the > > > > >>>> 1.3.x > > > > >>>>>>> branch. > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> Best regards, > > > > >>>>>>> Anton > > > > >>>>>>> > > > > >>>>>>> [1] https://cwiki.apache.org/confluence/x/eZGzBQ > > > > >>>>>>> [2] > > > > >>>>>>> > > > > >>>> > > > https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+f > > > > >>>>>>> or+next+MXNet+Release [3] > > > > >>>>>>> https://github.com/apache/incubator-mxnet/issues/11849 > > > > >>>>>>> [4] > > > > >>>>>>> > > > > >>>>> > > > > > https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimiz > > > > >>>>>>> ation+and+Quantization+based+on+subgraph+and+MKL-DNN > > > > >>>>>>> [5] https://github.com/apache/incubator-mxnet/issues/12713 > > > > >>>>>>> [6] > > > > >>>>>>> https://github.com/apache/incubator- > > > > >>>>> mxnet/issues/12713#issuecomment-4 > > > > >>>>>>> 35773777 [7] > > > https://github.com/apache/incubator-mxnet/pull/11005 > > > > >>>>>>> > > > > >>>>>>> > > > > >>>> > > > > >>> > > > > > > > > > >
