The layout transformation should really be a separate optimization pass rather than part of memory planning, as is done in the TVM stack. If we want a clean-slate solution, I would recommend looking into that instead.
Tianqi

On Tue, Apr 9, 2019 at 1:46 AM Lv, Tao A <[email protected]> wrote:
>
> Hi dev,
>
> As we're discussing the roadmap for MXNet 2.0, I would like to start a
> thread about refining the InferStorageType and memory planning passes in
> MXNet, and I hope this can happen as part of the 2.0 release.
>
> Thanks to @eric-haibin-lin, part of the proposal has already been
> discussed in issue #13598 [1].
>
> As mentioned in the description of issue #13598, the existing flow has
> several drawbacks. Please allow me to quote them here:
> * The selection of the MKL/CPU/GPU/CUDNN implementation happens after
> graph attribute inference and memory planning. Memory planning is thus
> not aware of the implementation that will be used for execution later,
> which may produce sub-optimal results. For example, the memory in-place
> option may vary depending on the accelerator backend (newer versions of
> CUDNN enable x/dx in-place for _backward_conv).
> * Some sparse operators need to access dtype/shape information to decide
> which implementation to invoke for execution, and whether to perform a
> fallback. This information is not yet exposed in the existing infer
> storage type interface.
>
> Besides, the existing memory planning pass calculates and then allocates
> memory strictly according to the input/output tensor shapes (which are
> derived from operators' arithmetic formulas through InferShape). That is
> no longer true for accelerators like MKL-DNN on CPU, which want to pad
> input/output tensors to optimal formats (e.g. nchw16c) according to the
> hardware architecture; such a layout can also be described as shape +
> stride. As many of you know, MKL-DNN shows great performance on these
> optimal formats, which are blocked by the vector length of AVX512 or
> AVX2. It is very natural to pad the channel dimension of those
> inputs/outputs whose IC or OC is not a multiple of the vector length and
> leverage the optimal kernels for blocked formats.
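[Editor's note: the channel-padding arithmetic described above can be sketched as follows. This is a minimal illustration only, assuming 16 fp32 lanes (AVX512); the helper names are hypothetical, not an MKL-DNN or MXNet API.]

```python
def round_up(x, multiple):
    """Round x up to the nearest multiple (e.g. a SIMD vector length)."""
    return ((x + multiple - 1) // multiple) * multiple

def blocked_alloc_shape(n, c, h, w, vec_len=16):
    """Allocation shape for an nchw16c-style blocked layout.

    The channel dimension is padded up to a multiple of the vector
    length, so the planned buffer may be larger than the arithmetic
    (NCHW) shape requires -- which is exactly why the memory planner
    needs to know about the chosen implementation.
    """
    c_padded = round_up(c, vec_len)
    # Blocked layout: [N, C/16, H, W, 16]
    return (n, c_padded // vec_len, h, w, vec_len)

# A convolution input with IC = 3 would be padded to 16 channels:
# blocked_alloc_shape(1, 3, 224, 224) -> (1, 1, 224, 224, 16)
```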
> Unfortunately, this cannot be implemented without changing the logic in
> the memory planning pass. Currently we always fall back to the slow
> reference kernels for both convolution [2] and deconvolution [3].
>
> AFAIK, the padding feature of MKL-DNN is already used in TensorFlow and
> other frameworks. We also found that without supporting this feature,
> many other new MKL-DNN features cannot be applied to MXNet, such as the
> deconvolution primitive, Winograd, etc.
>
> The changes for this proposal can be divided into the following parts:
> 1. Following the proposal in issue #13598, we need to add new
> InferStorageTypeEx functions to operators that need to dispatch in a
> more fine-grained way. The InferStorageType pass also needs to handle
> the new -Ex function, as we did for FCompute and FComputeEx.
> 2. Attach more information to the computation graph/node, e.g.
> accelerator-specific information. Currently we add `IsMKLDNN` directly
> during operator registration if MXNET_USE_MKLDNN == 1. That looks
> simple and crude to me.
> 3. Do memory planning according to more information: topology, shapes,
> data types, in-place options, and more accurate accelerator information
> (accelerator path, memory size requirements, accelerator-wise
> attributes).
> 4. Improve the MKL-DNN operators so they can work on planned memory
> that may be larger than the arithmetic requirements and use the optimal
> kernels. Also, with more accurate dispatching in InferStorageTypeEx,
> there is no need to write complicated fallback logic in the MKL-DNN
> operators.
> 5. If users feel uncomfortable with the extra memory usage, the feature
> can be disabled through an environment variable.
>
> Since the memory planning pass is implemented in NNVM, we will also
> need support from the TVM community.
>
> Please let me know what you think. Thank you.
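[Editor's note: the fine-grained dispatch in point 1 could look roughly like the sketch below. All names here are hypothetical stand-ins for illustration, not the actual MXNet C++ interface; the point is only that seeing dtype and shape lets the pass pick an implementation, and report a padded allocation shape, before memory planning runs.]

```python
from enum import Enum

class Impl(Enum):
    MKLDNN = "mkldnn"      # fast blocked-format kernel
    FALLBACK = "fallback"  # slow reference kernel

def infer_storage_type_ex(dtype, in_shape, vec_len=16):
    """Hypothetical fine-grained dispatch for a convolution-like op.

    Returns the implementation to use and the allocation shape the
    memory planner should budget for. Unlike the current
    InferStorageType, this sees dtype and shape, so the planner can
    reserve padded (blocked-format) buffers up front instead of the
    operator falling back at execution time.
    """
    n, c, h, w = in_shape
    if dtype == "float32":
        # MKL-DNN path: budget for channel padding on the C axis.
        padded_c = ((c + vec_len - 1) // vec_len) * vec_len
        return Impl.MKLDNN, (n, padded_c, h, w)
    # Unsupported dtype in this sketch: reference kernel, exact shape.
    return Impl.FALLBACK, in_shape
```

With this shape information flowing into memory planning, item 4's "well planned memory which may be larger than the arithmetic requirements" follows naturally: the planner allocates `(n, padded_c, h, w)` while InferShape still reports `(n, c, h, w)` to the user.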
>
> -tao
>
> [1] https://github.com/apache/incubator-mxnet/issues/13598
> [2] https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/mkldnn/mkldnn_convolution.cc#L194
> [3] https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/mkldnn/mkldnn_deconvolution.cc#L55
