The layout transformation should really be a separate optimization pass
rather than part of memory planning, as is done in the TVM stack. If we
want a clean-slate solution, I would recommend looking into that
instead.

Tianqi

On Tue, Apr 9, 2019 at 1:46 AM Lv, Tao A <[email protected]> wrote:

>
>
> Hi dev,
>
>
>
> As we're discussing the roadmap for MXNet 2.0, I would like to start a
> thread about refining the InferStorageType and memory planning pass in
> MXNet and hope it can happen as a part of the 2.0 release.
>
>
>
> Thanks to @eric-haibin-lin, part of the proposal has already been
> discussed in issue #13598 [1].
>
>
>
> As mentioned in the description of issue #13598, there are several
> drawbacks of the existing flow. Please allow me to quote them here:
> *        the selection of the MKL/CPU/GPU/CUDNN implementation happens
> after graph attribute inference and memory planning, so memory planning
> is not aware of the implementation that will be used for execution
> later, which may lead to sub-optimal results. For example, the memory
> inplace option may vary depending on the accelerator backend (the new
> version of CUDNN enables x/dx inplace for _backward_conv).
> *        some sparse operators need to access dtype/shape information
> to decide which implementation to invoke for execution, and whether to
> perform a fallback. This information is not yet exposed in the existing
> infer storage type interface.
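>
> The second drawback is what motivates the new interface. As a minimal
> sketch of what the extended signature could look like, assuming we
> simply thread shapes and dtypes through the existing FInferStorageType
> contract (the extra parameters and the exact form are my assumption
> and open for discussion):
>
>     #include <functional>
>     #include <vector>
>     #include <mxnet/op_attr_types.h>  // mxnet::DispatchMode
>     #include <mxnet/tuple.h>          // mxnet::TShape
>     #include <nnvm/node.h>            // nnvm::NodeAttrs
>
>     // Hypothetical -Ex variant of FInferStorageType: same contract,
>     // but shapes and dtypes are visible, so an operator can pick its
>     // implementation (and dispatch mode) based on them.
>     using FInferStorageTypeEx = std::function<bool(
>         const nnvm::NodeAttrs& attrs,
>         const int dev_mask,
>         const std::vector<mxnet::TShape>& in_shapes,  // assumed addition
>         const std::vector<int>& in_dtypes,            // assumed addition
>         mxnet::DispatchMode* dispatch_mode,
>         std::vector<int>* in_storage_attrs,
>         std::vector<int>* out_storage_attrs)>;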
>
>
>
> Besides, the existing memory planning pass calculates and afterwards
> allocates memory strictly according to the input/output tensor shapes
> (which can be obtained from operators' arithmetic formulas through
> InferShape). That no longer holds for accelerators like MKL-DNN on
> CPU, which want to pad input/output tensors to optimal formats (e.g.
> nchw16c) according to the hardware architecture; such a layout can be
> described as shape + stride. As many of you know, MKL-DNN shows great
> performance with these optimal formats, which are blocked by the
> vector length of AVX512 or AVX2. It's very natural for us to pad the
> channel dimension of those inputs/outputs whose IC or OC is not a
> multiple of the vector length and leverage the optimal kernels for
> blocked formats. Unfortunately, this cannot be implemented without
> changing the logic in the memory planning pass. Currently we always
> fall back to slow reference kernels for both convolution [2] and
> deconvolution [3].
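>
> To make the padding arithmetic concrete: with AVX512 (16 fp32 lanes
> per vector), an input with IC = 3 would be padded to 16 channels so
> that the nchw16c kernels apply. This is plain round-up arithmetic,
> shown purely for illustration (not code from MKL-DNN):
>
>     #include <cstdint>
>     #include <cstdio>
>
>     // Round a channel count up to the next multiple of the vector
>     // length (16 for AVX512 fp32, 8 for AVX2 fp32).
>     int64_t PadChannels(int64_t channels, int64_t vlen) {
>       return (channels + vlen - 1) / vlen * vlen;
>     }
>
>     int main() {
>       std::printf("%lld\n", (long long)PadChannels(3, 16));   // 16
>       std::printf("%lld\n", (long long)PadChannels(30, 16));  // 32
>       return 0;
>     }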
>
>
>
> AFAIK, the padding feature of MKL-DNN has already been used in
> TensorFlow and other frameworks. We also found that, without support
> for this feature, many other new features from MKL-DNN cannot be
> applied to MXNet, such as the deconvolution primitive, Winograd
> convolution, etc.
>
>
>
> Changes for this proposal can be divided into the following parts:
> 1.      Following the proposal in issue #13598, we need to add new
> InferStorageTypeEx functions to operators which need to dispatch in a
> more fine-grained way. This also requires the InferStorageType pass to
> handle the new -Ex function, just as we did for FCompute and
> FComputeEx (see the sketch after this list).
> 2.      Attach more information to the computation graph/node, e.g.
> accelerator-specific information. Currently we add `IsMKLDNN` directly
> during operator registration if MXNET_USE_MKLDNN == 1. That looks
> simplistic and crude to me.
> 3.      Do memory planning according to more information: topology,
> shapes, data types, in-place options and more accurate accelerator
> information (accelerator path, memory size requirements,
> accelerator-wise attributes).
> 4.      Improve MKL-DNN operators so they can work on such
> well-planned memory, which may be larger than the arithmetic
> requirements, and work with optimal kernels. Also, with more accurate
> dispatching in InferStorageTypeEx, there is no need for us to write
> complicated fallback logic in MKL-DNN operators.
> 5.      If users feel uncomfortable with the extra memory usage, we
> can disable this feature through an environment variable.
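>
> As a rough sketch of how points 1 and 5 could fit together, mirroring
> how FCompute/FComputeEx are registered today (this reuses the
> hypothetical FInferStorageTypeEx alias sketched above; the attribute
> string and the MXNET_MKLDNN_ENABLE_PAD variable name are likewise made
> up for illustration):
>
>     #include <dmlc/parameter.h>  // dmlc::GetEnv
>     #include <nnvm/op.h>         // NNVM_REGISTER_OP
>
>     // Hypothetical -Ex inference function for Convolution: it can see
>     // shapes/dtypes and decide whether the padded MKL-DNN path is
>     // used.
>     bool ConvStorageTypeEx(const nnvm::NodeAttrs& attrs,
>                            const int dev_mask,
>                            const std::vector<mxnet::TShape>& in_shapes,
>                            const std::vector<int>& in_dtypes,
>                            mxnet::DispatchMode* dispatch_mode,
>                            std::vector<int>* in_attrs,
>                            std::vector<int>* out_attrs) {
>       // Point 5: let users opt out of the padded (larger) buffers.
>       const bool allow_pad =
>           dmlc::GetEnv("MXNET_MKLDNN_ENABLE_PAD", true);  // hypothetical
>       // Take the MKL-DNN path when padding is allowed or the channel
>       // dimension already matches the blocked format.
>       const bool use_mkldnn = allow_pad || in_shapes[0][1] % 16 == 0;
>       *dispatch_mode = use_mkldnn ? mxnet::DispatchMode::kFComputeEx
>                                   : mxnet::DispatchMode::kFCompute;
>       for (int& st : *in_attrs) st = mxnet::kDefaultStorage;
>       for (int& st : *out_attrs) st = mxnet::kDefaultStorage;
>       return true;
>     }
>
>     NNVM_REGISTER_OP(Convolution)
>     .set_attr<FInferStorageTypeEx>("FInferStorageTypeEx",  // hypothetical
>                                    ConvStorageTypeEx);
>
> With such a hook in place, the memory planning pass would know the
> chosen dispatch mode (and hence the real buffer requirements) before
> it sizes and assigns memory.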
>
>
>
> Since the memory planning pass is implemented in NNVM, we will also
> need support from the TVM community.
>
>
>
> Please let me know what you think. Thank you.
>
>
>
> -tao
>
>
>
> [1] https://github.com/apache/incubator-mxnet/issues/13598
>
> [2]
> https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/mkldnn/mkldnn_convolution.cc#L194
>
> [3]
> https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/mkldnn/mkldnn_deconvolution.cc#L55
>
>
