> This seems to be a big change to the existing operator mode (imperative and
> symbolic).
Essentially, the motivation for deferred compute is to extend imperative mode so
that users can "construct a symbol" without using the symbolic API. This
addresses the confusion around having two APIs and prevents divergence between
the imperative and symbolic APIs. There is no need to drop the existing
imperative / symbolic APIs because of deferred compute.
> Could you please provide more information.
Please ask a question and I'll answer ;)
> AFAIK, symbolic API already does deferred init, imperative API is provided to
> improve user experience. Based on this RFC, what's the advantage of this new
> deferred_compute mode? As a user, when should I use it or not.
With deferred compute we can simplify the `gluon.HybridBlock` API so that it
matches the `gluon.Block` API. For example, consider reimplementing
`Dense(HybridBlock)` against a `HybridBlock` API extended with deferred
compute:
``` python
class Dense(HybridBlock):
    def __init__(self, units, use_bias=True, flatten=True,
                 dtype='float32', weight_initializer=None,
                 bias_initializer='zeros', in_units=0):
        super().__init__()
        self._flatten = flatten
        self._units = units
        self.weight = gluon.Parameter(shape=(units, in_units),
                                      init=weight_initializer, dtype=dtype,
                                      allow_deferred_init=True)
        if use_bias:
            self.bias = gluon.Parameter(shape=(units,),
                                        init=bias_initializer, dtype=dtype,
                                        allow_deferred_init=True)
        else:
            self.bias = None

    def forward(self, x):  # We allow users to overwrite forward() directly.
        ctx = x.context
        bias = self.bias.data(ctx) if self.bias is not None else None
        return npx.FullyConnected(x, self.weight.data(ctx), bias,
                                  no_bias=self.bias is None,
                                  num_hidden=self._units,
                                  flatten=self._flatten, name='fwd')
```
`HybridBlock` can wrap the execution of `forward` in a deferred compute
session, obtain a symbolic representation of the computation, and pass it to
`CachedOp`. There would be no reason for users to use the deferred compute API
explicitly.
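The tracing idea behind this can be sketched in plain Python without MXNet.
All names below (`Tracer`, `add`, `traced`) are illustrative, not the RFC's
actual API: an eager-looking call both computes a value and records a graph
node, so a single run of `forward` yields the result *and* a graph that a
backend such as `CachedOp` could then optimize.

``` python
# Minimal sketch of tracing-while-executing (pure Python, no MXNet).
# Tracer, add and traced are hypothetical names for illustration only.

class Tracer:
    def __init__(self):
        self.graph = []  # recorded (op_name, input_ids, output_id) tuples

    def record(self, op_name, inputs, output):
        self.graph.append((op_name, [id(i) for i in inputs], id(output)))

_tracer = None

def add(a, b):
    out = a + b              # compute eagerly, as in imperative mode...
    if _tracer is not None:  # ...and also record the op while tracing
        _tracer.record('add', (a, b), out)
    return out

def traced(fn, *args):
    """Run fn once, returning (result, recorded graph)."""
    global _tracer
    _tracer = Tracer()
    try:
        result = fn(*args)
    finally:
        graph, _tracer = _tracer.graph, None
    return result, graph

result, graph = traced(lambda x, y: add(add(x, y), y), 1.0, 2.0)
print(result)                      # 5.0
print([op for op, *_ in graph])    # ['add', 'add']
```

The user only writes the eager code inside the lambda; the graph falls out of
running it once, which is exactly what lets `HybridBlock` hide the session.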
> Another question. We all know deferred init cause bad user experience when it
> comes to debugging. Would this RFC address the debuggability issue?
This RFC is orthogonal to deferred init. When updating `gluon.HybridBlock` API
based on deferred compute, one option is to require statically known shapes of
weights at construction time **if** users implement `def forward`. For
backwards compatibility we likely want to keep deferred init around for
existing code relying on `mx.sym` and implementing `def hybrid_forward`.
However, the other option is to allow deferred initialization of weights and
require users to implement `infer_shape`:
https://github.com/apache/incubator-mxnet/blob/910c608f682a47fc2c43375b5f5a426b563e5821/python/mxnet/gluon/block.py#L1073-L1075
This works around the failures of symbolic shape inference for deferred init in
the case of dynamic shape ops, while still allowing users to decide the shape
of the weight at the first forward.
In the example above, it could look like:
``` python
class Dense(HybridBlock):
    def __init__(self, units, use_bias=True, flatten=True,
                 dtype='float32', weight_initializer=None,
                 bias_initializer='zeros', in_units=0):
        [...]

    def infer_shape(self, x):
        self.weight.shape = (self.weight.shape[0], x.shape[1])

    def forward(self, x):
        [...]
```
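The shape-resolution flow can likewise be sketched without MXNet. All names
here (`Parameter`, `DenseSketch`) are illustrative stand-ins: the weight is
constructed with an unknown input dimension (marked `0`), and the first forward
calls `infer_shape` to fix it before the data is materialized.

``` python
# Pure-Python sketch of infer_shape-style deferred initialization.
# Parameter and DenseSketch are hypothetical names, not MXNet's classes.

class Parameter:
    def __init__(self, shape):
        self.shape = shape   # 0 marks a dimension not yet known
        self._data = None

    def data(self):
        if self._data is None:
            assert 0 not in self.shape, 'shape must be fully known'
            # stand-in for the real initializer: a zero matrix
            self._data = [[0.0] * self.shape[1] for _ in range(self.shape[0])]
        return self._data

class DenseSketch:
    def __init__(self, units, in_units=0):
        self.weight = Parameter(shape=(units, in_units))

    def infer_shape(self, x_shape):
        # resolve the deferred in_units from the first input seen
        self.weight.shape = (self.weight.shape[0], x_shape[1])

    def __call__(self, x_shape):
        if 0 in self.weight.shape:
            self.infer_shape(x_shape)
        return self.weight.data()

layer = DenseSketch(units=4)   # in_units unknown at construction time
w = layer((8, 16))             # first forward fixes in_units to 16
print(layer.weight.shape)      # (4, 16)
```

No symbolic shape inference is involved: the user-supplied `infer_shape` runs
imperatively on the concrete input, which is why dynamic shape ops are not a
problem.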
> If it's about performance optimization, could we have some initial data of
> using this new deferred mode vs. existing imperative mode?
There is the option to improve performance of imperative mode by deferring the
computation and optimizing the computational graph before performing the
computation. But this is not the main motivation and I haven't optimized for
this use case (yet). In the `gluon.HybridBlock` case, deferred compute runs
only once to construct the symbolic graph, which is then handed to `CachedOp`
for optimized execution.
https://github.com/apache/incubator-mxnet/issues/16376#issuecomment-579529593