## Description

In `tf.keras`, users can call the `add_loss` method to register non-standard 
loss terms (by standard, I mean a loss function that takes only `y_true` and 
`y_pred`), e.g. a loss that depends on the layer's inputs.

https://www.tensorflow.org/api_docs/python/tf/keras/layers/Layer#add_loss

A practical example would be a Bayesian neural network:
```python
import tensorflow as tf
import tensorflow_probability as tfp

model = tf.keras.Sequential([
    tfp.layers.DenseReparameterization(512, activation=tf.nn.relu),
    tfp.layers.DenseReparameterization(10),
])
logits = model(features)
neg_log_likelihood = tf.nn.softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
kl = sum(model.losses)  # KL terms registered via add_loss in each layer
loss = neg_log_likelihood + kl
train_op = tf.train.AdamOptimizer().minimize(loss)
```
source: 
https://github.com/tensorflow/probability/blob/r0.8/tensorflow_probability/python/layers/dense_variational.py#L356

In this case, the loss is composed of two parts: the classification error and 
the losses registered inside `DenseReparameterization` (the KL divergence 
between the posterior and the prior of the weights in each layer), which are 
collected in `model.losses`. This is achieved through the `add_loss` method.
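
To illustrate the pattern, here is a minimal sketch of a custom Keras layer 
that registers an input-dependent loss term via `add_loss` (the layer name and 
regularization rate are hypothetical, not part of any library):

```python
import tensorflow as tf

# Hypothetical layer: the extra loss term depends on the layer's own
# activations rather than on y_true/y_pred.
class ActivityRegularizedDense(tf.keras.layers.Layer):
    def __init__(self, units, rate=1e-2):
        super(ActivityRegularizedDense, self).__init__()
        self.dense = tf.keras.layers.Dense(units)
        self.rate = rate

    def call(self, inputs):
        outputs = self.dense(inputs)
        # Losses registered here are collected into model.losses.
        self.add_loss(self.rate * tf.reduce_sum(tf.square(outputs)))
        return outputs
```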

_______________________________
However, this feature is currently not supported by Gluon.

In order to implement it, I tried the following code:
```python
from mxnet import np, npx
from mxnet.gluon import nn
npx.set_np()

class StochasticBlock(nn.HybridBlock):
  def __init__(self):
    super(StochasticBlock, self).__init__()
    self._losses = []

  def add_loss(self, loss):
    self._losses.append(loss)

  @property
  def losses(self):
    # Collect this block's losses plus those of its direct children.
    collected_losses = []
    collected_losses.extend(self._losses)
    for child in self._children.values():
      if hasattr(child, '_losses'):
        collected_losses.extend(getattr(child, '_losses'))
    return collected_losses

class DiagGaussian(StochasticBlock):
  def __init__(self):
    super(DiagGaussian, self).__init__()

  def hybrid_forward(self, F, loc, scale):
    log_variance = F.np.log(1e-20 + scale ** 2)
    # KL(N(loc, scale^2) || N(0, 1)), summed over the feature axis.
    KL = -0.5 * F.np.sum(1 + log_variance - loc ** 2 - F.np.exp(log_variance),
                         axis=1)
    self.add_loss(KL)
    return F.np.random.normal(loc, scale)

diagGaussian = DiagGaussian()
loc = np.random.uniform(-10, 10, size=(2, 2))
scale = np.random.uniform(size=(2, 2))
diagGaussian.hybridize()
print(diagGaussian(loc, scale))
print(diagGaussian.losses[0])
```
This works well as long as the block is not hybridized; after calling 
`hybridize()`, however, `losses[0]` becomes 
`<_Symbol diaggaussian0_multiply_scalar0>` instead of a concrete value.
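
For completeness, here is how the collected losses could be consumed in the 
eager (non-hybridized) case, mirroring the `loss = neg_log_likelihood + kl` 
pattern from the Keras snippet above. It reuses the imports and `DiagGaussian` 
block defined above; the data-fit term is a hypothetical placeholder:

```python
diagGaussian = DiagGaussian()   # not hybridized, so losses stay concrete
loc = np.random.uniform(-10, 10, size=(2, 2))
scale = np.random.uniform(size=(2, 2))

sample = diagGaussian(loc, scale)
data_loss = np.sum(sample ** 2)                       # placeholder data-fit term
kl = sum(loss.sum() for loss in diagGaussian.losses)  # scalar KL term
total_loss = data_loss + kl
print(total_loss)
```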

I am actively looking for other solutions to this problem. A potential 
workaround would be forcing `losses` to be one of the block's outputs (see the 
sketch below), but I am not sure it would work inside `Sequential`, and it is 
also far from elegant =_=
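
A rough sketch of that workaround, reusing the imports from above (the block 
name is hypothetical): because the KL term is returned as a regular output, it 
stays part of the traced graph and comes back as a concrete value even after 
`hybridize()`.

```python
class DiagGaussianWithLossOutput(nn.HybridBlock):
  def hybrid_forward(self, F, loc, scale):
    log_variance = F.np.log(1e-20 + scale ** 2)
    kl = -0.5 * F.np.sum(1 + log_variance - loc ** 2 - F.np.exp(log_variance),
                         axis=1)
    sample = F.np.random.normal(loc, scale)
    # The loss is returned as an ordinary output, so hybridization
    # computes it instead of leaving a dangling symbol behind.
    return sample, kl

block = DiagGaussianWithLossOutput()
block.hybridize()
sample, kl = block(np.random.uniform(-10, 10, size=(2, 2)),
                   np.random.uniform(size=(2, 2)))
print(kl)  # concrete values, even though the block is hybridized
```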

Having this feature would bring huge convenience to the implementation of deep 
generative models (such as VAEs).
