I bumped into the definition of the softrelu gradient:

https://github.com/apache/incubator-mxnet/blob/master/src/operator/mshadow_op.h#L170

where the gradient is defined as 1 - exp(-x).

Since the forward pass of softrelu is the softplus function, log(1 + exp(x)),
shouldn't its gradient be the logistic (sigmoid) function, 1 / (1 + exp(-x))?

It is my understanding that the gradient of the softrelu should go down
to zero as x -> -Inf, which is not the case with the above definition,
which goes to -Inf as x -> -Inf.
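A quick numerical sketch of the point above (plain Python, names are my own):
the finite-difference derivative of softplus matches sigmoid(x), while
1 - exp(-x) applied directly to the input diverges for negative x. One caveat
I'm not certain about: if the backward pass feeds the *output* y = softplus(x)
into that expression rather than the input, then 1 - exp(-y) does reduce to
sigmoid(x), which might be what the code intends.

```python
import math

def softplus(x):
    # forward pass of softrelu: log(1 + exp(x))
    return math.log1p(math.exp(x))

def sigmoid(x):
    # the logistic function, the expected gradient w.r.t. the input
    return 1.0 / (1.0 + math.exp(-x))

def num_grad(f, x, h=1e-6):
    # central finite-difference approximation of df/dx
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    analytic = sigmoid(x)              # goes to 0 as x -> -Inf
    numeric = num_grad(softplus, x)    # agrees with sigmoid(x)
    on_input = 1.0 - math.exp(-x)      # the questioned expression, on the input:
                                       # goes to -Inf as x -> -Inf
    on_output = 1.0 - math.exp(-softplus(x))  # same expression on the output:
                                              # algebraically equals sigmoid(x)
    print(f"x={x:+.1f}  sigmoid={analytic:.6f}  numeric={numeric:.6f}  "
          f"1-exp(-x)={on_input:+.4f}  1-exp(-y)={on_output:.6f}")
```

Running this shows the numeric derivative tracking sigmoid(x) at every point,
and 1 - exp(-x) blowing up negatively for x = -5.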

https://en.wikipedia.org/wiki/Rectifier_(neural_networks)


Pedro.
