By changing the hidden layers from relu to sigmoid, you ensure that each layer applies a nonlinear transformation over its entire input range. With relu, the model can enter a regime where a large portion of the neurons operate linearly: if all the pre-activations are positive, relu behaves exactly like the identity function. In practice this can make the model behave almost linearly, especially when the weight initialization and the distribution of the data keep the neurons in the linear (positive) region of the relu.
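As a minimal NumPy sketch of that point (the weights and inputs here are chosen purely for illustration, so that every pre-activation is positive), relu reduces to the identity and the layer collapses to an affine map:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 2.0, size=(4, 3))   # inputs chosen to stay positive
W = np.abs(rng.normal(size=(3, 3)))      # positive weights keep z positive
b = np.abs(rng.normal(size=3))

z = x @ W + b                            # pre-activations, all positive
print(np.allclose(relu(z), z))           # True: relu acts as the identity here
```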
In contrast, sigmoid always introduces curvature (nonlinearity), compressing its outputs into the range between 0 and 1. This makes it harder for the network to collapse into linear behavior, since even under small changes in the weights the sigmoid keeps the mapping between input and output non-linear.
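The swap being described would look roughly like the following, assuming a Keras-style Sequential model; the layer sizes and the output layer are hypothetical, and the only point is replacing the hidden-layer activations:

```python
import tensorflow as tf

# Hypothetical architecture: only the hidden activations changed from relu.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),
    tf.keras.layers.Dense(32, activation="sigmoid"),  # was activation="relu"
    tf.keras.layers.Dense(32, activation="sigmoid"),  # was activation="relu"
    tf.keras.layers.Dense(1, activation="sigmoid"),   # output layer unchanged
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```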