Okay, let me use an extended analogy to explain the "loss landscape" metaphor.
So, when you see a 2D graph, you can read the curve as the value of f(x) as you vary x, and when you see a 3D surface or a contour map, that's f(x, y) as both x and y are varied.
Why did I bring this up? Well, when we're talking about the loss function, any single loss value is calculated from your output and the ground truth, but the object we actually care about is the loss as a function of all of the parameters of your neural network, that is, f(x1, x2, x3, ..., xn). This is to say, you're actually trying to do gradient descent on a hyperdimensional landscape that has as many dimensions as your network has trainable parameters (the hyperparameters shape that landscape, but they aren't what gradient descent moves through). It would be literally impossible to visualize. We can conceive of a "saddle point" or a "local minimum" by analogy with 2D or 3D space, but that's not really what's going on here; it's more sort of like... a region that exerts a kind of gravity on your model, pulling it towards itself via gradient descent.
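To make that a bit more concrete, here's a minimal sketch in Python. Everything in it is a made-up toy (a linear model with three weights, random data, a plain numerical gradient), not anything from a real training setup; the point is just that the loss is written as a function of the parameter vector alone, and gradient descent walks through that parameter space.

```python
import numpy as np

# Hypothetical toy setup: a linear model with 3 weights, so its
# loss landscape lives in a 3-dimensional parameter space.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))                 # 10 examples, 3 features
y_true = X @ np.array([1.0, -2.0, 0.5])      # ground-truth targets

def loss(params):
    # Mean squared error as a function of the parameter vector alone.
    # The data is fixed; only the parameters vary, which is what makes
    # this a "landscape" over parameter space.
    y_pred = X @ params
    return np.mean((y_pred - y_true) ** 2)

def grad(params, eps=1e-6):
    # Numerical gradient: the local "downhill direction" at this point.
    g = np.zeros_like(params)
    for i in range(len(params)):
        step = np.zeros_like(params)
        step[i] = eps
        g[i] = (loss(params + step) - loss(params - step)) / (2 * eps)
    return g

params = rng.normal(size=3)
for _ in range(200):                         # plain gradient descent
    params -= 0.05 * grad(params)
print(loss(params))                          # loss shrinks as we descend
```

With three weights you could still plot slices of this landscape; with millions of weights you can't, but the descent procedure is exactly the same.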
When you backpropagate, the algorithm computes the gradient, which tells it which direction to move in that hyperdimensional space, towards whatever region is exerting that gravitational pull. But its "vision" is strictly local: there is no empirical way of knowing whether the minimum it's heading towards is a global minimum or just a local one. And that's true even for the optimizer itself, which "sees" the entire parameter vector.
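Here's a tiny illustration of that blindness, using a made-up 1-D "loss" I picked purely for its shape (it has one shallow basin and one deeper one). The update rule only ever consults the local slope, so which minimum you land in depends entirely on where you started, and nothing in the procedure itself can tell you whether the one you found is global.

```python
def f(x):
    # Illustrative 1-D loss: a shallow local minimum near x ≈ 1.1
    # and a deeper global minimum near x ≈ -1.3.
    return x**4 - 3 * x**2 + x

def df(x):
    # Its derivative: the only thing gradient descent ever looks at.
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * df(x)   # only the local slope is consulted, ever
    return x

# Same algorithm, different starting points, different minima reached.
print(descend(2.0))    # settles in the shallower basin, near x ≈ 1.1
print(descend(-2.0))   # settles in the deeper basin, near x ≈ -1.3
```

In one dimension you can cheat by plotting f and seeing both basins; in millions of dimensions there is no such plot, so the local slope is all you get.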
You have no chance of perceiving the space that your model is traversing. The landscape metaphor might lead you to believe that there is a way of seeing it, but you're more or less blind and feeling your way through. That's just how it is.
I mean, the analogy still holds; it's still a landscape. If you think about it, the only reason we can see ordinary landscapes is light, and nothing about the notion of a landscape requires light to exist. So you can easily have a space that can be traversed but not seen, if that makes sense?