With regard to the first half of the question (why the "arithmetic mean", i.e. the average, is not used):
tf.keras.Model.fit passes the output of the tf.keras.losses.* functions to the tf.keras.optimizers.* optimizers, which work with the original vector (the whole set of values).
Thus, the name MeanSquaredError refers to the loss formula as a whole (which is meant for regression problems, as opposed to categorical_crossentropy, which is meant for classification problems), not to the literal result of this particular step of the outer Model.fit loop.
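To make the "vector, not one average" point concrete, here is a minimal plain-Python sketch. It assumes the formula that the TensorFlow docs give for the functional tf.keras.losses.mean_squared_error, namely mean(square(y_true - y_pred), axis=-1), which averages only over the last axis and so yields one value per sample. The helper names per_sample_mse and batch_average are hypothetical and used only for illustration; they are not part of the Keras API.

```python
# Hypothetical helpers (not Keras API) mirroring the documented formula
# loss = mean(square(y_true - y_pred), axis=-1) in plain Python.

def per_sample_mse(y_true, y_pred):
    """Return one mean squared error per sample (averaging over the last axis only)."""
    losses = []
    for t_row, p_row in zip(y_true, y_pred):
        squared = [(t - p) ** 2 for t, p in zip(t_row, p_row)]
        losses.append(sum(squared) / len(squared))
    return losses

def batch_average(values):
    """The single arithmetic mean over the whole batch, for comparison."""
    return sum(values) / len(values)

y_true = [[0.0, 1.0], [0.0, 0.0]]
y_pred = [[1.0, 1.0], [1.0, 2.0]]

vector = per_sample_mse(y_true, y_pred)
print(vector)                 # [0.5, 2.5]  (one value per sample)
print(batch_average(vector))  # 1.5         (the single "average")
```

The per-sample vector keeps the information about which sample contributed how much error; collapsing it into one number is a separate reduction step, not what the per-sample formula itself returns.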
For the second half of the question, I will defer to https://stackoverflow.com/a/70296585/24473928