With regard to the first half of the question (why the "arithmetic mean", i.e. the average, is not used):
model.fit (that is, tf.keras.Model.fit) passes the output of the tf.keras.losses.* functions on to the tf.keras.optimizers.* step, and that step works from the original vector (the whole set of per-sample values) rather than from a single number averaged in advance.
Thus, the name MeanSquaredError refers to the training formula as a whole (which is for regression problems, as opposed to categorical_crossentropy, which is for classification problems), not to the literal result of this specific step of the outer model.fit loop.
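To make the two stages concrete, here is a minimal plain-Python sketch (no TensorFlow required). It assumes, as I understand the behavior, that the function-style loss averages the squared errors over the last axis and so leaves one value per sample, and that a separate reduction later turns those values into one number. The helper names per_sample_mse and reduce_mean are made up for illustration and are not part of the TensorFlow API.

```python
# Illustrative sketch only: these helpers are hypothetical, not TensorFlow calls.
# Assumption: the loss function yields one value per sample (mean over the last
# axis), and the scalar used for training comes from a later reduction step.

def per_sample_mse(y_true, y_pred):
    """Return one mean squared error per sample (per row)."""
    return [
        sum((t - p) ** 2 for t, p in zip(row_true, row_pred)) / len(row_true)
        for row_true, row_pred in zip(y_true, y_pred)
    ]


def reduce_mean(values):
    """Collapse the per-sample losses into a single scalar."""
    return sum(values) / len(values)


y_true = [[0.0, 1.0], [2.0, 4.0]]
y_pred = [[1.0, 1.0], [2.0, 2.0]]

per_sample = per_sample_mse(y_true, y_pred)  # a vector: one loss per sample
batch_loss = reduce_mean(per_sample)         # a scalar: the reduced loss

print(per_sample)  # [0.5, 2.0]
print(batch_loss)  # 1.25
```

The first printed value is the vector that surprises people. The second is the single "mean" that the name MeanSquaredError suggests, and it appears only after the extra reduction step.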
For the second half of the question, I will defer to https://stackoverflow.com/a/70296585/24473928