The two trainable parameters come from the network itself: the weight and the bias. The other two are non-trainable parameters that come from the optimizer, not from the network. This link explains it pretty well.
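To see where the two trainable parameters come from, here is a minimal sketch in plain Python. It assumes the network is a single dense layer with one input and one unit, which is my reading of "weight and bias"; the helper name `dense_param_count` is made up for illustration and is not part of any library.

```python
def dense_param_count(n_inputs: int, n_units: int, use_bias: bool = True) -> int:
    """Count the trainable parameters of a fully connected layer.

    Hypothetical helper for illustration only.
    """
    weights = n_inputs * n_units            # one weight per input-unit pair
    biases = n_units if use_bias else 0     # one bias per unit
    return weights + biases


# Assumed setup: one input feature feeding one output unit.
trainable = dense_param_count(n_inputs=1, n_units=1)
print(trainable)  # 1 weight + 1 bias
```

The optimizer's parameters are not counted here, since how many it keeps depends on which optimizer you compiled the model with.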