You can find a derivation of the logistic function from a probabilistic perspective here. I think the main source of your confusion is that you can interpret it as a probability, but that doesn't automatically mean that you should. It is appropriate to treat it as a probability if and only if the argument you pass to it is interpretable as the log odds (logit), $\log(\frac{P(X)}{1 - P(X)})$, of some event $X$.
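To see why that interpretation works, note that the sigmoid exactly inverts the logit: plugging the log odds of $X$ into $\sigma(z) = \frac{1}{1 + e^{-z}}$ recovers the original probability,

$$\sigma\!\left(\log\tfrac{P(X)}{1 - P(X)}\right) = \frac{1}{1 + e^{-\log\frac{P(X)}{1 - P(X)}}} = \frac{1}{1 + \frac{1 - P(X)}{P(X)}} = P(X).$$

If the argument is anything other than a log odds, the output is still a number in $(0, 1)$, but it has no guarantee of being a calibrated probability.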
As for your question about sigmoid versus softmax: in the two-class case they are equivalent, so in a neural network setting it makes no difference which one you use. You can see this directly from the formula for a two-element softmax: $\text{softmax}(x)[0] = \frac{e^{x_0}}{e^{x_0} + e^{x_1}} = \frac{1}{1 + e^{x_1 - x_0}} = \frac{1}{1 + e^{-(x_0 - x_1)}} = \text{sigmoid}(x_0 - x_1)$. In other words, a softmax over two logits is just the sigmoid applied to their difference.
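As a quick numerical sanity check, here is a minimal sketch in plain NumPy (the helper names `sigmoid` and `softmax` are just illustrative, not from any particular library) that verifies the identity on a pair of arbitrary logits:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

# Two arbitrary logits for a binary classifier
logits = np.array([1.7, -0.4])

p_softmax = softmax(logits)[0]              # P(class 0) via softmax
p_sigmoid = sigmoid(logits[0] - logits[1])  # P(class 0) via sigmoid of the difference

print(p_softmax, p_sigmoid)
assert np.isclose(p_softmax, p_sigmoid)  # both give the same probability
```

This also shows why the two-logit softmax has a redundant degree of freedom: only the difference $x_0 - x_1$ matters, which is exactly the single logit a sigmoid output uses.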