Answer from ChatGPT: when the samples are strictly linearly separable, the cost function of (unregularized) logistic regression has no finite global minimizer.
If the data is linearly separable, then ||w|| (the norm of the weight vector) can be scaled up without bound so that every sample is classified ever more "confidently": w.dot(X[i]) + b -> +Infinity for positive samples and w.dot(X[i]) + b -> -Infinity for negative samples. Consequently each sigmoid( w.dot(X[i]) + b ) approaches 1 (positive samples) or 0 (negative samples), and the cross-entropy loss of every sample approaches 0 without ever reaching it, so no finite (w, b) attains the minimum.
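A quick way to see this numerically is to fix a separating direction and just scale it up; the loss keeps shrinking but never hits 0. This is only a minimal sketch, and the four toy points, the direction w_dir, and the scaling factors are made-up placeholders for illustration:

```python
import numpy as np

# Made-up separable toy data: positives lie above the line x1 + x2 = 0, negatives below.
X = np.array([[ 1.0,  1.0],   # positive sample
              [ 2.0,  0.5],   # positive sample
              [-1.0, -1.0],   # negative sample
              [-0.5, -2.0]])  # negative sample
y = np.array([1, 1, 0, 0])

def cross_entropy(w, b):
    """Mean cross-entropy loss, written with logaddexp for numerical stability."""
    z = X @ w + b
    # log(1 + exp(-z)) for positive samples, log(1 + exp(z)) for negative samples
    return np.mean(np.where(y == 1, np.logaddexp(0, -z), np.logaddexp(0, z)))

w_dir = np.array([1.0, 1.0])  # any direction that separates the two classes

# Scaling ||w|| up keeps lowering the loss; it approaches 0 but is never attained.
for scale in [1, 10, 100]:
    print(f"scale={scale:4d}  loss={cross_entropy(scale * w_dir, b=0.0):.3e}")
```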
To check this, I built two datasets of points on the 2D plane, one linearly separable and the other not, and trained logistic regression on each (a minimal sketch of the experiment follows). It turns out that the cost on the non-separable dataset indeed converges, while the cost on the separable one just keeps descending and descending.
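Below is a minimal sketch of that kind of experiment, not my exact setup: the Gaussian-blob data, the offsets, the learning rate, and the step count are all made-up placeholders. Full-batch gradient descent on the unregularized cross-entropy loss shows the two behaviors side by side:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(separable):
    """Two 2D Gaussian blobs; the gap between their means controls separability."""
    offset = 4.0 if separable else 0.5   # large gap -> separable, small gap -> overlapping
    pos = rng.normal(loc=[+offset, +offset], scale=1.0, size=(100, 2))
    neg = rng.normal(loc=[-offset, -offset], scale=1.0, size=(100, 2))
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(100), np.zeros(100)])
    return X, y

def train(X, y, lr=0.1, steps=20000):
    """Unregularized logistic regression trained by full-batch gradient descent."""
    w, b = np.zeros(2), 0.0
    losses = []
    for _ in range(steps):
        z = X @ w + b
        p = 1.0 / (1.0 + np.exp(-z))
        losses.append(np.mean(np.where(y == 1, np.logaddexp(0, -z), np.logaddexp(0, z))))
        grad_z = p - y                        # d(loss)/dz for sigmoid + cross-entropy
        w -= lr * (X.T @ grad_z) / len(y)
        b -= lr * np.mean(grad_z)
    return w, losses

for separable in (True, False):
    X, y = make_data(separable)
    w, losses = train(X, y)
    print(f"separable={separable}:  loss @ 1k/5k/20k steps = "
          f"{losses[999]:.4f} / {losses[4999]:.4f} / {losses[-1]:.4f}   "
          f"||w|| = {np.linalg.norm(w):.2f}")
```

On the overlapping data the loss settles at a fixed value and ||w|| stays bounded, while on the separable data the loss is still shrinking at the last step and ||w|| keeps growing, matching the behavior described above.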