Answer from ChatGPT: when the samples are strictly linearly separable, the cost function of (unregularized) logistic regression has no finite global minimizer.
If the data is linearly separable, then ||w|| (the norm of the weight vector) can be scaled up without bound so that every sample is classified ever more "confidently": w.dot(X[i]) + b -> +Infinity for positive samples and w.dot(X[i]) + b -> -Infinity for negative samples. Consequently each sigmoid( w.dot(X[i]) + b ) approaches 1 (positive samples) or 0 (negative samples), and the cross-entropy loss of every sample approaches 0 without ever reaching it, so no finite (w, b) attains the minimum.
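A quick way to see this numerically is to fix a separating direction and just scale it up; the loss keeps shrinking but never hits 0. This is only a minimal sketch, and the four toy points, the direction w_dir, and the scaling factors are made-up placeholders for illustration:

```python
import numpy as np

# Made-up separable toy data: positives lie above the line x1 + x2 = 0, negatives below.
X = np.array([[ 1.0,  1.0],   # positive sample
              [ 2.0,  0.5],   # positive sample
              [-1.0, -1.0],   # negative sample
              [-0.5, -2.0]])  # negative sample
y = np.array([1, 1, 0, 0])

def cross_entropy(w, b):
    """Mean cross-entropy loss, written with logaddexp for numerical stability."""
    z = X @ w + b
    # log(1 + exp(-z)) for positive samples, log(1 + exp(z)) for negative samples
    return np.mean(np.where(y == 1, np.logaddexp(0, -z), np.logaddexp(0, z)))

w_dir = np.array([1.0, 1.0])  # any direction that separates the two classes

# Scaling ||w|| up keeps lowering the loss; it approaches 0 but is never attained.
for scale in [1, 10, 100]:
    print(f"scale={scale:4d}  loss={cross_entropy(scale * w_dir, b=0.0):.3e}")
```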
To check this, I built two datasets of points on the 2D plane, one linearly separable and the other not, and trained logistic regression on each (a minimal sketch of the experiment follows). It turns out that the cost on the non-separable dataset indeed converges, while the cost on the separable one just keeps descending and descending.
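Below is a minimal sketch of that kind of experiment, not my exact setup: the Gaussian-blob data, the offsets, the learning rate, and the step count are all made-up placeholders. Full-batch gradient descent on the unregularized cross-entropy loss shows the two behaviors side by side:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(separable):
    """Two 2D Gaussian blobs; the gap between their means controls separability."""
    offset = 4.0 if separable else 0.5   # large gap -> separable, small gap -> overlapping
    pos = rng.normal(loc=[+offset, +offset], scale=1.0, size=(100, 2))
    neg = rng.normal(loc=[-offset, -offset], scale=1.0, size=(100, 2))
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(100), np.zeros(100)])
    return X, y

def train(X, y, lr=0.1, steps=20000):
    """Unregularized logistic regression trained by full-batch gradient descent."""
    w, b = np.zeros(2), 0.0
    losses = []
    for _ in range(steps):
        z = X @ w + b
        p = 1.0 / (1.0 + np.exp(-z))
        losses.append(np.mean(np.where(y == 1, np.logaddexp(0, -z), np.logaddexp(0, z))))
        grad_z = p - y                        # d(loss)/dz for sigmoid + cross-entropy
        w -= lr * (X.T @ grad_z) / len(y)
        b -= lr * np.mean(grad_z)
    return w, losses

for separable in (True, False):
    X, y = make_data(separable)
    w, losses = train(X, y)
    print(f"separable={separable}:  loss @ 1k/5k/20k steps = "
          f"{losses[999]:.4f} / {losses[4999]:.4f} / {losses[-1]:.4f}   "
          f"||w|| = {np.linalg.norm(w):.2f}")
```

On the overlapping data the loss settles at a fixed value and ||w|| stays bounded, while on the separable data the loss is still shrinking at the last step and ||w|| keeps growing, matching the behavior described above.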