A criterion for early stopping in SGD for binary classification
We propose a new, simple, and computationally inexpensive termination test for constant step-size stochastic gradient descent (SGD) applied to binary classification with homogeneous linear predictors. Our theoretical results support the effectiveness of our stopping criterion when the data is Gaussian distributed. This presence of noise allows for the possibility of non-separable data. We show that our test terminates in a finite number of iterations and when the noise in the data is not too large, the expected classifier at termination nearly minimizes the probability of misclassification. Finally, numerical experiments indicate for both real and synthetic data sets that our termination test exhibits a good degree of predictability on accuracy and running time.
Joint work with Sina Baghal (Waterloo) and Courtney Paquette (Waterloo & Google)