Towards demystifying neural networks: Optimization, robustness and denoising
Modern neural networks are typically trained in an over-parameterized regime where the number of model parameters far exceeds the size of the training data. Such networks in principle have the capacity to (over)fit any set of labels, including significantly corrupted ones. Despite this (over)fitting capacity, over-parameterized networks have an intriguing robustness capability: they are surprisingly robust to label noise when trained with first-order methods and early stopping. Even more surprisingly, one can remove noise and corruption from a natural image without using any training data whatsoever, simply by fitting (via gradient descent) a randomly initialized, over-parameterized convolutional generator to a single corrupted image. In this talk I will first discuss our recent results proving that over-parameterized neural networks can indeed fit any labels. Then I will present theoretical results aimed at demystifying their robustness and denoising capabilities when trained via early-stopped gradient descent.
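
As a rough illustration of the early-stopping effect described above, the following sketch runs gradient descent on a toy over-parameterized problem with partially flipped labels. It is not the setting analyzed in the talk: a linear model with 400 parameters and 40 training points stands in for a neural network, the two-cluster data are synthetic, and every name and constant here (make_data, gradient_descent, clean_accuracy, the cluster sizes, the noise level, the step counts) is a hypothetical choice made only for this example.

```python
import numpy as np


def make_data(n_per_cluster=20, d=400, sigma=0.3, n_flip=4, seed=0):
    """Two tight clusters in R^d; a few labels in each cluster are flipped."""
    rng = np.random.default_rng(seed)
    centers = rng.standard_normal((2, d))
    centers /= np.linalg.norm(centers, axis=1, keepdims=True)
    cluster = np.repeat([0, 1], n_per_cluster)
    noise = sigma * rng.standard_normal((2 * n_per_cluster, d)) / np.sqrt(d)
    X = centers[cluster] + noise
    y_clean = np.where(cluster == 0, 1.0, -1.0)
    y_noisy = y_clean.copy()
    for k in (0, 1):
        flipped = np.flatnonzero(cluster == k)[:n_flip]
        y_noisy[flipped] *= -1.0  # corrupt the first n_flip labels of each cluster
    return X, y_clean, y_noisy


def gradient_descent(X, y, n_steps):
    """Plain gradient descent on 0.5 * ||X w - y||^2, starting from w = 0."""
    w = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / (largest singular value)^2
    for _ in range(n_steps):
        w -= step * (X.T @ (X @ w - y))
    return w


def clean_accuracy(X, w, y_clean):
    """Fraction of training points whose predicted sign matches the clean label."""
    return float(np.mean(np.sign(X @ w) == y_clean))


if __name__ == "__main__":
    X, y_clean, y_noisy = make_data()
    w_early = gradient_descent(X, y_noisy, n_steps=5)      # early-stopped
    w_late = gradient_descent(X, y_noisy, n_steps=20000)   # run to (near) convergence
    print("clean-label accuracy, early stop:", clean_accuracy(X, w_early, y_clean))
    print("clean-label accuracy, converged: ", clean_accuracy(X, w_late, y_clean))
    print("max residual on noisy labels:    ", np.max(np.abs(X @ w_late - y_noisy)))
```

In this toy construction the cluster structure is fitted within the first few iterations, while the flipped labels are fitted only much later, so the early-stopped iterate tends to agree with the clean labels and the converged iterate reproduces the corrupted ones. The behavior depends on the assumed cluster structure and is meant only to make the phenomenon concrete.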