Overparametrized systems: from Smale's 17th problem to two-layer neural networks

Speaker

Stanford University

Host

Kuikui Liu
LIDS, EECS

Training modern machine learning models requires optimizing highly non-convex risk functions, and yet simple gradient-based methods are able to find global minima for very high-dimensional problems. Randomness and overparametrization are likely to have something to do with this magic.

In the 1990s, Stephen Smale conjectured that a system of n random polynomial equations in n complex unknowns can be solved in average-case polynomial time.

I will present recent progress on the real-variables version of this problem and characterize the amount of overparametrization required to make the problem efficiently solvable in that case.
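To make the overparametrized setting concrete, here is a minimal sketch, not the speaker's algorithm: plain gradient descent on the sum-of-squares loss of m random equations in n real unknowns, with n larger than m. The quadratic model, scalings, and step size are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10, 100                                # m equations, n > m unknowns
A = rng.standard_normal((m, n, n)) / n        # random quadratic forms
b = rng.standard_normal(m)                    # right-hand sides

def residual(x):
    # p_a(x) = <x, A_a x> - b_a for a = 1, ..., m
    return np.einsum('aij,i,j->a', A, x, x) - b

def grad_loss(x):
    # gradient of L(x) = (1/2) * sum_a p_a(x)^2
    r = residual(x)
    J = np.einsum('aij,j->ai', A + A.transpose(0, 2, 1), x)  # Jacobian of p
    return J.T @ r

x = rng.standard_normal(n)                    # random initialization
for _ in range(5000):
    x -= 0.2 * grad_loss(x)

print(f"squared residual after descent: {np.sum(residual(x)**2):.2e}")
```

With n well above m, descent from a random start generically drives the residual to (near) zero; shrinking n toward m is where the landscape becomes hard, which is the regime the talk's characterization addresses.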

I will then show that closely related ideas can be used to analyze two-layer neural networks. In that case, reaching a global minimum (in the absence of regularization) can be problematic from a statistical viewpoint, leading to overfitting.
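As a minimal sketch of that phenomenon, assuming a standard setup the abstract does not spell out (a wide ReLU network, a noisy linear teacher, plain gradient descent; all sizes and rates are illustrative), unregularized training can drive the training error toward zero while the test error stays inflated by the memorized noise.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, width = 50, 5, 200                      # samples, input dim, hidden units
X = rng.standard_normal((n, d))
y = X[:, 0] + 0.5 * rng.standard_normal(n)    # linear signal plus noise

W = rng.standard_normal((width, d)) / np.sqrt(d)      # first-layer weights
a = rng.standard_normal(width) / np.sqrt(width)       # second-layer weights

def forward(X):
    # two-layer network: sum_j a_j * relu(w_j . x)
    return np.maximum(X @ W.T, 0.0) @ a

lr = 0.02
for _ in range(30000):
    H = np.maximum(X @ W.T, 0.0)              # hidden activations, n x width
    err = H @ a - y                           # training residuals
    grad_a = H.T @ err / n
    grad_W = ((err[:, None] * (H > 0) * a).T @ X) / n
    a -= lr * grad_a                          # no regularization anywhere
    W -= lr * grad_W

print(f"train MSE: {np.mean((forward(X) - y)**2):.3f}")   # driven toward zero
Xt = rng.standard_normal((1000, d))
yt = Xt[:, 0]                                 # noiseless test labels
print(f"test MSE:  {np.mean((forward(Xt) - yt)**2):.3f}")  # inflated by fitted noise
```

Since width far exceeds the sample size, the network can interpolate the noisy labels exactly; the gap between the two printed errors is the statistical price of reaching that global minimum.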

[Based on joint papers with Eliran Subag and with Pierfrancesco Urbani]