Dependence Induced Representation Learning
Speaker: Xiangxiang Xu (EECS/RLE, MIT)
Host: Sharut Gupta (MIT CSAIL)
Abstract:
Despite the vast progress in deep learning practice, our theoretical understanding of learned feature representations remains limited. In this talk, we discuss three fundamental questions from a unified statistical perspective:
(1) What representations carry useful information?
(2) How are representations learned by distinct algorithms related?
(3) Can we separate representation learning from solving specific tasks?
We formalize representations that extract statistical dependence from data, termed dependence-induced representations. We prove that representations are dependence-induced if and only if they can be learned from specific features defined by Hirschfeld–Gebelein–Rényi (HGR) maximal correlation. This separation theorem signifies the key role of HGR features in representation learning and enables a modular design of learning algorithms. Specifically, we demonstrate the optimality of HGR features in simultaneously achieving different design objectives, including minimal sufficiency (Tishby's information bottleneck), information maximization, enforcing uncorrelated features (VICReg), and encoding information at various granularities (Matryoshka representation learning). We further illustrate that by adapting HGR features, we can obtain the representations learned by distinct practices, including cross-entropy and hinge loss minimization, non-negative feature learning, neural density ratio estimation, and their regularized variants. We also discuss applications of our analysis in interpreting learning phenomena such as neural collapse, understanding existing self-supervised learning practices, and obtaining more flexible designs, e.g., inference-time hyperparameter tuning.
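For readers unfamiliar with the terminology, the HGR maximal correlation referenced above is the classical dependence measure between random variables $X$ and $Y$:
\[
\rho(X; Y) = \sup_{f,\, g} \mathbb{E}\!\left[ f(X)\, g(Y) \right],
\quad \text{s.t. } \mathbb{E}[f(X)] = \mathbb{E}[g(Y)] = 0,\ \mathbb{E}[f^2(X)] = \mathbb{E}[g^2(Y)] = 1,
\]
where the maximizing pair $(f, g)$ gives the leading maximal correlation functions; the talk builds on features defined from this quantity.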
Bio: Xiangxiang Xu received the B.Eng. and Ph.D. degrees in electronic engineering from Tsinghua University, Beijing, China, in 2014 and 2020, respectively. He is a postdoctoral associate in the Department of EECS at MIT. His research focuses on information theory, statistical learning, representation learning, and their applications in understanding and developing learning algorithms. He is a recipient of the 2016 IEEE PES Student Prize Paper Award in Honor of T. Burke Hayes and the 2024 ITA (Information Theory and Applications) Workshop Sand Award.