
Seminar Series

March 10

4:00–5:00 PM (America/New_York)

ML Tea: Unsupervised Discovery of Interpretable Structure in Complex Systems

Speaker: Mark Hamilton

Abstract: How does the human mind make sense of raw information without being taught how to see or hear? In this talk we will explore how to build algorithms that can uncover interpretable structure from large collections of unlabeled data such as images and video. First, I will describe how to classify every pixel of a collection of images without any human annotations (unsupervised semantic segmentation) by distilling self-supervised vision models. Second, we will see how this basic idea leads to a new unifying theory of representation learning, and I will show how 20 common machine learning methods, including dimensionality reduction, clustering, contrastive learning, and spectral methods, emerge from a single unified equation. Finally, we will use this unified theory to create algorithms that can decode natural language just by watching unlabeled videos of people talking, without any knowledge of text. This work is the first step in our broader effort to translate animal communication using large-scale, unsupervised, and interpretable learners, and the talk will conclude with some of our most recent efforts to analyze the complex vocalizations of Atlantic spotted dolphins.

Bio: Mark Hamilton is a PhD student in William T. Freeman's lab at the MIT Computer Science & Artificial Intelligence Laboratory. He is also a Senior Engineering Manager at Microsoft, where he leads a team building large-scale distributed ML products for Microsoft's largest databases. Mark is interested in how we can use unsupervised machine learning to discover scientific "structure" in complex systems. He values working on projects for social, cultural, and environmental good and aims to use his algorithms to help humans solve challenges they cannot solve alone.

Location: TBD
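To make the first idea concrete, here is a minimal, illustrative sketch of unsupervised semantic segmentation in its narrowest sense: per-pixel embeddings from a frozen self-supervised backbone are clustered into pseudo-classes with no labels. The feature array and class count below are invented placeholders, and the plain clustering step stands in for (and greatly simplifies) the distillation procedure the talk describes.

```python
# Minimal sketch: pixel-level features from a frozen self-supervised
# backbone are grouped into classes with no human annotations.
# The random features are a stand-in for a real backbone's output, and
# k-means is an illustrative simplification of the distillation method.
import numpy as np
from sklearn.cluster import KMeans

def unsupervised_segment(features: np.ndarray, n_classes: int = 27) -> np.ndarray:
    """features: (H, W, D) per-pixel embeddings from a frozen SSL backbone."""
    h, w, d = features.shape
    flat = features.reshape(-1, d)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)  # cosine geometry
    labels = KMeans(n_clusters=n_classes, n_init=4, random_state=0).fit_predict(flat)
    return labels.reshape(h, w)  # one pseudo-class per pixel, no labels used

# toy usage: a 32x32 "image" with 64-dimensional embeddings
seg = unsupervised_segment(np.random.randn(32, 32, 64), n_classes=5)
print(seg.shape)  # (32, 32)
```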

March 03

4:00–5:00 PM (America/New_York)

ML Tea: Learning Generative Models from Corrupted Data

Speaker: Giannis Daras

Abstract: In scientific applications, generative models are used to regularize solutions to inverse problems. The quality of the models depends on the quality of the data on which they are trained. While natural images are abundant, in scientific applications access to high-quality data is scarce, expensive, or even impossible. For example, in MRI the quality of the scan is proportional to the time spent in the scanner, and in black-hole imaging we can only access lossy measurements. Contrary to high-quality data, noisy samples are generally more accessible. If we had a method to transform noisy points into clean ones, e.g., by sampling from the posterior, we could address these challenges. A standard approach would be to use a pre-trained generative model as a prior. But how can we train these priors in the first place without access to clean data? We show that one can escape this chicken-and-egg problem using diffusion-based algorithms that account for the corruption at training time. We present the first algorithm that provably recovers the distribution given only noisy samples of a fixed variance. We extend our algorithm to account for heterogeneous data where each training sample has a different noise level. The underlying mathematical tools can be generalized to linear measurements, with the potential of accelerating MRI. Our method has deep connections to the literature on learning supervised models from corrupted data, such as SURE and Noise2X. Our framework opens exciting possibilities for generative modeling in data-constrained scientific applications. We are actively working on applying this to denoise proteins, and we present some first results in this direction.

Bio: Giannis Daras is a postdoctoral researcher at MIT working closely with Prof. Costis Daskalakis and Prof. Antonio Torralba. Prior to MIT, Giannis completed his Ph.D. at UT Austin under the supervision of Prof. Alexandros G. Dimakis. Giannis is interested in generative modeling and the applications of generative models to inverse problems. A key aspect of his work involves developing algorithms for learning generative models from noisy data. His research has broad implications across various fields, including scientific applications, privacy and copyright concerns, and advancing data-efficient learning techniques.

Location: TBD
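The fixed-variance setting admits a compact illustration. The sketch below is my own simplification, not the speaker's code: from samples y = x + σ₀ε one can build ordinary supervised training pairs at any higher noise level σₜ by adding the extra noise oneself, and Tweedie's formula then converts the learned E[y | yₜ] into the clean posterior mean E[x | yₜ].

```python
# Hedged sketch of the core trick in this line of work (function names are
# mine): given data y = x + sigma0*eps at a FIXED noise level sigma0, we can
# still supervise a denoiser at any HIGHER level sigma_t, because both the
# input y_t and the target y are observable.
import torch

def training_pair(y, sigma0, sigma_t):
    """y: noisy data at level sigma0; returns (input at sigma_t, target y)."""
    assert sigma_t > sigma0
    extra = (sigma_t**2 - sigma0**2) ** 0.5
    y_t = y + extra * torch.randn_like(y)
    return y_t, y  # ordinary supervised regression pair

def clean_posterior_mean(model, y_t, sigma0, sigma_t):
    """Tweedie's formula gives E[x|y_t] = y_t + sigma_t^2 * score and
    E[y|y_t] = y_t + (sigma_t^2 - sigma0^2) * score, so the regression
    output E[y|y_t] can be rescaled into the clean posterior mean."""
    e_y = model(y_t)  # model trained to predict y from y_t
    scale = sigma_t**2 / (sigma_t**2 - sigma0**2)
    return y_t + scale * (e_y - y_t)
```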

February 24

4:00–5:00 PM (America/New_York)

ML Tea: Score-of-Mixture Training: One-Step Generative Model Training via Score Estimation of Mixture Distributions

Abstract: We propose Score-of-Mixture Training (SMT), a novel framework for training one-step generative models by minimizing a class of divergences called the α-skew Jensen–Shannon divergence. At its core, SMT estimates the score of mixture distributions between real and fake samples across multiple noise levels. Similar to consistency models, our approach supports both training from scratch (SMT) and distillation using a pretrained diffusion model, which we call Score-of-Mixture Distillation (SMD). It is simple to implement, requires minimal hyperparameter tuning, and ensures stable training. Experiments on CIFAR-10 and ImageNet 64×64 show that SMT/SMD are competitive with and can even outperform existing methods.

Bio: Tejas is a final-year PhD student in the Signals, Information and Algorithms Lab, advised by Professor Gregory Wornell. His research interests are centered around statistical inference, information theory, and generative modeling, with a recent focus on fundamental and applied aspects of score estimation and diffusion-based generative models. During his PhD, Tejas has interned at Meta AI, Google Research, Adobe Research, and Mitsubishi Electric Research Labs. He is a recipient of the MIT Claude E. Shannon Fellowship.

Location: TBD
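For intuition, here is one hedged reading of the central estimation step: the score of a real/fake mixture at a given noise level can be estimated with ordinary denoising score matching by drawing each training sample from the fake data with probability α and from the real data otherwise. The mixture convention and the `score_net(x, sigma)` signature are my assumptions, not the paper's.

```python
# Illustrative simplification (not the authors' code): denoising score
# matching on samples drawn from the real/fake mixture estimates the score
# of the noised mixture distribution.
import torch

def mixture_dsm_loss(score_net, x_real, x_fake, alpha, sigma):
    """x_real, x_fake: batches of equal shape; alpha in (0, 1);
    score_net(x, sigma) -> estimated score (assumed signature)."""
    take_fake = (torch.rand(x_real.size(0), *[1] * (x_real.dim() - 1)) < alpha).float()
    x = take_fake * x_fake + (1 - take_fake) * x_real   # sample from the mixture
    noise = torch.randn_like(x)
    x_noisy = x + sigma * noise
    # standard DSM target for Gaussian smoothing: -noise / sigma
    pred = score_net(x_noisy, sigma)
    return ((pred + noise / sigma) ** 2).mean()
```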

February 19

4:00–5:00 PM (America/New_York)

ML Tea: Theoretical Perspectives on Data Quality and Selection

Abstract: Although it has always been understood that data quality directly affects the quality of our predictions, the large-scale data requirements of modern machine learning have brought to the fore the need for a richer vocabulary for describing the quality of collected data relative to the prediction tasks of interest, and for algorithms that use collected data most effectively. Though these questions have been studied in contexts such as distribution shift, multitask learning, and sequential decision making, there remains a need for techniques that address the problems faced in practice. Toward the aim of starting a dialogue between the practical and theoretical perspectives on these problems, I will survey some recent techniques from theoretical computer science and statistics that address data quality and selection.

Bio: Abhishek Shetty is an incoming Catherine M. and James E. Allchin Early-Career Assistant Professor in the School of Computer Science at Georgia Tech and is currently a FODSI Postdoctoral Fellow at MIT, hosted by Sasha Rakhlin, Ankur Moitra, and Costis Daskalakis. He graduated from the Department of EECS at UC Berkeley, advised by Nika Haghtalab. His interests lie at the intersection of machine learning, theoretical computer science, and statistics, and are aimed at developing statistically and computationally efficient algorithms for inference. His research has been recognized with the Apple AI/ML Fellowship and the American Statistical Association SCGS best student paper award.

Location: TBD

December 02

Truthfulness of Calibration Measures

Mingda Qiao
MIT CSAIL


4:00–4:30 PM (America/New_York)

Abstract: We initiate the study of the truthfulness of calibration measures in sequential prediction. A calibration measure is said to be truthful if the forecaster (approximately) minimizes the expected penalty by predicting the conditional expectation of the next outcome, given the prior distribution of outcomes. Truthfulness is an important property of calibration measures, ensuring that the forecaster is not incentivized to exploit the system with deliberately poor forecasts. This makes it an essential desideratum for calibration measures, alongside typical requirements such as soundness and completeness. We conduct a taxonomy of existing calibration measures and their truthfulness. Perhaps surprisingly, we find that all of them are far from being truthful. That is, under existing calibration measures, there are simple distributions on which a polylogarithmic (or even zero) penalty is achievable, while truthful prediction leads to a polynomial penalty. Our main contribution is the introduction of a new calibration measure termed the Subsampled Smooth Calibration Error (SSCE), under which truthful prediction is optimal up to a constant multiplicative factor.

Bio: Mingda Qiao is a FODSI postdoc hosted by Ronitt Rubinfeld in the MIT Theory of Computation (TOC) Group, and an incoming assistant professor at UMass Amherst (starting Fall 2025). His research focuses on the theory of prediction, learning, and decision-making in sequential settings, as well as collaborative federated learning. Prior to MIT, Mingda was a FODSI postdoc at UC Berkeley, received his PhD in Computer Science from Stanford University, and received his BEng in Computer Science from Tsinghua University.
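As a rough illustration of the two ingredients in SSCE, a calibration penalty plus random subsampling of the prediction rounds, the snippet below computes a binned calibration error on a random subsample. This is a crude stand-in, not the SSCE definition itself, which uses a smooth (Lipschitz) test class.

```python
# Crude illustration only: binned calibration error on a random subsample
# of rounds. SSCE as defined in the talk uses a smooth test class instead
# of bins; the subsampling step is the part shared with this sketch.
import numpy as np

def subsampled_binned_calibration(preds, outcomes, sample_frac=0.5, n_bins=10, rng=None):
    rng = np.random.default_rng(rng)
    t = len(preds)
    idx = rng.choice(t, size=max(1, int(sample_frac * t)), replace=False)
    p, y = np.asarray(preds)[idx], np.asarray(outcomes)[idx]
    err = 0.0
    for b in range(n_bins):
        in_bin = (p >= b / n_bins) & (p < (b + 1) / n_bins)
        if in_bin.any():
            err += in_bin.mean() * abs(y[in_bin].mean() - p[in_bin].mean())
    return err

# truthful forecasts of i.i.d. Bernoulli(0.3) outcomes incur a small penalty
rng = np.random.default_rng(0)
y = (rng.random(10_000) < 0.3).astype(float)
print(subsampled_binned_calibration(np.full(10_000, 0.3), y, rng=1))
```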

November 25

4:00–5:00 PM (America/New_York)

Power of inclusion: Enhancing polygenic prediction with admixed individuals

Zoom link: https://mit.zoom.us/j/94204370795?pwd=eFZwYXVuWmVsQzE1UTRZN2VtY0lkUT09 (passcode 387975)

Abstract: Predicting heritable traits and the genetic liability of disease from individuals' genomes has important implications for tailoring medical prevention and intervention strategies in precision medicine. The polygenic score (PGS), a statistical approach, has recently attracted substantial attention due to its potential relevance in clinical practice. Admixed individuals offer unique opportunities for addressing the limited transferability of PGSs. However, they are rarely considered in PGS training, given the challenges of constructing ancestry-matched linkage-disequilibrium reference panels for admixed individuals. Here we present inclusive PGS (iPGS), which captures ancestry-shared genetic effects by finding the exact solution for penalized regression on individual-level data and is thus naturally applicable to admixed individuals. We validate our approach in a simulation study across 33 configurations with varying heritability, polygenicity, and ancestry composition in the training set. When iPGS is applied to n = 237,055 ancestry-diverse individuals in the UK Biobank, it shows the greatest improvements for Africans: 48.9% on average across 60 quantitative traits, and up to 50-fold improvements for some traits (neutrophil count, R2 = 0.058) over the baseline model trained on the same number of European individuals. When we allowed iPGS to use n = 284,661 individuals, we observed an average improvement of 60.8% for African, 11.6% for South Asian, 7.3% for non-British White, 4.8% for White British, and 17.8% for the other individuals. We further developed iPGS+refit to jointly model the ancestry-shared and ancestry-dependent genetic effects when heterogeneous genetic associations are present. For neutrophil count, for example, iPGS+refit showed the highest predictive performance in the African group (R2 = 0.115), which exceeds the best predictive performance for the White British group (R2 = 0.090, in the iPGS model), even though only 1.49% of the individuals used in iPGS training are of African ancestry. Our results indicate the power of including diverse individuals in developing more equitable PGS models.

Bio: Yosuke Tanigawa, PhD, is a research scientist at MIT's Computer Science and Artificial Intelligence Laboratory. To incorporate interindividual differences into disease prevention and treatment, he develops computational and statistical methods, focusing on predictive modeling with high-dimensional human genetics data, multi-omic dissection of disease heterogeneity, and therapeutic target discovery. His recent work focuses on inclusive training strategies for genetic prediction algorithms and dissecting the molecular, cellular, and genetic basis of phenotypic heterogeneity in Alzheimer's disease. He has received numerous awards, including the Charles J. Epstein Trainee Award for Excellence in Human Genetics Research and MIT Technology Review's Innovators Under 35 Japan.
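The statistical core of iPGS, penalized regression on individual-level genotype data, can be sketched in a few lines. Everything below is simulated toy data and a generic sparse regression, not the authors' pipeline.

```python
# Sketch of the statistical core (not the iPGS implementation): a polygenic
# score is a penalized linear model mapping genotype dosages (0/1/2 per
# variant) to a trait, fit on individual-level data so that admixed
# individuals can be included directly, with no LD reference panel.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_individuals, n_variants = 500, 2000
genotypes = rng.integers(0, 3, size=(n_individuals, n_variants)).astype(float)
causal = rng.normal(size=n_variants) * (rng.random(n_variants) < 0.01)  # sparse effects
trait = genotypes @ causal + rng.normal(scale=1.0, size=n_individuals)

pgs_model = Lasso(alpha=0.05).fit(genotypes, trait)   # penalized regression
pgs = pgs_model.predict(genotypes)                    # the polygenic score
print(f"R^2 on training data: {pgs_model.score(genotypes, trait):.3f}")
```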

November 18

Dependence Induced Representation Learning

Xiangxiang Xu
EECS/RLE, MIT


4:00–5:00 PM (America/New_York)

Abstract: Despite the vast progress in deep learning practice, theoretical understanding of learned feature representations remains limited. In this talk, we discuss three fundamental questions from a unified statistical perspective:

(1) What representations carry useful information?
(2) How are representations learned by distinct algorithms related?
(3) Can we separate representation learning from solving specific tasks?

We formalize representations that extract statistical dependence from data, termed dependence-induced representations. We prove that representations are dependence-induced if and only if they can be learned from specific features defined by Hirschfeld–Gebelein–Rényi (HGR) maximal correlation. This separation theorem signifies the key role of HGR features in representation learning and enables a modular design of learning algorithms. Specifically, we demonstrate the optimality of HGR features in simultaneously achieving different design objectives, including minimal sufficiency (Tishby's information bottleneck), information maximization, enforcing uncorrelated features (VICReg), and encoding information at various granularities (Matryoshka representation learning). We further illustrate that by adapting HGR features, we can obtain representations learned by distinct practices, from cross-entropy or hinge loss minimization, non-negative feature learning, and neural density ratio estimation to their regularized variants. We also discuss applications of our analyses in interpreting learning phenomena such as neural collapse, understanding existing self-supervised learning practices, and obtaining more flexible designs, e.g., inference-time hyperparameter tuning.

Bio: Xiangxiang Xu received the B.Eng. and Ph.D. degrees in electronic engineering from Tsinghua University, Beijing, China, in 2014 and 2020, respectively. He is a postdoctoral associate in the Department of EECS at MIT. His research focuses on information theory, statistical learning, representation learning, and their applications in understanding and developing learning algorithms. He is a recipient of the 2016 IEEE PES Student Prize Paper Award in Honor of T. Burke Hayes and the 2024 ITA (Information Theory and Applications) Workshop Sand Award.
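A minimal sketch of what HGR maximal correlation measures, using two small networks (my construction, not the talk's algorithm): maximize the Pearson correlation between f(X) and g(Y). For Y = X² + noise the linear correlation of (X, Y) is near zero, yet the learned HGR correlation is high, which is exactly the nonlinear dependence these features extract.

```python
# Toy neural estimation of HGR maximal correlation (illustrative only):
# two networks f, g are trained to maximize corr(f(X), g(Y)).
import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
g = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-2)

x = torch.randn(4096, 1)
y = x**2 + 0.1 * torch.randn(4096, 1)  # strong dependence, near-zero Pearson corr(x, y)

for step in range(500):
    fx, gy = f(x), g(y)
    fx = (fx - fx.mean()) / (fx.std() + 1e-6)  # standardize -> Pearson correlation
    gy = (gy - gy.mean()) / (gy.std() + 1e-6)
    loss = -(fx * gy).mean()                   # maximize the correlation
    opt.zero_grad()
    loss.backward()
    opt.step()

# should be high, reflecting the nonlinear dependence of y on x
print(f"estimated HGR correlation: {-loss.item():.2f}")
```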


October 28

4:00–5:00 PM (America/New_York)

Generative Models for Biomolecular Prediction, Dynamics, and Design

Abstract: We lay out the three avenues in which we think generative models are especially valuable for modeling biomolecules. (1) Hard prediction tasks can be better addressed with generative models that can suggest and rank multiple solutions (e.g., docking). (2) The dynamics and conformations of biomolecules can be captured with generative models (e.g., protein conformational ensembles and MD trajectories). (3) The design of new biomolecules can be accelerated, informed by samples or likelihoods from generative models (e.g., protein binder or regulatory DNA design).

Location: 32-G882 (Hewlett)

October 21

4:00–5:00 PM (America/New_York)

Objective Approaches in a Subjective Medical World

Abstract: In today's healthcare system, patients often feel disconnected from clinical professionals and from their care journey. They receive a "one-size-fits-all" plan and are left out of the decision-making process, which can lead to a less satisfying experience. My research focuses on applying advanced AI technologies, including large language models, machine learning, and IoT, to address challenges in healthcare, particularly in patient-centered healthcare delivery. I aim to enhance the accuracy and efficiency of healthcare systems by using these "objective approaches" to navigate the subjective aspects of medical practice, such as clinician notes and patient preferences found in electronic health records. A key aspect of my work is improving the transparency of AI-based healthcare applications, making them more understandable and trustworthy for both clinicians and patients, by addressing critical issues such as building trust in AI systems and ensuring these technologies effectively meet the needs of patients and healthcare providers. Additionally, I emphasize the importance of personalizing healthcare by considering each patient's unique circumstances, including their preferences and socio-economic conditions. This research applies AI across various areas, from specific diseases like cancer to broader healthcare contexts, with the goal of improving both the delivery and the experience of healthcare. My work contributes to the development of AI tools that not only enhance clinical decision-making but also foster better human-AI interaction, ultimately leading to improved healthcare outcomes.

Location: 32-G882


October 07

4:00–4:30 PM (America/New_York)

Contextualizing Self-Supervised Learning: A New Path Ahead

Abstract: Self-supervised learning (SSL) has achieved remarkable progress over the years, particularly in visual domains. However, recent advancements have plateaued due to performance bottlenecks, and more focus has shifted towards generative models. In this talk, we step back to analyze existing SSL paradigms and identify the lack of context as their most critical obstacle. To address this, we explore two approaches that incorporate contextual knowledge into SSL:

1. Contextual Self-Supervised Learning: Here, learned representations adapt their inductive biases to diverse contexts, enhancing the flexibility and generality of SSL.
2. Self-Correction: This method allows foundation models to refine themselves by reflecting on their own predictions within a dynamically evolving context.

These insights illustrate new paths for crafting self-supervision and highlight context as a key ingredient for building general-purpose SSL.

Paper links:
* In-Context Symmetries: Self-Supervised Learning through Contextual World Models (https://arxiv.org/pdf/2405.18193)
* A Theoretical Understanding of Self-Correction through In-context Alignment (https://arxiv.org/pdf/2405.18634)

Both papers covered in this talk were accepted to NeurIPS 2024. The theoretical work on understanding self-correction received the Spotlight Award at the ICML 2024 ICL Workshop.

Bio: Yifei Wang is a postdoc at CSAIL, advised by Prof. Stefanie Jegelka. He earned his bachelor's and Ph.D. degrees from Peking University. Yifei is generally interested in machine learning and representation learning, with a focus on bridging the theory and practice of self-supervised learning. His first-author works have been recognized by multiple best-paper awards, including the Best ML Paper Award at ECML-PKDD 2021, the Silver Best Paper Award at the ICML 2021 AdvML Workshop, and the Spotlight Award at the ICML 2024 ICL Workshop.

Location: 32-G882 (Hewlett Room)


September 16

4:00–4:30 PM (America/New_York)

Multi-sensory perception from top to down

Abstract: Human sensory experiences, such as vision, hearing, touch, and smell, serve as natural interfaces for perceiving and reasoning about the world around us. Understanding 3D environments is crucial for applications like video processing, robotics, and augmented reality. This work explores how material properties and microgeometry can be learned through cross-modal associations between sight, sound, and touch. I will introduce a method that leverages in-the-wild online videos to study interactable audio generation via dense visual cues. Additionally, I will share recent advancements in multimodal scene understanding and discuss future directions for the field.

Bio: Anna is a senior undergraduate at Tsinghua University. Her previous research lies in multi-modal perception from the perspectives of audio and vision. She is an intern in Jim Glass's group.

Location: 32-G882, Hewlett Room

May 02

4:00–4:30 PM (America/New_York)

Decomposing Predictions by Modeling Model Computation

Abstract: How does the internal computation of a machine learning model transform inputs into predictions? In this work, we introduce a task called component modeling that aims to address this question. The goal of component modeling is to decompose an ML model's prediction in terms of its components: simple functions (e.g., convolution filters, attention heads) that are the "building blocks" of model computation. We focus on a special case of this task, component attribution, where the goal is to estimate the counterfactual impact of individual components on a given prediction. We then present COAR, a scalable algorithm for estimating component attributions, and demonstrate its effectiveness across models, datasets, and modalities. Finally, we show that component attributions estimated with COAR directly enable model editing across five tasks: fixing model errors, "forgetting" specific classes, boosting subpopulation robustness, localizing backdoor attacks, and improving robustness to typographic attacks.

Paper: https://arxiv.org/abs/2404.11534
Blog post: https://gradientscience.org/modelcomponents/

Bio: Harshay is a PhD student at MIT CSAIL, advised by Aleksander Madry. His research interests are broadly in developing tools to understand and steer model behavior. Recently, he has been working on understanding how training data and learning algorithms collectively shape neural network representations.

Location: Room 32-G449 (Patil/Kiva)
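The component-attribution recipe lends itself to a short sketch: ablate random subsets of components, record the model's output, and fit a linear map from ablation masks to outputs, whose coefficients estimate each component's counterfactual impact. This is a simplification in the spirit of COAR, not its implementation; the `run_with_ablation` callable is an invented stand-in, not an API.

```python
# Hedged sketch of component attribution via random ablations and a
# linear surrogate (simplified; helper names are hypothetical).
import numpy as np

def estimate_attributions(run_with_ablation, n_components, n_samples=2000,
                          ablate_frac=0.1, rng=None):
    """run_with_ablation(mask) -> scalar model output, where mask[i] = 1
    means component i is ablated. This callable is assumed, not real."""
    rng = np.random.default_rng(rng)
    masks = (rng.random((n_samples, n_components)) < ablate_frac).astype(float)
    outputs = np.array([run_with_ablation(m) for m in masks])
    # least-squares fit: outputs ~ masks @ attributions + bias
    X = np.hstack([masks, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(X, outputs, rcond=None)
    return coef[:-1]  # per-component counterfactual estimates

# toy check: a "model" whose output is a known linear function of ablations
true = np.array([0.5, -2.0, 0.0, 1.0])
est = estimate_attributions(lambda m: 3.0 + m @ true, n_components=4, rng=0)
print(np.round(est, 2))  # approximately [ 0.5 -2.   0.   1. ]
```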

April 25

4:00–4:30 PM (America/New_York)

ML Tea: Ablation Based Counterfactuals

Abstract: The widespread adoption of diffusion models for creative uses such as image, video, and audio synthesis has raised serious questions surrounding the use of training data and its regulation. To arrive at a resolution, it is important to understand how such models are influenced by their training data. Due to the complexity involved in training and sampling from these models, the ultimate impact of the training data is challenging to characterize, confounding regulatory and scientific efforts. In this work, we explore the idea of an Ablation Based Counterfactual, which allows us to compute counterfactual scenarios where training data is missing by ablating parts of a model, circumventing the need to retrain. This enables important downstream tasks such as data attribution and brings us closer to understanding the influence of training data on these models.

Location: 32-370

April 18

4:00–4:30 PM (America/New_York)

Improving data efficiency and accessibility for general robotic manipulation

Abstract: How can data-driven approaches endow robots with diverse manipulative skills and robust performance in unstructured environments? Despite recent progress, many open questions remain in this area, such as: (1) How can we define and model the data distribution for robotic systems? (2) In light of data scarcity, what strategies can algorithms employ to enhance performance? (3) What is the best way to scale up robotic data collection? In this talk, Hao-Shu Fang will share his research on enhancing the efficiency of robot learning algorithms and democratizing access to large-scale robotic manipulation data. He will also discuss several open questions in data-driven robotic manipulation, offering insights into the challenges posed.

Bio: Hao-Shu Fang is a postdoctoral researcher collaborating with Pulkit Agrawal and Edward Adelson. His research focuses on general robotic manipulation. Recently, he has been investigating how to integrate visual-tactile perception for improved manipulation and how to train a multi-task robotic foundation behavioral model.

Location: Room 32-370

April 11

4:00–4:30 PM (America/New_York)

Removing Biases from Molecular Representations via Information Maximization

Abstract: High-throughput drug screening, using cell imaging or gene expression measurements as readouts of drug effect, is a critical tool in biotechnology for assessing and understanding the relationship between the chemical structure and biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations in the data. We propose InfoCORE, an Information maximization approach for COnfounder REmoval, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier. It adaptively reweighs samples to equalize their implied batch distribution. Extensive experiments on drug screening data reveal InfoCORE's superior performance in a multitude of tasks, including molecular property prediction and molecule-phenotype retrieval. Additionally, we show how InfoCORE offers a versatile framework that addresses general distribution shifts and issues of data fairness by minimizing correlation with spurious features or removing sensitive attributes. The code is available at https://github.com/uhlerlab/InfoCORE.

Bio: I am a second-year PhD student at MIT EECS, advised by Tommi Jaakkola and Caroline Uhler. I am also affiliated with the Eric and Wendy Schmidt Center (EWSC) at the Broad Institute. My research interests lie broadly in machine learning, representation learning, and AI for science. Recently my research has focused on multi-modal representation learning and perturbation modeling for drug discovery. Before my PhD, I obtained my Bachelor's degree from Tsinghua University.

Location: Room 32-370
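One ingredient, the adaptive reweighting, can be sketched as follows under my own reading of the abstract (the released code linked above is the authoritative reference): a classifier head predicts the batch identifier from the representation, and each sample is weighted by the inverse of its predicted own-batch probability, so samples whose representations strongly reveal their batch are down-weighted and the implied batch distribution is flattened.

```python
# Loose sketch of batch-balancing weights (my reading, not the repo's code):
# a classifier on the representation predicts the batch identifier; weights
# are inversely proportional to the predicted probability of the true batch.
import torch
import torch.nn.functional as F

def batch_balancing_weights(batch_logits, batch_ids):
    """batch_logits: (N, n_batches) from a classifier head on the latent
    representation; batch_ids: (N,) integer batch labels."""
    p_batch = F.softmax(batch_logits, dim=1)
    p_own = p_batch.gather(1, batch_ids.unsqueeze(1)).squeeze(1)
    w = 1.0 / p_own.clamp_min(1e-6)  # easily-identified batches get low weight
    return w / w.sum()               # normalized sample weights

logits = torch.randn(8, 3)
ids = torch.randint(0, 3, (8,))
print(batch_balancing_weights(logits, ids))
```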

April 04

4:00–4:30 PM (America/New_York)

Interpolating Item and User Fairness in Multi-Sided Recommendations

Abstract: Today's online platforms rely heavily on algorithmic recommendations to bolster user engagement and drive revenue. However, such recommendations can impact the diverse stakeholders involved, namely the platform, items (sellers), and users (customers), each with their own objectives. On such multi-sided platforms, finding an appropriate middle ground becomes a complex operational challenge. Motivated by this, we formulate a novel fair recommendation framework, called Problem (FAIR), that not only maximizes the platform's revenue but also accommodates varying fairness considerations from the perspectives of items and users. The framework's distinguishing trait is its flexibility: it allows the platform to specify any definitions of item/user fairness that are deemed appropriate, as well as to decide the "price of fairness" it is willing to pay to ensure fairness for other stakeholders. We further examine Problem (FAIR) in a dynamic online setting, where the platform needs to learn user data and generate fair recommendations simultaneously in real time, two tasks that are often at odds. In the face of this additional challenge, we devise a low-regret online recommendation algorithm, called FORM, that effectively balances the acts of learning and performing fair recommendation. Our theoretical analysis confirms that FORM proficiently maintains the platform's revenue while ensuring the desired levels of fairness for both items and users. Finally, we demonstrate the efficacy of our framework and method via several case studies on real-world data.

Bio: Qinyi Chen is a fourth-year PhD student in the Operations Research Center (ORC) at MIT, advised by Prof. Negin Golrezaei. Her research interests span machine learning and optimization, AI/ML fairness, approximation algorithms, and game and auction theory, with applications to digital platforms and marketplaces.

Location: 32-370
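To see the shape of such a program, here is a toy linear program in the spirit of Problem (FAIR). The constraint forms and fairness levels are invented for illustration and are not the paper's exact formulation: revenue is maximized subject to a minimum-exposure constraint per item (one notion of item fairness) and a minimum expected utility per user (one notion of user fairness).

```python
# Toy LP: choose recommendation probabilities x[u, i] to maximize revenue
# subject to illustrative item- and user-fairness constraints.
import numpy as np
from scipy.optimize import linprog

n_users, n_items = 4, 3
revenue = np.random.default_rng(0).random((n_users, n_items))
utility = np.random.default_rng(1).random((n_users, n_items))
min_exposure, min_utility = 0.5, 0.1  # fairness levels chosen by the platform

c = -revenue.ravel()                                  # maximize total revenue
A_ub, b_ub = [], []
for i in range(n_items):                              # item fairness: exposure
    row = np.zeros((n_users, n_items)); row[:, i] = -1
    A_ub.append(row.ravel()); b_ub.append(-min_exposure)
for u in range(n_users):                              # user fairness: utility
    row = np.zeros((n_users, n_items)); row[u] = -utility[u]
    A_ub.append(row.ravel()); b_ub.append(-min_utility)
A_eq = np.zeros((n_users, n_users * n_items))         # each user gets one slot
for u in range(n_users):
    A_eq[u, u * n_items:(u + 1) * n_items] = 1
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=A_eq, b_eq=np.ones(n_users), bounds=(0, 1))
print(res.x.reshape(n_users, n_items).round(2))
```

Tightening `min_exposure` or `min_utility` trades revenue for fairness, which is one way to read the "price of fairness" the platform chooses.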

March 21

4:00–4:30 PM (America/New_York)

When is Agnostic Reinforcement Learning Statistically Tractable?

Abstract: We study the problem of agnostic PAC reinforcement learning (RL): given a policy class Π, how many rounds of interaction with an unknown MDP (with a potentially large state and action space) are required to learn an ε-suboptimal policy with respect to Π? Towards that end, we introduce a new complexity measure, called the spanning capacity, that depends solely on the set Π and is independent of the MDP dynamics. With a generative model, we show that for any policy class Π, bounded spanning capacity characterizes PAC learnability. However, for online RL the situation is more subtle. We show that there exists a policy class Π with bounded spanning capacity that requires a superpolynomial number of samples to learn. This reveals a surprising separation for agnostic learnability between generative-access and online-access models (as well as between deterministic and stochastic MDPs under online access). On the positive side, we identify an additional sunflower structure which, in conjunction with bounded spanning capacity, enables statistically efficient online RL via a new algorithm called POPLER, which takes inspiration from classical importance sampling methods as well as techniques for reachable-state identification and policy evaluation in reward-free exploration.

Location: 32-370
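For readers new to the setup, the agnostic PAC goal can be stated in one line (standard phrasing, restated here rather than quoted from the paper): with probability at least 1 − δ, the learner must output a policy π̂ satisfying

```latex
% agnostic PAC guarantee with respect to the policy class \Pi
V^{\hat{\pi}} \;\ge\; \max_{\pi \in \Pi} V^{\pi} \;-\; \varepsilon
```

where V^π denotes the expected cumulative reward of policy π in the unknown MDP. The sample complexity is the number of interaction rounds needed to achieve this, and the spanning capacity is the proposed measure of how large that number must be as a function of Π alone.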