Seminar Series

March 21, 2024

When is Agnostic Reinforcement Learning Statistically Tractable?

4:00–4:30 PM ET, Room 32-370

Abstract: We study the problem of agnostic PAC reinforcement learning (RL): given a policy class Π, how many rounds of interaction with an unknown MDP (with a potentially large state and action space) are required to learn an ε-suboptimal policy with respect to Π? To that end, we introduce a new complexity measure, called the spanning capacity, that depends solely on the set Π and is independent of the MDP dynamics. With a generative model, we show that for any policy class Π, bounded spanning capacity characterizes PAC learnability. For online RL, however, the situation is more subtle. We show there exists a policy class Π with bounded spanning capacity that requires a superpolynomial number of samples to learn. This reveals a surprising separation for agnostic learnability between generative access and online access models (as well as between deterministic/stochastic MDPs under online access). On the positive side, we identify an additional sunflower structure which, in conjunction with bounded spanning capacity, enables statistically efficient online RL via a new algorithm called POPLER, which takes inspiration from classical importance sampling methods as well as techniques for reachable-state identification and policy evaluation in reward-free exploration.
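For reference, the agnostic PAC criterion the abstract refers to can be written in its standard textbook form (our notation, not necessarily the paper's):

```latex
% Agnostic PAC reinforcement learning: with probability at least 1 - \delta,
% using as few rounds of interaction as possible, output \hat{\pi} such that
V^{\hat{\pi}} \;\ge\; \max_{\pi \in \Pi} V^{\pi} \;-\; \varepsilon,
% where V^{\pi} denotes the expected return of policy \pi in the unknown MDP.
```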

March 14, 2024

What's the Erdős number of an LLM? Mathematical and algorithmic discovery via machine learning

4:00–4:30 PM ET, Room 32-370

Abstract: We survey methods for discovering novel mathematics and novel algorithms via machine learning (AlphaTensor, FunSearch, AlphaGeometry, AI Feynman, etc.). We will present not our own work but other people's; this is a review in the form of a presentation.

March 7, 2024

Human Expertise in Algorithmic Prediction

4:00–4:30 PM ET, Room 32-370 and on Zoom

Abstract: We introduce a novel framework for incorporating human expertise into algorithmic predictions. Our approach focuses on the use of human judgment to distinguish inputs which "look the same" to any feasible predictive algorithm. We argue that this framing clarifies the problem of human/AI collaboration in prediction tasks, as experts often have access to information -- particularly subjective information -- which is not encoded in the algorithm's training data. We use this insight to develop a set of principled algorithms for selectively incorporating human feedback only when it improves the performance of any feasible predictor. We find empirically that although algorithms often outperform their human counterparts on average, human judgment can significantly improve algorithmic predictions on specific instances (which can be identified ex ante). In an X-ray classification task, we find that this subset constitutes nearly 30% of the patient population. Our approach provides a natural way of uncovering this heterogeneity and thus enabling effective human-AI collaboration.

Speaker bio: Rohan is a second-year PhD student in EECS, where he is advised by Manish Raghavan and Devavrat Shah. His research interests are at the intersection of machine learning and economics, with a particular focus on causal inference, human/AI collaboration, and data-driven decision making.
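To make the "incorporate human feedback only where it helps" idea concrete, here is a minimal sketch under illustrative assumptions of our own: inputs are pre-grouped into cells that any feasible predictor treats as identical, and the human's prediction replaces the model's only on cells where it validates better. The cell construction, names, and squared-error criterion below are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def selective_human_ai_predictions(model_probs, human_probs, cell_ids, labels):
    """Combine model and human predictions cell by cell.

    Inputs in the same cell "look the same" to the model (it assigns them
    near-identical predictions), so within a cell any residual signal must
    come from the human. Keep the human's prediction only in cells where it
    empirically beats the model on a validation split.

    model_probs, human_probs, labels: float arrays of shape (n,)
    cell_ids: integer array of shape (n,) assigning each input to a cell
    """
    out = model_probs.copy()
    for c in np.unique(cell_ids):
        idx = cell_ids == c
        model_err = np.mean((model_probs[idx] - labels[idx]) ** 2)
        human_err = np.mean((human_probs[idx] - labels[idx]) ** 2)
        if human_err < model_err:  # human adds value on this cell
            out[idx] = human_probs[idx]
    return out
```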

February 29, 2024

Context is Environment

Sharut Gupta
MIT CSAIL

5:00–5:30 PM ET, Room 32-G449 (Patil/Kiva Seminar Room)

Abstract: Two lines of work are taking center stage in AI research. On the one hand, the community is making increasing efforts to build models that discard spurious correlations and generalize better in novel test environments. Unfortunately, the hard lesson so far is that no proposal convincingly outperforms a simple empirical risk minimization baseline. On the other hand, large language models (LLMs) have erupted as algorithms able to learn in context, generalizing on the fly to the eclectic contextual circumstances that users enforce by means of prompting. In this paper, we argue that context is environment, and posit that in-context learning holds the key to better domain generalization. Via extensive theory and experiments, we show that paying attention to context -- unlabeled examples as they arrive -- allows our proposed In-Context Risk Minimization (ICRM) algorithm to zoom in on the test environment risk minimizer, leading to significant out-of-distribution performance improvements. From all of this, two messages are worth taking home: researchers in domain generalization should consider environment as context, and harness the adaptive power of in-context learning; researchers in LLMs should consider context as environment, to better structure data towards generalization.

Speaker bio: Sharut Gupta is a second-year PhD student at MIT CSAIL, working with Prof. Stefanie Jegelka. Her research mainly focuses on building robust and generalizable machine learning systems under minimal supervision. She enjoys working on out-of-distribution generalization, self-supervised learning, causal inference, and representation learning.
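A minimal sketch of the in-context idea the abstract describes: the predictor sees, alongside the query, a context of unlabeled examples from the test environment and can adapt its output on the fly. The GRU summarizer and linear head below are our illustrative stand-ins, not the architecture used in the ICRM paper.

```python
import torch
import torch.nn as nn

class InContextPredictor(nn.Module):
    """Predict y for a query x given unlabeled context from the same
    environment; the context summary lets the model adapt per environment."""

    def __init__(self, dim, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden + dim, 1)

    def forward(self, context, x):
        # context: (batch, t, dim) unlabeled examples as they arrive
        # x:       (batch, dim)    the query point
        _, h = self.encoder(context)               # summarize the environment
        return self.head(torch.cat([h[-1], x], dim=-1))
```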

February 22, 2024

Ask Your Distribution Shift if Pre-Training is Right for You

5:00–5:30 PM ET

Abstract: Pre-training is a widely used approach to develop models that are robust to distribution shifts. However, in practice, its effectiveness varies: fine-tuning a pre-trained model improves robustness significantly in some cases but not at all in others (compared to training from scratch). In this work, we seek to characterize the failure modes that pre-training can and cannot address. In particular, we focus on two possible failure modes of models under distribution shift: poor extrapolation (e.g., they cannot generalize to a different domain) and biases in the training data (e.g., they rely on spurious features). Our study suggests that, as a rule of thumb, pre-training can help mitigate poor extrapolation but not dataset biases. After providing theoretical motivation and empirical evidence for this finding, we explore two of its implications for developing robust models: (1) pre-training and interventions designed to prevent exploiting biases have complementary robustness benefits, and (2) fine-tuning on a (very) small, non-diverse but de-biased dataset can result in significantly more robust models than fine-tuning on a large and diverse but biased dataset.

Speaker bio: Ben is a second-year PhD student at MIT, where he is advised by Aleksander Madry. He is interested in how we can develop machine learning models that can be safely deployed, with a focus on robustness to distribution shifts. Lately, he has been working on understanding how we can harness large-scale pre-training (e.g., CLIP, GPT) to develop robust task-specific models.

February 15, 2024

Efficiently Searching for Distributions

Sandeep Silwal
MIT CSAIL

4:00–4:30 PM ET

Abstract: How efficiently can we search distributions? The problem is modeled as follows: we are given knowledge of k discrete distributions v_i for 1 <= i <= k over the domain [n] = {1,...,n}, which we can preprocess. Then we get samples from an unknown discrete distribution p, also over [n]. The goal is to output the closest distribution to p among the v_i's in TV distance (up to some small additive error). State-of-the-art sample-efficient algorithms require Theta(log k) samples and run in near-linear time.

We introduce a fresh perspective on the problem and ask if we can output the closest distribution in *sublinear* time. This question is particularly motivated as it is a generalization of the traditional nearest neighbor search problem: if we take enough samples, we can learn p explicitly up to low TV distance, and then find the closest v_i in o(k) time using standard nearest neighbor search. However, this approach requires Omega(n) samples. Thus, it is natural to ask: can we obtain both a sublinear number of samples and sublinear query time? We present progress towards this question and uncover a very interesting statistical-computational trade-off.

This is joint work with Anders Aamand, Alex Andoni, Justin Chen, Piotr Indyk, Shyam Narayanan, and Haike Xu.

Speaker bio: Sandeep is a final-year PhD student at MIT, advised by Piotr Indyk. His interests are broadly in fast algorithm design. Recently, he has been working at the intersection of machine learning and classical algorithms by designing provable algorithms in various ML settings, such as efficient algorithms for processing large datasets, as well as using ML to inspire algorithm design.
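As a concrete reference point, the Omega(n)-sample baseline the abstract mentions (learn p empirically, then scan all k candidates) might look like the sketch below. This is the approach the talk aims to improve on, not the sublinear algorithm itself.

```python
import numpy as np

def closest_distribution_tv(samples, V):
    """Baseline: learn p empirically, then linearly scan for the smallest
    TV distance. Requires Omega(n) samples and O(kn) query time.

    samples: int array of draws from the unknown p, values in {0,...,n-1}
    V:       (k, n) array whose row i is the known distribution v_i
    """
    n = V.shape[1]
    p_hat = np.bincount(samples, minlength=n) / len(samples)
    tv = 0.5 * np.abs(V - p_hat).sum(axis=1)   # TV(v_i, p_hat) for each i
    return int(np.argmin(tv)), float(tv.min())
```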

December 8, 2023

Learning to Assess Disease and Health In Your Home

4:00–4:30 PM ET, Room 32-G882 (Hewlett Room)

Abstract: The future of healthcare lies in delivering comprehensive medical services to patients in their own homes. As the global population ages and chronic diseases become increasingly prevalent, objective, longitudinal, and reliable health and disease assessment at home becomes crucial for early detection and prevention of hospitalization. In this talk, I will present new learning methods with everyday devices for in-home healthcare. I will first describe a simple self-supervised framework for remote sensing of human vitals using just everyday smartphones. I will then introduce an AI-powered digital biomarker for Parkinson's disease that detects the disease, estimates its severity, and tracks its progression using nocturnal breathing signals. Together, these showcase the potential of AI-based in-home assessment for various diseases and human health sensing, enabling remote monitoring of health-related conditions, timely care, and enhanced patient outcomes.

Speaker bio: Yuzhe Yang is a PhD candidate in computer science at MIT. He received his B.S. with honors in EECS from Peking University. His research interests include machine learning and AI for human disease, health, and medicine. His work on AI-enabled biomarkers for Parkinson's disease was named one of the Ten Notable Advances in 2022 by Nature Medicine and one of the Ten Crucial Advances in Movement Disorders in 2022 by The Lancet Neurology. His research has been published in Nature Medicine, Science Translational Medicine, NeurIPS, ICML, ICLR, CVPR, and UbiComp. His work has been recognized by the MathWorks Fellowship, Takeda Fellowship, Baidu PhD Scholarship, and media coverage from MIT Tech Review, Wall Street Journal, Forbes, BBC, The Washington Post, etc.

December 1, 2023

Intraoperative 2D/3D Image Registration via Differentiable X-ray Rendering

4:00–4:30 PM ET, Room 32-G882 (Hewlett Room)

Abstract: We investigate the camera pose estimation problem in the context of 2D/3D medical image registration. The application is to align 2D intraoperative images (e.g., X-ray) to a patient's 3D preoperative volume (e.g., CT), helping provide 3D image guidance during minimally invasive surgeries. We present a patient-specific self-supervised approach that uses differentiable rendering to achieve the sub-millimeter accuracy required in this context. Some aspects of our work that may be of interest to the broader ML community include:
- How do you exactly compute the rendering equation for differentiable ray tracing through a voxel grid?
- What is the optimal representation of rotations and translations when using gradient descent to optimize poses?
- What is the optimal image loss function that achieves robust image registration while still being fast enough to use in real time?

Speaker bio: Vivek is a third-year PhD student in Polina Golland's group, broadly interested in 3D computer vision problems across science and medicine.
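The overall optimization loop the questions above refer to can be sketched as follows. This is a minimal illustration under our own assumptions: `render` stands for some differentiable renderer mapping a pose to a synthetic X-ray, the 6-vector pose parameterization is one arbitrary choice among those the talk compares, and MSE stands in for whatever image loss the paper settles on.

```python
import torch

def register_pose(render, target, pose_init, steps=200, lr=1e-2):
    """Registration by differentiable rendering: gradient descent on a
    6-DoF pose (rotation + translation) so the rendered image matches the
    observed intraoperative X-ray.

    render:    assumed differentiable function pose -> synthetic image
    target:    observed X-ray tensor, same shape as render's output
    pose_init: initial pose as a 6-element tensor
    """
    pose = pose_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([pose], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(render(pose), target)
        loss.backward()
        opt.step()
    return pose.detach()
```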

November 17, 2023

A Game-Theoretic Perspective on Trustworthy Algorithms

4:00–4:30 PM ET, Room 32-G882 (Hewlett Room)

Abstract: Many algorithms are trained on data provided by humans, such as those that power recommender systems and hiring decision aids. Most data-driven algorithms assume that user behavior is exogenous: a user would react to a given prompt (e.g., a recommendation or hiring suggestion) in the same way no matter what algorithm generated it. For example, algorithms that rely on an i.i.d. assumption inherently assume exogeneity. In practice, user behavior is not exogenous -- users are *strategic*. For example, there are documented cases of TikTok users changing their scrolling behavior after realizing that the TikTok algorithm pays attention to dwell time, and of Uber drivers changing how they accept and cancel rides based on Uber's matching algorithm. What are the implications of breaking the exogeneity assumption? We answer this question in our work, modeling the interactions between a user and their data-driven platform as a repeated, two-player game. We leverage results from misspecified learning to characterize the effect of strategization on data-driven algorithms. As one of our main contributions, we find that designing trustworthy algorithms can go hand in hand with accurate estimation; that is, there is not necessarily a trade-off between performance and trustworthiness. We provide a formalization of trustworthiness that inspires potential interventions.

November 7, 2023

The Journey, not the Destination: How Data Guides Diffusion Models

5:00–5:30 PM ET, Room 32-G882 (Hewlett Room)

Abstract: Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity. However, attributing these images back to the training data—that is, identifying specific training examples which caused an image to be generated—remains a challenge. In this paper, we propose a framework that (i) provides a formal notion of data attribution in the context of diffusion models, and (ii) allows us to counterfactually validate such attributions. Then, we provide a method for computing such attributions efficiently by leveraging recent work on data attribution in the supervised setting. Finally, we apply our method to find (and evaluate) such attributions for diffusion models trained on CIFAR-10 and MS COCO.

Speaker bio: Josh is a second-year PhD student working with Aleksander Madry. Josh's research focuses on building machine learning models that are safe and robust when deployed in the real world.

November 3, 2023

Operator SVD with Neural Networks via Nested Low-Rank Approximation

4:00–4:30 PM ET, Room 32-G882 (Hewlett Room)

Abstract: Top-$L$ eigenvalue decomposition (EVD) of a given linear operator, or finding its top-$L$ eigenvalues and eigenfunctions, is a fundamental task in many machine learning and scientific simulation problems. For high-dimensional eigenvalue problems, training neural networks to parameterize the eigenfunctions is considered a promising alternative to classical numerical linear algebra techniques. While several optimization frameworks have been proposed for this parametric approach, all existing proposals either use an ad hoc regularization to obtain orthogonal eigenfunctions and/or inherently suffer from biased gradient estimates. In this talk, I will present a new optimization framework based on the low-rank approximation characterization of a truncated singular value decomposition (SVD), accompanied by a technique called nesting for correctly learning the top-$L$ singular values and functions up to degeneracy; top-$L$ EVD can be performed as a special case. The proposed optimization framework is easy to implement with off-the-shelf gradient-based optimization algorithms, since (1) it is based on an unconstrained optimization problem that naturally admits an unbiased gradient estimator, and (2) it works without any extra orthonormalization steps or regularization terms. The framework can be used in a variety of application scenarios, and I will briefly discuss its applications in machine learning and computational physics.

Speaker bio: Jongha (Jon) Ryu is a postdoctoral associate at the Research Laboratory of Electronics (RLE), hosted by Prof. Gregory W. Wornell. His research aims to develop efficient, reliable, and robust machine learning algorithms with provable performance guarantees, especially with inspiration from information theory. He is currently interested in representation learning, generative models, and learning with uncertainty.
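The low-rank approximation characterization the talk builds on is, in its classical matrix form, the Eckart–Young theorem; the neural parameterization noted in the comments is the talk's setting, while the notation here is the standard one:

```latex
% Eckart--Young: over all rank-L operators \hat{A}, the objective
\min_{\mathrm{rank}(\hat{A}) \le L} \; \| A - \hat{A} \|_F^2
% is minimized by the truncated SVD
% \hat{A}^\star = \sum_{\ell=1}^{L} \sigma_\ell \, u_\ell v_\ell^\top.
% Parameterizing \hat{A} = F G^\top with neural networks F, G turns this
% into an unconstrained learning objective with no orthonormalization step.
```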

October 27, 2023

Semantics and Learning for Active Robot Perception in Dynamic Environments

4:00–4:30 PM ET, Room 32-G882 (Hewlett Room)

Abstract: The ability to autonomously explore and model an unknown and changing environment is a fundamental capability for robot autonomy, and a prerequisite for numerous applications in industrial, construction, household, service, and assistive robotics. This talk explores how various forms of scene understanding -- ranging from traditional geometry to end-to-end learning, semantic perception, and abstraction -- can enable robots to actively reconstruct an unknown environment, detect and understand dynamic entities, and leverage prediction and adaptation for improved task performance in changing scenes. The presented methods are validated on board fully autonomous robots, and the code is released as open source.

Speaker bio: Lukas Schmid is a postdoctoral fellow working with Luca Carlone at the MIT-SPARK Lab. Before that, he was briefly a postdoctoral researcher at the Autonomous Systems Lab led by Prof. Roland Siegwart at ETH Zürich, Switzerland, where he also obtained his PhD in 2022 and his M.Sc. in Robotics, Systems, and Control in 2019. Among other honors, his work was recognized with the Willi Studer Prize for the best M.Sc. graduate, the ETH Medal for outstanding master's theses, and a Swiss National Science Foundation postdoctoral fellowship.

October 20, 2023

Feature Geometry and Multivariate Dependence Learning

4:00–4:30 PM ET, Room 32-370

Abstract: In this talk, we present a geometric framework for learning and processing information from data. First, we introduce the feature geometry, which unifies statistical dependence and features in a functional space equipped with geometric structures. Then, we formulate each learning problem as solving for the optimal feature representation of the associated dependence component. Specifically, we will demonstrate deep neural networks as one specific method for achieving this goal. Building on this observation, we will propose more adaptable ways to design neural networks for multivariate learning tasks. We will discuss several learning applications, including (1) handling multimodal data with missing modalities and (2) learning dependence structures from sequential data. [Based on https://arxiv.org/abs/2309.10140]

Speaker bio: Xiangxiang Xu is a postdoctoral associate in the Department of EECS at MIT, hosted by Prof. Lizhong Zheng. His research focuses on information theory and statistical learning, with applications in understanding and developing learning algorithms.

October 13, 2023

The Dissimilarity Dimension: Sharper Bounds for Optimistic Algorithms

4:00–4:30 PM ET, Room 32-370

Abstract: The principle of Optimism in the Face of Uncertainty (OFU) is one of the foundational algorithmic design choices in reinforcement learning and bandits. Optimistic algorithms balance exploration and exploitation by deploying data collection strategies that maximize expected rewards in plausible models. This is the basis of celebrated algorithms like the Upper Confidence Bound (UCB) algorithm for multi-armed bandits. For nearly a decade, the analysis of optimistic algorithms, including Optimistic Least Squares (OLS), in the context of rich reward function classes has relied on the concept of eluder dimension, introduced by Russo and Van Roy in 2013. In this talk, we shed light on the limitations of the eluder dimension in capturing the true behavior of optimistic strategies in the realm of function approximation. We remedy these by introducing a novel statistical measure, the "dissimilarity dimension." We show it can be used to provide a sharper sample analysis of algorithms like OLS by establishing a link between regret and the dissimilarity dimension; to illustrate this, we will show that some function classes have arbitrarily large eluder dimension but constant dissimilarity dimension. Our regret analysis draws inspiration from graph theory and may be of interest to the mathematically minded beyond the field of statistical learning theory. This talk sheds new light on the fundamental principle of optimism and its algorithms in the function approximation regime, advancing our understanding of these concepts.

Speaker bio: Aldo Pacchiano is a postdoctoral fellow at the Eric and Wendy Schmidt Center of the Broad Institute of MIT and Harvard. He obtained his PhD under the supervision of Profs. Michael Jordan and Peter Bartlett at UC Berkeley, and was a postdoctoral researcher at Microsoft Research, NYC. He will join the Boston University Center for Computing and Data Sciences as an assistant professor in the summer of 2024. His research lies in the areas of reinforcement learning, online learning, bandits, and algorithmic fairness. He is particularly interested in furthering our statistical understanding of learning phenomena in adaptive environments, and in using these theoretical insights and techniques to design efficient and safe algorithms for scientific, engineering, and large-scale societal applications.
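As a concrete instance of the OFU principle the abstract invokes, the classic UCB1 index for a multi-armed bandit takes the standard form:

```latex
% UCB1 (standard form): at round t, play the arm
a_t \;=\; \arg\max_{a} \left( \hat{\mu}_a + \sqrt{\frac{2 \ln t}{n_a}} \right),
% where \hat{\mu}_a is the empirical mean reward of arm a and n_a is the
% number of times a has been played; the bonus term is the "optimism"
% that drives exploration.
```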

October 6, 2023

On counterfactual inference with unobserved confounding

4:00–4:30 PM ET, Room 32-370

Abstract: Given an observational study with n independent but heterogeneous units, our goal is to learn the counterfactual distribution for each unit using only one p-dimensional sample per unit containing covariates, interventions, and outcomes. Specifically, we allow for unobserved confounding that introduces statistical biases between interventions and outcomes, as well as exacerbating the heterogeneity across units. Modeling the conditional distribution of the outcomes as an exponential family, we reduce learning the unit-level counterfactual distributions to learning n exponential-family distributions with heterogeneous parameters and only one sample per distribution. We introduce a convex objective that pools all n samples to jointly learn all n parameter vectors, and provide a unit-wise mean squared error bound that scales linearly with the metric entropy of the parameter space. For example, when the parameters are s-sparse linear combinations of k known vectors, the error is O(s log k / p). En route, we derive sufficient conditions for compactly supported distributions to satisfy the logarithmic Sobolev inequality. As an application of the framework, our results enable consistent imputation of sparsely missing unobserved confounders.

Speaker bio: Abhin Shah is a sixth-year PhD student advised by Prof. Devavrat Shah and Prof. Greg Wornell. He is a recipient of MIT's Jacobs Presidential Fellowship. His research interests include theoretical and applied aspects of trustworthy machine learning, with a focus on causality and fairness.
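For readers less familiar with the exponential-family reduction, a generic form of the outcome model (standard notation of ours, not necessarily the paper's parameterization) is:

```latex
% Generic exponential-family model for unit i's outcome y given covariates
% and intervention x (illustrative standard form):
p_{\theta^{(i)}}(y \mid x) \;=\; \exp\!\left( \langle \theta^{(i)}, \phi(x, y) \rangle - \Phi(\theta^{(i)}) \right),
% with known sufficient statistic \phi and log-partition function \Phi.
% The task reduces to estimating the parameters \theta^{(1)},\dots,\theta^{(n)}
% from one sample each, which the talk's convex objective does by pooling
% all n samples.
```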

September 29, 2023

Linear Attention Is (maybe) All You Need (to understand Transformer optimization)

3:00–3:30 PM ET, Room 32-G882 (Hewlett Room)

Abstract: Transformer training is notoriously difficult, requiring careful design of optimizers and the use of various heuristics. We make progress towards understanding the subtleties of training transformers by carefully studying a simple yet canonical linearized shallow transformer model. Specifically, we train linear transformers to solve regression tasks, inspired by J. von Oswald et al. (ICML 2023) and K. Ahn et al. (NeurIPS 2023). Most importantly, we observe that the linearized models mimic several prominent aspects of transformers vis-à-vis their training dynamics. Consequently, the results of this paper hold the promise of identifying a simple transformer model that might be a valuable, realistic proxy for understanding transformers.

Speaker bio: Kwangjun Ahn is a final-year PhD student at MIT in the Department of EECS (Electrical Engineering & Computer Science) and the Laboratory for Information and Decision Systems (LIDS). His advisors are Profs. Suvrit Sra and Ali Jadbabaie. He is also working part-time at Google Research on accelerating LLM inference with the Speech & Language Algorithms Team. His current research interests include understanding LLM optimization and how to speed it up. He has worked on various topics over the years, including machine learning theory, optimization, statistics, and learning for control.
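A minimal sketch of the kind of linearized layer studied in this line of work: attention with the softmax removed, applied to tokens built from (x, y) pairs of a random regression task. The specific stacking, token encoding, and training setup in the cited papers differ; this is an illustration of the model class, not their exact architecture.

```python
import torch
import torch.nn as nn

class LinearAttention(nn.Module):
    """One attention layer without softmax: out = z + (Q K^T / t) V.
    Trained on in-context regression, such layers can be analyzed in
    closed form while mimicking transformer training dynamics."""

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim, bias=False)
        self.k = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, dim, bias=False)

    def forward(self, z):                         # z: (batch, tokens, dim)
        attn = self.q(z) @ self.k(z).transpose(1, 2) / z.shape[1]
        return z + attn @ self.v(z)               # residual, no softmax
```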

September 22, 2023

Machine learning of model errors in dynamical systems

4:00–4:30 PM ET, Room 32-370

Abstract: The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. Here, we present a unifying framework for blending mechanistic and machine-learning approaches for identifying dynamical systems from data. This framework is agnostic to the chosen machine-learning model parameterization, and casts the problem in both continuous and discrete time. We will also show recent developments that allow these methods to learn from noisy, partial observations. We first study model error from the learning-theory perspective, defining the excess risk and generalization error. For a linear model of the error used to learn about ergodic dynamical systems, both excess risk and generalization error are bounded by terms that diminish with the square root of T, the length of the training trajectory data. In our numerical examples, we first study an idealized, fully observed Lorenz system with model error, and demonstrate that hybrid methods substantially outperform solely data-driven and solely mechanistic approaches. Then, we present recent results for modeling partially observed Lorenz dynamics that leverage both data assimilation and neural differential equations. Joint work with Andrew Stuart.
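A minimal sketch of the hybrid model class the abstract describes, in continuous time: the vector field is a known mechanistic term plus a learned correction for model error, dx/dt = f_mech(x) + g_theta(x). The MLP correction below is an illustrative stand-in; f_mech would be, e.g., an imperfect Lorenz-63 model.

```python
import torch
import torch.nn as nn

class HybridDynamics(nn.Module):
    """Mechanistic + learned vector field: dx/dt = f_mech(x) + g_theta(x).
    The neural term g_theta absorbs the model error of the imperfect
    mechanistic model f_mech; trainable with any neural-ODE solver."""

    def __init__(self, f_mech, dim, hidden=64):
        super().__init__()
        self.f_mech = f_mech                  # known (imperfect) physics
        self.g = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, dim))

    def forward(self, x):
        return self.f_mech(x) + self.g(x)
```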

September 15, 2023

Large-Scale Study of Temporal Shift in Health Insurance Claims

4:30–5:00 PM ET, Room 32-G882

Abstract: Most machine learning models for predicting clinical outcomes are developed using historical data. Yet even if these models are deployed in the near future, dataset shift over time may result in less-than-ideal performance. To capture this phenomenon, we consider a task -- that is, an outcome to be predicted at a particular time point -- to be non-stationary if a historical model is no longer optimal for predicting that outcome. We build an algorithm to test for temporal shift, either at the population level or within a discovered sub-population. We then construct a meta-algorithm to perform a retrospective scan for temporal shift on a large collection of tasks. Our algorithms enable us to perform, to our knowledge, the first comprehensive evaluation of temporal shift in healthcare. We create 1,010 tasks by evaluating 242 healthcare outcomes for temporal shift from 2015 to 2020 on a health insurance claims dataset. 9.7% of the tasks show temporal shifts at the population level, and 93.0% have some sub-population affected by shifts. We dive into case studies to understand the clinical implications. Our analysis highlights the widespread prevalence of temporal shifts in healthcare.

Speaker bio: Christina Ji is a fifth-year PhD student in the Clinical ML group, advised by David Sontag. Her research is on detecting and addressing distribution shift over time in healthcare settings. She has also worked on characterizing variation in treatment policies with causal inference methods and evaluating reinforcement learning policies.
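To make the non-stationarity definition operational, here is one way a population-level check could look: flag a task as shifted if a model retrained on recent data significantly outperforms the historical model on held-out recent outcomes. This bootstrap test is our illustrative sketch, not the paper's exact procedure, and assumes sklearn-style models with a `predict` method.

```python
import numpy as np

def temporal_shift_test(historical, retrained, X_new, y_new,
                        n_boot=1000, alpha=0.05):
    """Flag temporal shift if the retrained model's error is significantly
    lower than the historical model's on recent held-out data (one-sided
    bootstrap test on the per-example error gap)."""
    hist_err = (historical.predict(X_new) != y_new).astype(float)
    new_err = (retrained.predict(X_new) != y_new).astype(float)
    diffs = hist_err - new_err                 # positive => retraining helps
    boots = [np.mean(np.random.choice(diffs, len(diffs)))
             for _ in range(n_boot)]
    # shift if the lower (alpha) confidence bound on the gap exceeds zero
    return np.quantile(boots, alpha) > 0
```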