ML-Tea: Ablation Based Counterfactuals
Speaker
Zheng Dai
MIT CSAIL
Host
Behrooz Tahmasebi
MIT CSAIL
Abstract: The widespread adoption of diffusion models for creative uses such as image, video, and audio synthesis has raised serious questions about the use of training data and its regulation. Resolving these questions requires understanding how such models are influenced by their training data. Due to the complexity involved in training and sampling from these models, the ultimate impact of the training data is challenging to characterize, confounding regulatory and scientific efforts. In this work we explore the idea of an Ablation Based Counterfactual, which allows us to compute counterfactual scenarios in which training data is missing by ablating parts of a model, circumventing the need to retrain. This enables important downstream tasks such as data attribution, and brings us closer to understanding the influence of training data on these models.
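The core idea in the abstract (approximating "what if this training example were absent?" by ablating model components rather than retraining) can be illustrated with a toy stand-in. The sketch below is not the speaker's method and uses no diffusion model; it assumes a simple kernel regressor whose prediction decomposes into one additive component per training example, so zeroing out example i's component serves as a retraining-free counterfactual, and the resulting change in output is an attribution score for that example.

```python
import numpy as np

# Hypothetical sketch: a kernel ridge-style predictor whose output is a sum
# of per-training-example components. "Ablating" example i's component plays
# the role of the counterfactual model trained without example i, with no
# retraining required.

def rbf(x, xi, gamma=1.0):
    """RBF kernel between a query point and a training point."""
    return np.exp(-gamma * (x - xi) ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=20)                 # toy training inputs
y = np.sin(3 * X) + 0.1 * rng.normal(size=20)   # toy training targets

# Fit per-example weights alpha via ridge-regularized kernel regression.
K = rbf(X[:, None], X[None, :])
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(X)), y)

def predict(x, keep=None):
    """Prediction as a sum of per-example components; `keep` masks ablated ones."""
    mask = np.ones(len(X), dtype=bool) if keep is None else keep
    return sum(alpha[i] * rbf(x, X[i]) for i in range(len(X)) if mask[i])

x_test = 0.3
full = predict(x_test)

# Ablation-based counterfactual: remove the component tied to training example 0.
mask = np.ones(len(X), dtype=bool)
mask[0] = False
ablated = predict(x_test, keep=mask)

# The output shift attributes this prediction to training example 0.
attribution = full - ablated
print(attribution)
```

In this additive toy model the ablated prediction coincides exactly with what leave-one-out retraining of the ablated component would give; the talk's contribution is making an analogous ablation meaningful for models like diffusion networks, where no such clean decomposition is available.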