The Promises and Pitfalls of Open-source Agent Systems
Speaker
Tim Dettmers
Carnegie Mellon University / Allen Institute for AI
Host
Shannon Shen
MIT CSAIL
Abstract:
Agent systems, AI systems that make their own plans and act on them, have shown promising results, particularly on coding challenges such as SWE-bench. However, most agent systems currently rely on closed-source API models such as GPT-4o and Claude, as it is believed that open-source models lack the capabilities to power successful agent systems. In this talk, I show that agent systems powered by open-source models can match the performance of systems based on GPT-4o. This implies that, for good task performance, how you use a model matters much more than which model you use. I also discuss problems with agent system generalization and high variability in evaluation, which show that we need to be cautious when making scientific claims about agent systems. I will argue that we need to focus on these generalization and evaluation challenges to make steady scientific progress.
Bio:
Tim Dettmers is a Research Scientist at the Allen Institute for AI and an Assistant Professor at Carnegie Mellon University. His work focuses on making foundation models, such as ChatGPT, accessible to researchers and practitioners by reducing their resource requirements. His main focus is developing high-quality agent systems that are open-source and can run on consumer hardware, such as laptops. His research has won oral, spotlight, and best paper awards at conferences such as ICLR and NeurIPS, and has been awarded the Block Award and the Madrona Prize. He created the bitsandbytes open-source library for efficient foundation models, which is growing at 2.2 million installations per month, and for which he received Google Open Source and PyTorch Foundation awards.