Abstract: Training large language models has become a defining pursuit in modern machine learning—one that is almost entirely led by industry, fueled by massive computational resources and guided by scaling laws that reward ever-larger models and datasets. For academic researchers, participating in this space can feel out of reach. The barriers—limited compute, infrastructure, and access to proprietary data—are real and growing. Still, I believe academia has an essential role to play. Even with constraints, there are important scientific questions and meaningful opportunities that academic research is uniquely positioned to tackle. By engaging with the training process itself, we can deepen our understanding of language models and develop novel and efficient approaches that complement large-scale efforts. In this talk, I’ll share my lab’s research efforts over the past two years in both pre-training and post-training of language models under an academic budget. Our work has aimed to better understand training dynamics, innovate within limitations, and release artifacts that benefit the broader research community. I’ll also highlight three areas where academic researchers can make significant contributions: (1) developing small but capable models, (2) understanding and improving training data, and (3) advancing post-training methods on top of open-weight models. My hope is to encourage broader engagement with LM training in academia, and to foster new forms of collaboration between academic and industry research.
Bio: Danqi Chen is an Associate Professor of Computer Science at Princeton University and co-leads the Princeton NLP Group. She also serves as an Associate Director of Princeton Language and Intelligence (PLI), an initiative focused on advancing fundamental research on large AI models. Her recent research centers on training, adapting, and understanding language models (LMs), with an emphasis on making them more accessible to academia. Before joining Princeton, Danqi was a visiting scientist at Facebook AI Research in Seattle. She earned her Ph.D. from Stanford University (2018) and her B.E. from Tsinghua University (2012), both in Computer Science. Her work has been recognized with a Sloan Fellowship, an NSF CAREER Award, a Samsung AI Researcher of the Year Award, and multiple outstanding paper awards from ACL and EMNLP.
Dawn Song (University of California, Berkeley)

Title: Towards Building Safe and Secure AI: Lessons and Open Challenges
Abstract: TBA
Bio: Dawn Song is a Professor of Computer Science at UC Berkeley and Co-Director of the Berkeley Center for Responsible Decentralized Intelligence. Her research interests lie in AI and deep learning, security and privacy, and decentralization technology. She is the recipient of various awards, including the MacArthur Fellowship, the Guggenheim Fellowship, the NSF CAREER Award, the Alfred P. Sloan Research Fellowship, the MIT Technology Review TR-35 Award, the ACM SIGSAC Outstanding Innovation Award, and more than 10 Test-of-Time and Best Paper Awards from top conferences in computer security and deep learning. She has been recognized as the Most Influential Scholar (AMiner Award) for being the most cited scholar in computer security. She is an ACM Fellow and an IEEE Fellow. She obtained her Ph.D. from UC Berkeley. She is also a serial entrepreneur and has been named to the Female Founder 100 List by Inc. and the Wired25 List of Innovators.
Song-Chun Zhu (Peking University and Tsinghua University)

Title: AGI: Framework, Prototype, Definition and Benchmark
Abstract: In this talk, I will present a body of work on AGI carried out at the Beijing Institute for General Artificial Intelligence (BIGAI) and Peking University. We also call this work TongAI, as the Chinese character 'Tong' means 'general' and contains the letters 'A', 'G', and 'I'. I will start by introducing a digital agent: a little girl who lives and learns continuously in diverse, physics-realistic simulated environments with multi-physics and social interactions. The little girl, nicknamed 'TongTong', is driven by her own value system of desires and goals, which generates her plans and actions. I will then reveal the framework underlying this self-conscious agent, with three interconnected components: the cognitive architecture (C), the potential functions (U) representing skills, and the value functions (V). Various AGI systems can then be defined as points in this joint (C, U, V)-space. This framework represents a paradigm shift from the popular 'data-driven', 'large data for small tasks' statistical paradigm, which we pioneered at Harvard, Brown, and UCLA beginning in the early 1990s, to the 'value-driven', 'small data for large tasks' paradigm that I have been advocating since 2010. I will then introduce TongTest: a new set of criteria, benchmarks, and a test platform for measuring the general intelligence of AI agents performing multi-modal embodied tasks in complex environments. TongTest goes well beyond the Turing test in complexity and integrates results from developmental psychology and anthropology; it assesses TongTong's intelligence as matching that of a 3-4-year-old child. In the talk, I will also show some recent work on humanoid robotics and applications, and discuss Eastern philosophical thinking on what makes humans and intelligence, and how morality and social norms emerge from the CUV framework as a solution to AGI safety.
Bio: Song-Chun Zhu is currently Director of the Beijing Institute for General Artificial Intelligence (BIGAI), a non-profit research organization, and Chair Professor jointly at Peking University and Tsinghua University. He is also Dean of both the School of Intelligence Science and Technology and the Institute for Artificial Intelligence at Peking University. He received his Ph.D. in Computer Science from Harvard University in 1996, after which he worked at Brown, Stanford, and UCLA, where he established an interdisciplinary center for Vision, Cognition, Learning, and Autonomy (VCLA@UCLA, 2002-2020) before returning to China in late 2020. He has published 400+ articles across AI areas including computer vision, cognition, robotics, machine learning, NLP, and multi-agent systems. His scientific contributions have garnered various recognitions, including the Marr Prize (2003), two Marr Prize honorary nominations (1999, 2007), a Sloan Fellowship (2001), the J.K. Aggarwal Prize (2008), the Helmholtz Test-of-Time Award (2013), and the Computational Modeling Prize at CogSci (2017). He established the Lotus Hill Institute in 2005 to launch large-scale image annotation and pioneered data-driven statistical approaches. He served as General Chair for CVPR 2012 and 2019, and twice led Multi-University Research Initiatives (MURI, 2010-2015 and 2015-2020) on scene understanding, visual commonsense reasoning, and robot autonomy in the US. His research pursues a general, unified theory of intelligence.
Tim Rocktäschel (Google DeepMind and University College London)

Title: Open-Endedness, World Models, and the Automation of Innovation
Abstract: The pursuit of Artificial Superintelligence (ASI) requires a shift from narrow objective optimization towards embracing Open-Endedness—a research paradigm, pioneered in AI by Stanley, Lehman, and Clune, focused on systems that generate endless sequences of novel but learnable artifacts. In this talk, I will present our work on large-scale foundation world models that can generate a wide variety of environments, which can in turn be used to train more general and robust agents. Furthermore, I will argue that the connection between Open-Endedness and Foundation Models points towards automating innovation itself. This convergence is already yielding practical results, enabling self-referential self-improvement loops for automated prompt engineering, automated red-teaming, and AI debate in Large Language Models, and it hints at a future where AI drives its own discoveries.
Bio: Tim Rocktäschel is a Director and Principal Scientist at Google DeepMind, where he leads the Open-Endedness Team. He is also a Professor of Artificial Intelligence at the Centre for Artificial Intelligence in the Department of Computer Science at University College London (UCL), where he is the Principal Investigator of the UCL Deciding, Acting, and Reasoning with Knowledge (DARK) Lab, and a Fellow of the European Laboratory for Learning and Intelligent Systems (ELLIS). Previously, he served as a Manager, Research Scientist, and Area Lead at Meta AI (FAIR); as a Postdoctoral Researcher in Reinforcement Learning at the Whiteson Research Lab at the University of Oxford; as a Junior Research Fellow in Computer Science at Jesus College; and as a Stipendiary Lecturer in Computer Science at Hertford College. He obtained his Ph.D. from UCL under the supervision of Sebastian Riedel, receiving a Microsoft Research Ph.D. Scholarship in 2013 and a Google Ph.D. Fellowship in 2017. His work focuses on Artificial General Intelligence, Open-Endedness, and Self-Improvement, and has received Best Paper Awards at ICML.
Yi Ma (University of Hong Kong)

Title: Pursuing the Nature of Intelligence
Abstract: In this talk, we will try to clarify different levels and mechanisms of intelligence from historical, scientific, mathematical, and computational perspectives. Tracing the evolution of intelligence in nature, from phylogenetic to ontogenetic to societal and, finally, artificial intelligence, we will try to shed light on how to understand the true nature of the seemingly dramatic advances in machine intelligence technologies over the past decade. We achieve this goal by developing a principled mathematical framework that explains the practice of deep learning from the perspective of compressive data encoding and decoding. This framework not only reveals the true nature, and hence the limitations, of current practice but also provides principled guidelines for developing more complete and more efficient learning architectures and systems. Finally, we will clarify the difference and relationship between Knowledge and Intelligence, which may guide us in pursuing the goal of developing systems with true intelligence.
Bio: Yi Ma is a Chair Professor in Artificial Intelligence and, since 2023, the inaugural Director of the School of Computing and Data Science and the Institute of Data Science at the University of Hong Kong. His research interests include computer vision, high-dimensional data analysis, and integrated intelligent systems. Yi received two bachelor's degrees, in Automation and Applied Mathematics, from Tsinghua University in 1995, two master's degrees, in EECS and Mathematics, in 1997, and a Ph.D. in EECS from UC Berkeley in 2000. He served on the faculty of the ECE Department at UIUC from 2000 to 2011, was Principal Researcher and Manager of the Visual Computing group at Microsoft Research Asia from 2009 to 2014, and was Executive Dean of the School of Information Science and Technology at ShanghaiTech University from 2014 to 2017. He was on the faculty of the UC Berkeley EECS Department from 2018 to 2023, where he continues to be a visiting professor. He has published over 65 journal papers, 150 conference papers, and three textbooks on 3D vision, generalized PCA, and high-dimensional data analysis. He received the NSF CAREER Award in 2004 and the ONR Young Investigator Award in 2005, as well as the David Marr Prize in computer vision at ICCV 1999 and Best Paper Awards at ECCV 2004 and ACCV 2009. He served as Program Chair for ICCV 2013 and General Chair for ICCV 2015. He is a Fellow of IEEE, ACM, and SIAM.
Zico Kolter (Carnegie Mellon University)

Title: Building Safe and Robust AI Systems
Abstract: As AI systems become more powerful, it is increasingly important that developers be able to strictly enforce desired policies for these systems. Unfortunately, via techniques such as adversarial attacks, it has traditionally been possible to circumvent model policies, allowing bad actors to manipulate LLMs for unintended and potentially harmful purposes. In this talk, I will highlight several recent lines of work that are making progress on these challenges, including methods for robustness to jailbreaks, safety pre-training, and methods for preventing undesirable model distillation. I will additionally highlight some of the areas I believe to be most crucial for future work in the field.
Bio: Zico Kolter is a Professor and Head of the Machine Learning Department at Carnegie Mellon University. He also serves on the Board of Directors at OpenAI, where he chairs the Safety and Security Committee; is a co-founder and Chief Technical Advisor of Gray Swan AI, an AI security company; and is a Chief Expert at Robert Bosch, LLC. His work spans several topics in machine learning, including AI safety and robustness, LLM security, the impact of data on models, implicit models, and more. He is a recipient of the DARPA Young Faculty Award, a Sloan Fellowship, and Best Paper Awards at NeurIPS, ICML (honorable mention), AISTATS (test of time), IJCAI, KDD, and PESGM.