ICLR 2024 Outstanding Paper Awards
Awards Committee: Eunsol Choi, Katja Hofmann, Ming-Yu Liu, Nan Jiang, Stephan Günnemann, Suvrit Sra, Thomas Kipf, Volkan Cevher
(This post is written by the Awards Committee, lightly edited by the Program Chairs.)
Selection Process
The ICLR 2024 Outstanding Paper Committee went through the following selection process to identify a collection of outstanding papers and honorable mentions that showcase excellent research presented at this conference.
The committee began with an initial pool of 44 papers provided by the program chairs. During Phase 1, each committee member ranked these papers by how comfortable they would be reviewing them given their expertise, while avoiding conflicts of interest. Each paper was then assigned to two committee members, so that each member received about a dozen papers to review, and each member was encouraged to nominate up to three papers from their batch through an anonymous form. This process produced roughly 20 shortlisted papers. During Phase 2, all committee members familiarized themselves with the shortlisted papers, and the second reviewer of each paper (who had not nominated it) was encouraged to share their thoughts. During the final phase, the committee discussed the nominated papers together and decided on the outstanding papers and honorable mentions. The committee also aimed to highlight a wide range of research contributions, spanning theoretical insights, practical impact, exceptional writing, and experimental rigor. Throughout the process, the committee consulted external experts when appropriate and would like to thank all those who contributed.
In total there are 5 Outstanding Paper winners and 11 Honorable Mentions. Congratulations to all the authors for their exceptional contributions to ICLR!
Award Winners
Generalization in diffusion models arises from geometry-adaptive harmonic representations
Zahra Kadkhodaie, Florentin Guth, Eero P Simoncelli, Stéphane Mallat
https://openreview.net/forum?id=ANvmVS2Yr0
https://iclr.cc/virtual/2024/oral/19783
This paper provides an important in-depth analysis of generalization and memorization in image diffusion models. The authors empirically study when an image generative model switches from memorizing its training data to a generalization regime, and they explain this phenomenon in terms of architectural inductive biases by connecting it to ideas from harmonic analysis via “geometry-adaptive harmonic representations”. The paper fills a critical gap in our understanding of visual generative models and will likely inspire important future theoretical work in this area.
Learning Interactive Real-World Simulators
Sherry Yang, Yilun Du, Seyed Kamyar Seyed Ghasemipour, Jonathan Tompson, Leslie Pack Kaelbling, Dale Schuurmans, Pieter Abbeel
https://openreview.net/forum?id=sFyTZEqmUY
https://iclr.cc/virtual/2024/oral/19722
Aggregating data across multiple sources to train foundation models for robotics is a long-standing, ambitious goal. It poses significant challenges because different robots have different sensory-motor interfaces, which hinders training across large-scale datasets. This work, UniSim, is a significant step in this direction and an engineering feat: it aggregates data through a unified interface based on visual perceptions and text descriptions of controls, and trains a robotics simulator on that data by leveraging the latest developments in the vision and language domains.
Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
Ido Amos, Jonathan Berant, Ankit Gupta
https://openreview.net/forum?id=PdaPky8MUn
https://iclr.cc/virtual/2024/oral/19761
This paper dives deep into the ability of recently proposed state-space models and transformer architectures to model long-term sequential dependencies. Surprisingly, the authors find that training transformer models from scratch leads to an underestimation of their performance, and they demonstrate that dramatic gains can be achieved with a pre-training and fine-tuning setup. The paper is exceptionally well executed and exemplary in its focus on simplicity and systematic insights.
Protein Discovery with Discrete Walk-Jump Sampling
Nathan C. Frey, Dan Berenberg, Karina Zadorozhny, Joseph Kleinhenz, Julien Lafrance-Vanasse, Isidro Hotzel, Yan Wu, Stephen Ra, Richard Bonneau, Kyunghyun Cho, Andreas Loukas, Vladimir Gligorijevic, Saeed Saremi
https://openreview.net/forum?id=zMPHKOmQNb
https://iclr.cc/virtual/2024/oral/19713
This paper addresses sequence-based antibody design, a timely and important application for generative models of protein sequences. To this end, the authors introduce an innovative and effective modeling approach tailored specifically to handling discrete protein sequence data. In addition to validating the method in silico, the authors perform extensive wet-lab experiments measuring antibody binding affinity in vitro, demonstrating the effectiveness of their generative method.
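For readers unfamiliar with walk-jump schemes, the sketch below illustrates the general smoothed-density sampling idea the method builds on: Langevin “walk” steps on a noise-smoothed density using a learned score network, followed by a single denoising “jump” back to clean space via Tweedie’s formula. This is our minimal illustration, not the authors’ implementation; the `score` network and all hyperparameters are assumptions.

```python
import torch

def walk_jump_sample(score, y_init, sigma=0.5, step_size=1e-3, n_steps=200):
    """Minimal walk-jump sketch (illustration only).

    `score` is a hypothetical network estimating grad log p_sigma(y),
    the score of the sigma-smoothed data density.
    """
    y = y_init.clone()
    for _ in range(n_steps):
        # Walk: unadjusted Langevin dynamics on the smoothed density.
        noise = torch.randn_like(y)
        y = y + step_size * score(y) + (2.0 * step_size) ** 0.5 * noise
    # Jump: one-step denoising via Tweedie's formula,
    # E[x | y] = y + sigma^2 * grad log p_sigma(y).
    return y + sigma ** 2 * score(y)
```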
Vision Transformers Need Registers
Timothée Darcet, Maxime Oquab, Julien Mairal, Piotr Bojanowski
https://openreview.net/forum?id=2dnO3LLiJ1
https://iclr.cc/virtual/2024/oral/19794
This paper identifies artifacts in the feature maps of vision transformer networks, characterized by high-norm tokens in low-informative background areas. The authors offer key hypotheses for why this happens and present a simple yet elegant solution that addresses the artifacts with additional register tokens, enhancing model performance on various tasks. The insights gained from this work can also impact other application areas. The paper is very well written and provides a great example of how to conduct research: identify an issue, understand why it is happening, and then provide a solution.
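As a rough illustration of the mechanism (our sketch, not the authors’ code), registers amount to appending a few extra learnable tokens to the patch sequence, letting them participate in attention like any other token, and discarding them at the output. Module names and sizes below are assumptions.

```python
import torch
import torch.nn as nn

class ViTWithRegisters(nn.Module):
    def __init__(self, dim=768, num_registers=4, num_patches=196, depth=12, heads=12):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.num_registers = num_registers

    def forward(self, patch_tokens):  # patch_tokens: (B, num_patches, dim)
        b = patch_tokens.shape[0]
        x = torch.cat([self.cls_token.expand(b, -1, -1), patch_tokens], dim=1)
        x = x + self.pos_embed
        # Registers get no positional embedding; they serve as global scratch space.
        x = torch.cat([x, self.registers.expand(b, -1, -1)], dim=1)
        x = self.encoder(x)
        return x[:, : -self.num_registers]  # discard register tokens at the output
```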
Honorable Mentions
Amortizing intractable inference in large language models
Edward J Hu, Moksh Jain, Eric Elmoznino, Younesse Kaddar, Guillaume Lajoie, Yoshua Bengio, Nikolay Malkin
https://openreview.net/forum?id=Ouj6p4ca60
https://iclr.cc/virtual/2024/oral/19763
The paper proposes a promising alternative to autoregressive decoding in LLMs, framed from a Bayesian inference perspective, that can inspire follow-up studies.
Approximating Nash Equilibria in Normal-Form Games via Stochastic Optimization
Ian Gemp, Luke Marris, Georgios Piliouras
https://openreview.net/forum?id=cc8h3I3V4E
https://iclr.cc/virtual/2024/oral/19744
An exceptionally clearly written paper, making progress on the important problem of developing efficient and scalable Nash solvers.
Beyond Weisfeiler-Lehman: A Quantitative Framework for GNN Expressiveness
Bohang Zhang, Jingchu Gai, Yiheng Du, Qiwei Ye, Di He, Liwei Wang
https://openreview.net/forum?id=HSKaGOi7Ar
https://iclr.cc/virtual/2024/oral/19773
The expressivity of GNNs is an important topic for which current approaches, such as the Weisfeiler-Lehman test, still come with significant limitations. The authors propose a new “expressivity theory” based on homomorphism counts.
Flow Matching on General Geometries
Ricky T. Q. Chen, Yaron Lipman
https://openreview.net/forum?id=g7ohDlTITL
https://iclr.cc/virtual/2024/oral/19740
This paper tackles the challenging yet important problem of generative modeling on general geometric manifolds, for which it proposes a practical and efficient algorithm. The paper is presented exceptionally well and features a comprehensive experimental validation on a wide range of tasks.
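Schematically (our notation, not the paper’s exact statement), flow matching regresses a learned vector field onto a target conditional vector field along probability paths; on non-Euclidean geometries the norm is taken in the Riemannian metric g:

```latex
\mathcal{L}(\theta)
  = \mathbb{E}_{\,t \sim \mathcal{U}[0,1],\; x_1 \sim q,\; x_t \sim p_t(\cdot \mid x_1)}
    \big\| v_\theta(t, x_t) - u_t(x_t \mid x_1) \big\|_{g(x_t)}^{2}
```

Here $q$ is the data distribution, $p_t(\cdot \mid x_1)$ a conditional probability path, and $u_t$ the target conditional vector field generating it.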
Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video
Shashanka Venkataramanan, Mamshad Nayeem Rizve, Joao Carreira, Yuki M Asano, Yannis Avrithis
https://openreview.net/forum?id=Yen1lGns2o
https://iclr.cc/virtual/2024/oral/19752
The paper proposes a novel path to self-supervised image pre-training: learning from a single long, continuous video. It contributes both a new type of data and a method for learning from it.
Meta Continual Learning Revisited: Implicitly Enhancing Online Hessian Approximation via Variance Reduction
Yichen Wu, Long-Kai Huang, Renzhen Wang, Deyu Meng, Ying Wei
https://openreview.net/forum?id=TpD2aG1h0D
https://iclr.cc/virtual/2024/oral/19759
The authors propose a new variance-reduction approach to meta continual learning. The approach is presented well, and it not only has practical impact but is also backed by a regret analysis.
Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
Suyu Ge, Yunan Zhang, Liyuan Liu, Minjia Zhang, Jiawei Han, Jianfeng Gao
https://openreview.net/forum?id=uNrFpDPMyo
https://iclr.cc/virtual/2024/oral/19718
The paper targets KV cache compression, a critical problem with great impact on transformer-based LLMs, and reduces memory with a simple idea that can be deployed without resource-intensive fine-tuning or re-training. The approach is simple, yet shown to be highly effective.
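The paper’s actual policies are adaptive per attention head; as a purely illustrative toy (not the authors’ method), a KV-cache eviction rule might keep the most recent tokens plus the historically most-attended older ones. All names and thresholds below are hypothetical.

```python
import torch

def compress_kv(keys, values, attn_scores, keep_recent=64, keep_top=64):
    """Toy KV-cache eviction rule (illustration only, not the paper's policy).

    keys, values: (seq_len, num_heads, head_dim) cached tensors
    attn_scores:  (seq_len,) accumulated attention each cached token has received
    """
    seq_len = keys.shape[0]
    if seq_len <= keep_recent + keep_top:
        return keys, values
    # Always keep a window of the most recent tokens.
    recent = torch.arange(seq_len - keep_recent, seq_len)
    # Among older tokens, keep those the model attended to most.
    older_scores = attn_scores[: seq_len - keep_recent]
    top = torch.topk(older_scores, keep_top).indices
    keep = torch.cat([top.sort().values, recent])
    return keys[keep], values[keep]
```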
Proving Test Set Contamination in Black-Box Language Models
Yonatan Oren, Nicole Meister, Niladri S. Chatterji, Faisal Ladhak, Tatsunori Hashimoto
https://openreview.net/forum?id=KS8mIvetg2
https://iclr.cc/virtual/2024/oral/19769
A simple yet elegant method to test whether a supervised-learning dataset has been included in LLM training.
Robust agents learn causal world models
Jonathan Richens, Tom Everitt
https://openreview.net/forum?id=pOoKI3ouv1
https://iclr.cc/virtual/2024/oral/19724
This paper makes progress toward laying the theoretical foundations for understanding the role of causal reasoning in agents’ ability to generalize to new domains, with potential implications for a range of related fields.
The mechanistic basis of data dependence and abrupt learning in an in-context classification task
Gautam Reddy
https://openreview.net/forum?id=aN4Jf6Cx69
https://iclr.cc/virtual/2024/oral/19749
A timely and exceptionally systematic study of the mechanisms that underlie in-context versus in-weights learning, at a point where we are only starting to understand these phenomena.
Towards a statistical theory of data selection under weak supervision
Germain Kolossov, Andrea Montanari, Pulkit Tandon
https://openreview.net/forum?id=HhfcNgQn6p
https://iclr.cc/virtual/2024/oral/19772
The paper establishes statistical foundations for data subset selection and identifies the shortcomings of popular data selection methods.