Announcing the ICLR 2022 Outstanding Paper Award Recipients

By ICLR 2022 Senior Program Chair Yan Liu and Program Chairs Chelsea Finn, Yejin Choi, Marc Deisenroth

We are delighted to announce the recipients of the ICLR 2022 Outstanding Paper Awards!

First, we would like to thank the members of the ICLR community, including reviewers, area chairs, and senior area chairs, who provided valuable discussions and feedback to guide the award selection.

In addition, we would like to extend special thanks to the Outstanding Paper Award Selection Committee, which consisted of Andreas Krause (ETHZ), Atlas Wang (UT Austin), Been Kim (Google Brain), Bo Li (UIUC), Bohyung Han (SNU), He He (NYU), and Zaid Harchaoui (UW), for generously sharing their time and expertise in making the final selection.

Outstanding Paper Awards

The following seven papers have been chosen as recipients of the Outstanding Paper Award for their excellent clarity, insight, creativity, and potential for lasting impact. Additional details about the paper selection process are provided below.

The award recipients are (in order of paper ID):

Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models

By Fan Bao, Chongxuan Li, Jun Zhu, Bo Zhang

Diffusion probabilistic models (DPMs), a class of powerful generative models, are a rapidly growing topic in machine learning. This paper aims to tackle an inherent limitation of DPMs, namely the slow and expensive computation of the optimal reverse variance. The authors first present a surprising result that both the optimal reverse variance and the corresponding optimal KL divergence of a DPM have analytic forms with respect to its score function. Then they propose Analytic-DPM, a novel and elegant training-free inference framework that estimates the analytic forms of the variance and KL divergence using the Monte Carlo method and a pretrained score-based model. This paper is significant both in terms of its theoretical contribution (showing that both the optimal reverse variance and KL divergence of a DPM have analytic forms) and its practical benefit (presenting a training-free inference framework applicable to various DPMs), and will likely influence future research on DPMs.

*This paper will be presented in the Oral Session 4 on Probabilistic Models & Vision on Apr 28 8am GMT (1am PST).

Hyperparameter Tuning with Renyi Differential Privacy

By Nicolas Papernot, Thomas Steinke

This paper provides new insights into an important blind spot of most prior analyses of the differential privacy of learning algorithms, namely the fact that the learning algorithm is run multiple times over the data in order to tune the hyperparameters. The authors show that there are situations in which part of the data can skew the optimal hyperparameters, thereby leaking private information. Furthermore, the authors provide privacy guarantees for hyperparameter search procedures within the framework of Renyi Differential Privacy. This is an excellent paper that considers the everyday use of learning algorithms and its privacy implications for society, and proposes ways to address this issue. This work will provide the foundation for many follow-up works on differentially private machine learning algorithms.

*This paper will be presented in the Oral Session 1 on Learning in the Wild & RL on Apr 26 12am GMT (Apr 25 5pm PST).

Learning Strides in Convolutional Neural Networks

By Rachid Riad, Olivier Teboul, David Grangier, Neil Zeghidour

This paper addresses an important problem that anyone using convolutional networks has faced, namely setting the strides in a principled way rather than by trial and error. The authors propose a novel and very clever mathematical formulation for learning strides and demonstrate a practically useful method that achieves state-of-the-art experimental results in comprehensive benchmarks. The main idea is DiffStride, the first downsampling layer with learnable strides that allows one to learn the size of a cropping mask in the Fourier domain, effectively performing resizing in a way that is amenable to differentiable programming. This is an excellent paper that proposes a method that will likely become part of commonly used toolboxes as well as courses on deep learning.

*This paper will be presented in the Oral Session 2 on Understanding Deep Learning on Apr 26 8am GMT (1am PST).
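
As a rough illustration of the Fourier-domain resizing described for Learning Strides in Convolutional Neural Networks, the sketch below downsamples a 1D signal by keeping only its lowest frequencies. It is not the authors' DiffStride layer: here the crop size `m` is fixed by hand rather than learned, the crop is a hard cut, and the helper names (`dft`, `idft`, `fourier_downsample`) are hypothetical, written only for this example.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a sequence of numbers."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def fourier_downsample(x, m):
    """Resize a real signal x to odd length m by cropping its spectrum.

    Assumption for this sketch: m is odd and no larger than len(x), so the
    kept frequencies are symmetric around zero and the output stays real.
    """
    n = len(x)
    if m % 2 == 0 or not 1 <= m <= n:
        raise ValueError("m must be odd and satisfy 1 <= m <= len(x)")
    spectrum = dft(x)
    half = m // 2
    # Keep frequencies 0..half and the matching negative frequencies -half..-1.
    cropped = spectrum[: half + 1] + spectrum[n - half:]
    # Rescale by m / n so that sample values (not sums) are preserved.
    return [value.real * m / n for value in idft(cropped)]


# One period of a cosine sampled at 8 points, resized to 3 points.
signal = [math.cos(2 * math.pi * t / 8) for t in range(8)]
small = fourier_downsample(signal, 3)
```

In this example `small` approximates the same cosine sampled at 3 points, since the single frequency it contains survives the crop. Making the crop size a differentiable quantity, as the summary above describes, is the part this sketch leaves out.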

Expressiveness and Approximation Properties of Graph Neural Networks

By Floris Geerts, Juan L Reutter

This elegant theoretical paper shows how questions regarding the expressiveness and separability of different graph neural network (GNN) architectures can be reduced to (and sometimes substantially simplified by) examining their computations in tensor language, where these questions connect to well-known combinatorial notions such as treewidth. In particular, this paper provides an elegant way to easily obtain bounds on the separation power of GNNs in terms of the Weisfeiler-Leman (WL) tests, which have become the yardstick for measuring the separation power of GNNs. The proposed framework also has implications for studying the approximability of functions through GNNs. This paper has the potential to make a significant impact on future research by providing a general framework for describing, comparing, and analyzing GNN architectures. In addition, this paper provides a toolbox with which GNN architecture designers can analyze the separation power of their GNNs without needing to know the intricacies of the WL tests.

*This paper will be presented in the Oral Session 2 on Understanding Deep Learning on Apr 26 8am GMT (1am PST).

Comparing Distributions by Measuring Differences that Affect Decision Making

By Shengjia Zhao, Abhishek Sinha, Yutong (Kelly) He, Aidan Perreault, Jiaming Song, Stefano Ermon

This paper proposes a new class of discrepancies that can compare two probability distributions based on the optimal loss for a decision task. By suitably choosing the decision task, the proposed method generalizes the Jensen-Shannon divergence and the maximum mean discrepancy family. The authors demonstrate that the proposed approach achieves superior test power compared to competitive baselines on various benchmarks, with compelling use cases for understanding the effects of climate change on different social and economic activities, evaluating sample quality, and selecting features targeting different decision tasks. Not only is the proposed method intellectually elegant, but the committee also finds the paper exceptional for its empirical significance: because the method allows a user to directly specify their preferences when comparing distributions through the decision loss, it offers practitioners an increased level of interpretability.

*This paper will be presented in the Oral Session 4 on Probabilistic Models & Vision on Apr 28 8am GMT (1am PST).

Neural Collapse Under MSE Loss: Proximity to and Dynamics on the Central Path

By X.Y. Han, Vardan Papyan, David L. Donoho

This paper presents new theoretical insights on the “neural collapse” phenomenon that occurs pervasively in today’s deep net training paradigm. During neural collapse, last-layer features collapse to their class-means, both classifiers and class-means collapse to the same Simplex Equiangular Tight Frame, and classifier behavior collapses to the nearest-class-mean decision rule. Instead of the cross-entropy loss, which is mathematically harder to analyze, the paper demonstrates a new decomposition of the mean squared error (MSE) loss in order to analyze each component of the loss under neural collapse. This in turn leads to a new theoretical construct, the “central path”, where the linear classifier stays MSE-optimal for feature activations throughout the dynamics. Finally, by studying renormalized gradient flow along the central path, the authors derive exact dynamics that predict neural collapse. In sum, this paper provides novel and highly inspiring theoretical insights for understanding the empirical training dynamics of deep networks.

*This paper will be presented in the Oral Session 2 on Understanding Deep Learning on Apr 26 8am GMT (1am PST).
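
For readers unfamiliar with the two objects named in the Neural Collapse summary above, the following sketch builds a Simplex Equiangular Tight Frame using its standard textbook construction and applies a nearest-class-mean decision rule to it. It is only an illustration of those definitions, not code from the paper; the function names (`simplex_etf`, `cosine`, `nearest_class_mean`) are hypothetical and chosen for this example.

```python
import math


def simplex_etf(k):
    """Return k unit vectors in R^k forming a simplex equiangular tight frame.

    Vector i is sqrt(k / (k - 1)) * (e_i - (1 / k) * ones), for k >= 2.
    """
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


def nearest_class_mean(x, class_means):
    """Return the index of the class mean closest to x (Euclidean distance)."""
    def squared_distance(mean):
        return sum((a - b) ** 2 for a, b in zip(x, mean))
    return min(range(len(class_means)), key=lambda c: squared_distance(class_means[c]))


frame = simplex_etf(4)
# Every pair of distinct vectors has the same cosine, -1 / (k - 1).
pairwise = [cosine(frame[i], frame[j]) for i in range(4) for j in range(i + 1, 4)]
```

With `k = 4`, every entry of `pairwise` is -1/3 up to floating-point error, which is the "equiangular" property: all class means are equally and maximally separated.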

Bootstrapped Meta-Learning

By Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh

Meta-learning, or learning to learn, has the potential to empower artificial intelligence, yet meta-optimization has been a considerable challenge to unlocking this potential. This paper opens a new direction in meta-learning, inspired by TD learning, that bootstraps the meta-learner from itself or another update rule. The theoretical analysis is thorough and the empirical results are compelling: the method sets a new state of the art for model-free agents on the Atari ALE benchmark and yields both performance and efficiency gains in multi-task meta-learning. The committee believes that this paper will inspire a lot of people.

*This paper will be presented in the Oral Session 3 on Meta Learning and Adaptation on Apr 27 4pm GMT (9am PST).

Outstanding Paper Honorable Mentions

Understanding over-squashing and bottlenecks on graphs via curvature

By Jake Topping, Francesco Di Giovanni, Benjamin Paul Chamberlain, Xiaowen Dong, Michael M. Bronstein

Most graph neural networks (GNNs) use the message passing paradigm, which suffers from the “over-squashing” phenomenon: the distortion of information flowing from distant nodes limits the efficiency of message passing. This phenomenon has been heuristically attributed to graph bottlenecks. Drawing insights from discrete differential geometry, this paper provides a precise description of the over-squashing phenomenon in GNNs and analyzes how it arises from bottlenecks in the graph. In particular, the authors introduce a new edge-based combinatorial curvature and prove that negatively curved edges are responsible for the over-squashing issue. Moreover, the authors demonstrate an elegant approach to reducing these negative effects by rewiring the graph according to this curvature notion. The paper has the potential to make a considerable impact by importing tools from differential geometry for analyzing GNNs, and the rewiring approach may suggest new directions for improving the empirical performance of GNNs.

*This paper will be presented in the Oral Session 2 on Structured Learning on Apr 26 8am GMT (1am PST).

Efficiently Modeling Long Sequences with Structured State Spaces

By Albert Gu, Karan Goel, Christopher Re

Modeling long sequences is a central challenge in representation learning across various tasks and modalities, and the dominant architecture over the past years has been the Transformer. This paper investigates a surprising alternative to Transformers by proposing the Structured State Space Sequence model (S4). S4 is a new and clever parameterization of the state space model (SSM) that addresses the prohibitive computation and memory requirements of SSMs while maintaining their theoretical strengths for handling long-range dependencies. S4 demonstrates impressive empirical results on multiple domains including vision, text, and audio. Unlike most work that tries to make Transformers more efficient, this work takes a completely different approach by focusing on the less studied state space models. The technical elegance and empirical strengths of the proposed approach inspire a new research direction.

*This paper will be presented in the Oral Session 2 on Structured Learning on Apr 26 8am GMT (1am PST).

PiCO: Contrastive Label Disambiguation for Partial Label Learning

By Haobo Wang, Ruixuan Xiao, Yixuan (Sharon) Li, Lei Feng, Gang Niu, Gang Chen, Junbo Zhao

This paper studies Partial Label Learning (PLL), an important problem in real-world applications where each training example is labeled with a coarse candidate set due to label ambiguity. This paper aims to reduce the performance gap between PLL and its supervised counterpart by addressing two key challenges in PLL, representation learning and label disambiguation, in one coherent framework. Specifically, the authors propose PiCO, a new framework that combines contrastive learning with prototype-based label disambiguation. The paper includes an interesting theoretical interpretation that justifies the framework from an expectation-maximization (EM) perspective. The empirical results are particularly impressive, as PiCO significantly outperforms the current state of the art in PLL and even achieves results comparable to fully supervised learning.

*This paper will be presented in the Oral Session 1 on Learning in the Wild & RL on Apr 26 12am GMT (Apr 25 5pm PST).

Selection Process

The Outstanding Paper Committee determined a selection process with the goal of identifying an equivalence class of outstanding papers that represent the breadth of excellent research being conducted by the ICLR community.

The committee began with an initial pool of 58 papers, including all papers that were explicitly nominated for an award by area chairs or senior area chairs as well as all papers that were slated for an oral presentation. The committee used three phases of down-selection. During Phase 1, each paper was assigned to one primary reader to determine whether the paper should move to Phase 2; in addition, committee members optionally endorsed other papers outside their assignments to move to Phase 2. We had 20 shortlisted papers after Phase 1. During Phase 2, each paper was assigned an additional secondary reader for closer examination; in addition, committee members optionally endorsed other papers outside their assignments to move to Phase 3. We had 10 shortlisted papers after Phase 2. During Phase 3, all remaining papers were considered and approved by the entire committee based on the rankings and nomination notes shared by the readers from Phases 1 and 2. After Phase 3, the committee selected 7 papers in the equivalence set for the Outstanding Paper Awards and an additional 3 papers for the Honorable Mentions. In all phases, committee members could read or endorse papers only if they did not have conflicts of interest, based on either domain conflicts or personal relationship conflicts (e.g., friends or former advisors). In order to promote honest and fair evaluations among the selection committee, the paper assignments were kept confidential so that judgments would not be biased by the presence of other committee members who had conflicts of interest with some of the papers in the pool.