Abstract
Implicit layers are computational modules that output the solution to some problem depending on the input and the layer parameters. Deep equilibrium models (DEQs) output a solution to a fixed-point equation. Deep declarative networks (DDNs) solve an optimisation problem in their forward pass, which is arguably a more intuitive and interpretable problem than finding a fixed point. We show that computing a kernelised, regularised maximum likelihood estimate as an inner problem in a DDN yields a large class of DEQ architectures. Our proof uses the exponential family in canonical form and provides a closed-form expression for the DEQ parameters in terms of the kernel. The activation functions can be interpreted in terms of the derivative of the log partition function. Building on existing literature, we interpret DEQs as fine-tuned, unrolled classical algorithms, giving an intuitive justification for why DEQ models are sensible. We use our theoretical result to devise an initialisation scheme for DEQs that allows them to solve kernelised generalised linear models (kGLMs) in their forward pass at initialisation. We empirically show that this initialisation scheme improves training stability and performance over random initialisation.
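To make the central construction concrete, here is a small Python sketch (our illustration, not the authors' code). It assumes a Bernoulli likelihood, whose log partition function is A(η) = log(1 + e^η), so A′ is the logistic sigmoid. It also assumes an RBF kernel with the representer-theorem parameterisation η = Kα. Setting the gradient of the regularised negative log-likelihood Σᵢ [A(ηᵢ) − yᵢηᵢ] + (λ/2) αᵀKα to zero gives λα = y − A′(Kα). This can be written as the DEQ-style fixed point η = Wφ(η) + b, with W = −K/λ, b = Ky/λ and activation φ = A′. The helper names `rbf_kernel` and `kglm_fixed_point`, the lengthscale and the value of λ are hypothetical choices, and the paper's exact DEQ parameterisation may differ.

```python
import math


def rbf_kernel(x1, x2, lengthscale=1.0):
    """Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 * lengthscale^2)) on scalars."""
    return math.exp(-((x1 - x2) ** 2) / (2.0 * lengthscale ** 2))


def sigmoid(t):
    """A'(eta) for the Bernoulli log partition A(eta) = log(1 + exp(eta)), computed stably."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def kglm_fixed_point(xs, ys, lam=2.0, iters=200, tol=1e-10):
    """Fit a regularised Bernoulli kGLM by fixed-point iteration on eta = K @ alpha.

    Stationarity of sum_i [A(eta_i) - y_i * eta_i] + (lam / 2) * alpha^T K alpha
    gives lam * alpha = y - A'(K alpha), i.e. the DEQ-style fixed point
        eta = W @ sigmoid(eta) + b,  with W = -K / lam and b = K @ y / lam.
    """
    n = len(xs)
    K = [[rbf_kernel(xs[i], xs[j]) for j in range(n)] for i in range(n)]
    eta = [0.0] * n  # initial state of the iteration
    for _ in range(iters):
        residual = [ys[j] - sigmoid(eta[j]) for j in range(n)]  # y - A'(eta)
        new_eta = [sum(K[i][j] * residual[j] for j in range(n)) / lam for i in range(n)]
        converged = max(abs(a - b) for a, b in zip(new_eta, eta)) < tol
        eta = new_eta
        if converged:
            break
    # Recover the representer weights from the stationarity condition.
    alpha = [(ys[j] - sigmoid(eta[j])) / lam for j in range(n)]
    return K, eta, alpha


if __name__ == "__main__":
    K, eta, alpha = kglm_fixed_point([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
    print([round(sigmoid(e), 3) for e in eta])  # fitted probabilities
```

Plain fixed-point iteration is only one classical solver. It converges here because the map is a contraction whenever ‖K‖₂/(4λ) < 1, since A″ ≤ 1/4 for the Bernoulli family. Unrolling these iterations and treating W and b as trainable gives one reading of DEQs as fine-tuned, unrolled classical algorithms. A layer whose weights start at W = −K/λ and b = Ky/λ reproduces this solver at initialisation, but we do not claim this matches the paper's scheme in detail.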
Original language | English |
---|---|
Title of host publication | The Tenth International Conference on Learning Representations |
Editors | Yann LeCun |
Place of Publication | USA |
Publisher | International Conference on Learning Representations (ICLR) |
Publication status | Published - 2022 |
Externally published | Yes |
Event | International Conference on Learning Representations 2022 - Online, United States of America |
Event duration | 25 Apr 2022 → 29 Apr 2022 |
Conference number | 10th |
Peer reviews | https://openreview.net/group?id=ICLR.cc/2022/Conference |
Conference website | https://iclr.cc/Conferences/2022 |
Conference
Conference | International Conference on Learning Representations 2022 |
---|---|
Abbreviated title | ICLR 2022 |
Country/Territory | United States of America |
Period | 25/04/22 → 29/04/22 |
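Internet address | https://iclr.cc/Conferences/2022 (website), https://openreview.net/group?id=ICLR.cc/2022/Conference (peer reviews) |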
Keywords
- deep equilibrium models
- deep declarative networks
- implicit layers
- kernel methods
- generalised linear models