Double descent is a phenomenon of over-parameterized statistical models, such as deep neural networks, whose risk function has a re-descending property. As model complexity increases, the risk first traces a U-shaped region due to the traditional bias-variance trade-off. When the number of parameters equals the number of observations, the model becomes an interpolator and the risk can be unbounded. Finally, in the over-parameterized region, the risk re-descends: the double descent effect. Our goal is to show that this has a natural Bayesian interpretation. We also show that it is not in conflict with the traditional Occam's razor, which prefers simpler models to complex ones, all else being equal. Our theoretical foundations use Bayesian model selection and the Dickey-Savage density ratio, and connect generalized ridge regression and global-local shrinkage methods with double descent. We illustrate our approach for high-dimensional neural networks and provide detailed treatments of infinite Gaussian means models and non-parametric regression. Finally, we conclude with directions for future research.
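Double descent is a re-descending property that over-parameterized statistical models (such as deep neural networks) exhibit in their risk function. As model complexity increases, the risk function shows a U-shaped region due to the traditional bias-variance trade-off; when the number of parameters equals the number of observations, the model becomes an interpolating model and the risk can be unbounded. Finally, the risk descends again in the over-parameterized region: this is the double descent effect. The aim of this paper is to demonstrate that this phenomenon has a natural Bayesian interpretation and that it does not conflict with the traditional principle of Occam's razor. The theoretical foundations use Bayesian model selection and the Dickey-Savage density ratio, and relate generalized ridge regression and global-local shrinkage methods to double descent.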
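The risk curve described above can be reproduced numerically in a toy setting. The following is a minimal sketch, not the paper's Bayesian construction: it assumes isotropic Gaussian features, a linear signal spread evenly over `D` coefficients, and a frequentist minimum-norm least squares fit on the first `p` features. The function name `min_norm_risk` and all parameter values (`n`, `D`, `sigma`, `trials`, `seed`) are hypothetical choices made for illustration.

```python
import numpy as np


def min_norm_risk(p, n=40, D=200, sigma=0.5, trials=20, seed=0):
    """Average prediction risk of minimum-norm least squares on the first p of D features.

    Illustrative assumptions (not from the paper): isotropic Gaussian
    features and a true coefficient vector with unit squared norm.
    """
    rng = np.random.default_rng(seed)
    beta = np.ones(D) / np.sqrt(D)  # true coefficients, ||beta||^2 = 1
    risks = []
    for _ in range(trials):
        X = rng.standard_normal((n, D))
        y = X @ beta + sigma * rng.standard_normal(n)
        # lstsq gives the ordinary least-squares fit when p < n and the
        # minimum-norm interpolating solution when p >= n.
        coef, *_ = np.linalg.lstsq(X[:, :p], y, rcond=None)
        beta_hat = np.zeros(D)
        beta_hat[:p] = coef  # unused features get coefficient zero
        # With isotropic Gaussian features, the expected squared prediction
        # error at a fresh point equals ||beta_hat - beta||^2 + sigma^2.
        risks.append(np.sum((beta_hat - beta) ** 2) + sigma ** 2)
    return float(np.mean(risks))


if __name__ == "__main__":
    # Sweep model size across the interpolation threshold p = n = 40.
    for p in (10, 20, 30, 40, 60, 100, 200):
        print(p, round(min_norm_risk(p), 2))
```

Under these assumptions the estimated risk spikes near the interpolation threshold `p = n` and falls again as `p` grows toward `D`. The signal is spread too thinly here for the classical regime to show a clear U shape, so the sketch illustrates only the peak and the second descent.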