The out-of-sample prediction error of the square-root-LASSO and related estimators
arXiv (2022)
Abstract
We study the classical problem of predicting an outcome variable, $Y$, using
a linear combination of a $d$-dimensional covariate vector, $\mathbf{X}$. We
are interested in linear predictors whose coefficients solve:
\begin{align*}
\inf_{\boldsymbol{\beta} \in \mathbb{R}^d} \left( \mathbb{E}_{\mathbb{P}_n}
\left[ \left|Y-\mathbf{X}^{\top}\boldsymbol{\beta} \right|^r \right] \right)^{1/r} + \delta
\, \rho\left(\boldsymbol{\beta}\right),
\end{align*}
where $\delta>0$ is a regularization parameter, $\rho:\mathbb{R}^d\to \mathbb{R}_+$ is a convex
penalty function, $\mathbb{P}_n$ is the empirical distribution of the data, and
$r\geq 1$. We present three sets of new results. First, we provide conditions
under which linear predictors based on these estimators solve a
\emph{distributionally robust optimization} problem: they minimize the
worst-case prediction error over distributions that are close to each other in
a type of \emph{max-sliced Wasserstein metric}. Second, we provide a detailed
finite-sample and asymptotic analysis of the statistical properties of the
balls of distributions over which the worst-case prediction error is analyzed.
Third, we use the distributionally robust optimality and our statistical
analysis to present i) an oracle recommendation for the choice of the
regularization parameter, $\delta$, that guarantees good out-of-sample
prediction error; and ii) a test statistic to rank the out-of-sample
performance of two different linear estimators. None of our results rely on
sparsity assumptions about the true data-generating process; they therefore
broaden the scope of the square-root LASSO and related estimators in
prediction problems.
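To make the displayed objective concrete, below is a minimal sketch for the square-root-LASSO special case, $r=2$ and $\rho(\boldsymbol{\beta})=\|\boldsymbol{\beta}\|_1$. The simulated data, the value $\delta = 0.1$, and the use of the cvxpy solver are illustrative assumptions, not the paper's implementation; the paper's oracle rule for choosing $\delta$ is not reproduced here.

```python
# Minimal sketch of the estimator class in the abstract, for r = 2 and
# rho(beta) = ||beta||_1 (the square-root LASSO). Data and delta are
# illustrative choices; the paper's oracle recommendation for delta is
# not implemented here.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 30
X = rng.standard_normal((n, d))
beta_star = np.concatenate([np.ones(3), np.zeros(d - 3)])  # arbitrary truth
Y = X @ beta_star + rng.standard_normal(n)

delta = 0.1  # regularization parameter (illustrative value)
beta = cp.Variable(d)

# (E_Pn[|Y - X'beta|^2])^{1/2} equals ||Y - X @ beta||_2 / sqrt(n)
fit = cp.norm(Y - X @ beta, 2) / np.sqrt(n)
penalty = delta * cp.norm(beta, 1)

# Convex problem: a norm of an affine expression plus an l1 penalty
problem = cp.Problem(cp.Minimize(fit + penalty))
problem.solve()
print("in-sample root-MSE:", fit.value)
print("nonzero coefficients:", int(np.sum(np.abs(beta.value) > 1e-6)))
```

For general $r \geq 1$, the fit term can be replaced by `cp.norm(Y - X @ beta, r) / n**(1/r)`, since $(\mathbb{E}_{\mathbb{P}_n}[|u|^r])^{1/r} = \|u\|_r / n^{1/r}$ for the residual vector $u$; the problem remains convex for any convex penalty $\rho$.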