Unconstrained Stochastic CCA: Unifying Multiview and Self-Supervised Learning
arXiv (2023)
Abstract
The Canonical Correlation Analysis (CCA) family of methods is foundational in
multiview learning. Regularised linear CCA methods can be seen to generalise
Partial Least Squares (PLS) and can be unified within a Generalized Eigenvalue
Problem (GEP) framework. However, classical algorithms for these linear methods
are computationally infeasible for large-scale data. Extensions to Deep CCA
show great promise, but current training procedures are slow and complicated.
First, we propose a novel unconstrained objective that characterizes the top
subspace of GEPs. Our core contribution is a family of fast algorithms for
stochastic PLS, stochastic CCA, and Deep CCA, simply obtained by applying
stochastic gradient descent (SGD) to the corresponding CCA objectives. Our
algorithms show far faster convergence and recover higher correlations than the
previous state-of-the-art on all standard CCA and Deep CCA benchmarks. These
improvements allow us to perform a first-of-its-kind PLS analysis of an
extremely large biomedical dataset from the UK Biobank, with over 33,000
individuals and 500,000 features. Finally, we apply our algorithms to match the
performance of `CCA-family' Self-Supervised Learning (SSL) methods on CIFAR-10
and CIFAR-100 with minimal hyper-parameter tuning, and also present theory to
clarify the links between these methods and classical CCA, laying the
groundwork for future insights.
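The core idea above — characterizing the top GEP subspace with an unconstrained objective and optimizing it by plain (stochastic) gradient descent — can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: it assumes an Eckhart–Young-style loss of the form f(W) = -2 tr(WᵀAW) + ‖WᵀBW‖²_F, where for CCA the matrix A holds the cross-view covariances and B the within-view covariances, and it uses full-batch gradient descent on synthetic data for clarity.

```python
import numpy as np
from scipy.linalg import eigh, block_diag

rng = np.random.default_rng(0)

# Two views sharing a one-dimensional latent signal (synthetic data).
n, d = 2000, 3
z = rng.normal(size=(n, 1))
X = z @ rng.normal(size=(1, d)) + 0.5 * rng.normal(size=(n, d))
Y = z @ rng.normal(size=(1, d)) + 0.5 * rng.normal(size=(n, d))
X -= X.mean(0)
Y -= Y.mean(0)

# Sample covariances with a small ridge for numerical stability.
Cxx = X.T @ X / n + 1e-3 * np.eye(d)
Cyy = Y.T @ Y / n + 1e-3 * np.eye(d)
Cxy = X.T @ Y / n

# GEP matrices for CCA: A holds cross-covariances, B is block-diagonal.
A = np.block([[np.zeros((d, d)), Cxy], [Cxy.T, np.zeros((d, d))]])
B = block_diag(Cxx, Cyy)

# Assumed unconstrained objective for the top-k GEP subspace:
#   f(W) = -2 tr(W^T A W) + ||W^T B W||_F^2
def loss(W):
    M = W.T @ B @ W
    return -2.0 * np.trace(W.T @ A @ W) + np.sum(M ** 2)

def grad(W):
    return -4.0 * A @ W + 4.0 * B @ W @ (W.T @ B @ W)

k, lr = 1, 0.05
W = 0.1 * rng.normal(size=(2 * d, k))
W0 = W.copy()
for _ in range(2000):
    W -= lr * grad(W)  # plain gradient descent; SGD would use minibatches

# Check against the top generalized eigenvalue, which for this GEP
# equals the top canonical correlation.
lam_max = eigh(A, B, eigvals_only=True)[-1]
u, v = W[:d, 0], W[d:, 0]
rho = (u @ Cxy @ v) / np.sqrt((u @ Cxx @ u) * (v @ Cyy @ v))
print(round(abs(rho), 3), round(lam_max, 3))
```

Because the objective is unconstrained, each update is an ordinary gradient step — no whitening or orthogonality projections are needed, which is what makes a stochastic minibatch version straightforward.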