arXiv: Data Structures and Algorithms (2014)
Abstract
The ordered weighted $\ell_1$ norm (OWL) was recently proposed, with two different motivations: its good statistical properties as a sparsity-promoting regularizer, and the fact that it generalizes the so-called octagonal shrinkage and clustering algorithm for regression (OSCAR), which has the ability to cluster/group regression variables that are highly correlated. This paper contains several contributions to the study and application of OWL regularization: the derivation of the atomic formulation of the OWL norm; the derivation of the dual of the OWL norm, based on its atomic formulation; a new and simpler derivation of the proximity operator of the OWL norm; an efficient scheme to compute the Euclidean projection onto an OWL ball; the instantiation of the conditional gradient (CG, also known as Frank-Wolfe) algorithm for linear regression problems under OWL regularization; the instantiation of accelerated projected gradient algorithms for the same class of problems. Finally, a set of experiments gives evidence that accelerated projected gradient algorithms are considerably faster than CG for the class of problems considered.
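The abstract names the OWL norm, its OSCAR special case, and its dual without stating them. As a minimal sketch, assuming the standard definition $\Omega_w(x) = \sum_i w_i |x|_{[i]}$ (with $|x|_{[1]} \ge |x|_{[2]} \ge \dots$ the sorted magnitudes and $w_1 \ge w_2 \ge \dots \ge w_n \ge 0$), the quantities can be computed as below. The function names `owl_norm`, `oscar_weights`, and `owl_dual_norm` are illustrative and are not taken from the paper; the dual-norm expression (the largest ratio between the sum of the $k$ largest magnitudes and the sum of the first $k$ weights) is the form usually given for this norm and additionally assumes $w_1 > 0$.

```python
def _check_weights(x, w):
    # OWL requires one weight per coordinate, non-negative and non-increasing.
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    if any(a < b for a, b in zip(w, w[1:])) or (len(w) > 0 and w[-1] < 0):
        raise ValueError("weights must be non-negative and non-increasing")


def owl_norm(x, w):
    """Sum of w[i] times the i-th largest magnitude of x."""
    _check_weights(x, w)
    mags = sorted((abs(v) for v in x), reverse=True)
    return sum(wi * mi for wi, mi in zip(w, mags))


def oscar_weights(n, lam1, lam2):
    """Weights that make OWL equal the OSCAR penalty
    lam1 * ||x||_1 + lam2 * sum_{i<j} max(|x_i|, |x_j|).
    The i-th largest magnitude is the maximum in (n - i) pairs."""
    return [lam1 + lam2 * (n - i) for i in range(1, n + 1)]


def owl_dual_norm(x, w):
    """Max over k of (sum of k largest |x|) / (sum of first k weights).
    Assumes w[0] > 0 so that every prefix sum of w is positive."""
    _check_weights(x, w)
    if len(w) > 0 and w[0] <= 0:
        raise ValueError("the largest weight must be positive")
    mags = sorted((abs(v) for v in x), reverse=True)
    best, sum_x, sum_w = 0.0, 0.0, 0.0
    for m, wi in zip(mags, w):
        sum_x += m
        sum_w += wi
        best = max(best, sum_x / sum_w)
    return best
```

With all weights equal to 1 the sketch reduces to the $\ell_1$ norm (whose dual is $\ell_\infty$), and with $w = (1, 0, \dots, 0)$ it reduces to the $\ell_\infty$ norm (whose dual is $\ell_1$), which gives a quick sanity check on both functions.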