Tree-Regularized Tabular Embeddings
CoRR (2024)
Abstract
Tabular neural networks (NNs) have attracted remarkable attention, and recent
advances have gradually narrowed their performance gap with respect to
tree-based models on many public datasets. While mainstream work focuses on
calibrating NNs to fit tabular data, we emphasize the importance of homogeneous
embeddings and instead concentrate on regularizing tabular inputs through
supervised pretraining. Specifically, we extend a recent work (DeepTLF) and
utilize the structure of pretrained tree ensembles to transform raw variables
into a single vector (T2V) or an array of tokens (T2T). Without loss of space
efficiency, these binarized embeddings can be consumed by canonical tabular NNs
with fully-connected or attention-based building blocks. Through quantitative
experiments on 88 OpenML binary-classification datasets, we validate that the
proposed tree-regularized representations not only taper the difference with
respect to tree-based models, but also achieve on-par or better performance
compared with advanced NN models. Most importantly, they possess better
robustness and can easily be scaled and generalized as a standalone encoder for
the tabular modality. Code:
https://github.com/milanlx/tree-regularized-embedding.