TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations

In a world where GenAI dominates, the foundations of Machine Learning are still dominated by decision trees. Text is everywhere: almost any dataframe contains it, but many traditional algorithms neglect it. We introduce TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations In this paper, we present a novel architecture for handling tabular data with text features. We pretrain over hundreds of datasets and leverage transfer learning to achieve SOTA results for classification tasks. Our end-to-end architecture tailors feature embeddings to each task and is designed for massive pretraining.

Room: Room 3

Tue, Oct 28th, 9:00 - 9:30

Speakers

Alan Arazi