Google, NYU & Maryland U’s Token-Dropping Approach Reduces BERT Pretraining Time by 25% | Synced

Source: Synced | AI Technology & Industry Review

In the new paper Token Dropping for Efficient BERT Pretraining, a research team from Google, New York University, and the University of Maryland proposes a simple but effective “token dropping” technique that significantly reduces the pretraining cost of transformer models such as BERT without hurting performance on downstream fine-tuning tasks.
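The gist of token dropping is that only the harder, more informative tokens need to pass through every encoder layer; the rest can skip the middle of the network and be merged back before the output layers. The sketch below is a simplified PyTorch illustration of that idea, not the authors' implementation: the class name, the layer split (full / sparse / last), the keep ratio, and the generic per-token importance score are all assumptions made for the example (the paper reportedly derives token importance from the running masked-language-model loss).

```python
# Minimal sketch of token dropping in a transformer encoder (assumed design,
# not the paper's code): every token goes through the first and last layers,
# but only the top-k "important" tokens pay for the middle layers.
import torch
import torch.nn as nn


class TokenDroppingEncoder(nn.Module):
    def __init__(self, dim=256, heads=4, n_full=2, n_sparse=8, n_last=2, keep_ratio=0.5):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.full_layers = nn.ModuleList(make_layer() for _ in range(n_full))
        self.sparse_layers = nn.ModuleList(make_layer() for _ in range(n_sparse))
        self.last_layers = nn.ModuleList(make_layer() for _ in range(n_last))
        self.keep_ratio = keep_ratio

    def forward(self, x, importance):
        # x: [batch, seq_len, dim]; importance: [batch, seq_len] scores
        # (e.g. a running per-token MLM loss: higher = harder = keep).
        for layer in self.full_layers:           # every token sees these layers
            x = layer(x)

        k = max(1, int(x.size(1) * self.keep_ratio))
        keep_idx = importance.topk(k, dim=1).indices              # [batch, k]
        gather_idx = keep_idx.unsqueeze(-1).expand(-1, -1, x.size(-1))
        kept = x.gather(1, gather_idx)                            # [batch, k, dim]

        for layer in self.sparse_layers:         # only kept tokens pay this cost
            kept = layer(kept)

        x = x.scatter(1, gather_idx, kept)       # merge processed tokens back
        for layer in self.last_layers:           # full sequence again at the end
            x = layer(x)
        return x


if __name__ == "__main__":
    enc = TokenDroppingEncoder()
    tokens = torch.randn(2, 128, 256)
    scores = torch.rand(2, 128)                  # stand-in importance scores
    print(enc(tokens, scores).shape)             # torch.Size([2, 128, 256])
```

Because the sparse middle layers operate on roughly half the tokens in this toy configuration, the quadratic attention cost of those layers drops accordingly, which is the source of the reported pretraining speedup.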