
Date: 2025-01-08 15:58:04

A 2025 update: this doesn't directly answer the question as asked, but it can be a good solution for many readers.

TL;DR

  1. Use ModernBERT for texts up to 8,192 tokens (16× the original BERT's 512-token limit).
  2. Chunk the text if it exceeds 8,192 tokens, or if you must stick with an older BERT model.

BERT’s 512-token limit has historically meant you either had to truncate long text or split it into multiple 512-token chunks. However, ModernBERT (released in December 2024) now supports sequences up to 8,192 tokens, making it a drop-in replacement for long-form text without chunking.
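If you do have to stay on an older 512-token BERT, the usual fallback is sliding-window chunking with some overlap so that context isn't cut mid-sentence. A minimal sketch (the `max_len` and `stride` values are illustrative defaults, not prescribed by any library):

```python
# Sliding-window chunking sketch: split a long token-ID sequence into
# overlapping chunks that each fit an older BERT's 512-token limit.
def chunk_tokens(token_ids, max_len=512, stride=64):
    """Return overlapping windows of at most `max_len` tokens."""
    if len(token_ids) <= max_len:
        return [token_ids]
    chunks = []
    step = max_len - stride  # advance by less than max_len to overlap
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # last window already reaches the end
    return chunks

chunks = chunk_tokens(list(range(1000)))
print(len(chunks), len(chunks[0]), len(chunks[-1]))  # → 3 512 104
```

Each chunk is then encoded separately and the per-chunk outputs are pooled (e.g. mean of the CLS embeddings). Note that Hugging Face tokenizers can also do this natively via `return_overflowing_tokens=True` with a `stride` argument.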

That's it: ModernBERT is essentially BERT with a larger context window and better performance. Even for shorter text it is simply a stronger model, as many benchmarks show.
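Using it really is a drop-in swap of the checkpoint name. A minimal sketch with Hugging Face `transformers` (requires a recent version with ModernBERT support; `answerdotai/ModernBERT-base` is the published base checkpoint, but verify it suits your task):

```python
# Encode a long document with ModernBERT, no chunking required.
from transformers import AutoTokenizer, AutoModel

model_id = "answerdotai/ModernBERT-base"  # published base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

long_text = "your long document here " * 500  # well past 512 tokens
inputs = tokenizer(long_text, return_tensors="pt",
                   truncation=True, max_length=8192)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```

The same two lines that loaded `bert-base-uncased` before now load ModernBERT; everything downstream (pooling, classification heads, etc.) stays unchanged.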

Posted by: Maciej Szulc