
Date: 2025-01-08 15:58:04

A 2025 update: this doesn't directly answer the question as asked, but it can be a good solution for many readers.

TL;DR

  1. Use ModernBERT for texts up to 8,192 tokens (16× the original BERT's 512-token limit).
  2. Chunk the text if it exceeds 8,192 tokens, or if you must stick with an older BERT model.

BERT’s 512-token limit has historically meant you either had to truncate long text or split it into multiple 512-token chunks. However, ModernBERT (released in December 2024) now supports sequences up to 8,192 tokens, making it a drop-in replacement for long-form text without chunking.
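If you do have to stay on an older 512-token BERT, the usual fallback is sliding-window chunking with some overlap so that context isn't cut mid-sentence. A minimal sketch (the `max_len` and `stride` values are illustrative defaults, not prescribed by any library):

```python
# Sliding-window chunking sketch: split a long token-ID sequence into
# overlapping chunks that each fit an older BERT's 512-token limit.
def chunk_tokens(token_ids, max_len=512, stride=64):
    """Return overlapping windows of at most `max_len` tokens."""
    if len(token_ids) <= max_len:
        return [token_ids]
    chunks = []
    step = max_len - stride  # advance by less than max_len to overlap
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # last window already reaches the end
    return chunks

chunks = chunk_tokens(list(range(1000)))
print(len(chunks), len(chunks[0]), len(chunks[-1]))  # → 3 512 104
```

Each chunk is then encoded separately and the per-chunk outputs are pooled (e.g. mean of the CLS embeddings). Note that Hugging Face tokenizers can also do this natively via `return_overflowing_tokens=True` with a `stride` argument.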

That's it: ModernBERT is essentially BERT with a larger context window and better performance. Even for shorter text it is simply a stronger model, as many benchmarks show.
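Using it really is a drop-in swap of the checkpoint name. A minimal sketch with Hugging Face `transformers` (requires a recent version with ModernBERT support; `answerdotai/ModernBERT-base` is the published base checkpoint, but verify it suits your task):

```python
# Encode a long document with ModernBERT, no chunking required.
from transformers import AutoTokenizer, AutoModel

model_id = "answerdotai/ModernBERT-base"  # published base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

long_text = "your long document here " * 500  # well past 512 tokens
inputs = tokenizer(long_text, return_tensors="pt",
                   truncation=True, max_length=8192)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```

The same two lines that loaded `bert-base-uncased` before now load ModernBERT; everything downstream (pooling, classification heads, etc.) stays unchanged.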

Posted by: Maciej Szulc