79597106

Date: 2025-04-28 18:41:52
Score: 2

That's because the two serve different purposes. Many NLP tasks only need plain word tokenization. When you need to handle multi-word expressions, i.e. certain predefined phrases that you want to keep together as single tokens during tokenization, you use MWETokenizer. If you use n-grams instead, you get every contiguous combination of tokens, most of them irrelevant, and filtering out the unwanted ones takes additional time. That extra cost is only worth paying if your task has an exploratory aspect, such as searching through candidate phrases for a specific one.
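To make the difference concrete, here is a minimal stdlib-only sketch. The helpers merge_mwes and ngrams below are hypothetical illustrations written for this answer, not NLTK's API; NLTK's real equivalents are nltk.tokenize.MWETokenizer (which joins phrases with "_" by default) and nltk.util.ngrams. The sketch assumes the text is already split into word tokens and merges known phrases greedily, preferring the longest match.

```python
from typing import Iterable, List, Sequence, Tuple


def merge_mwes(tokens: Sequence[str],
               mwes: Iterable[Tuple[str, ...]],
               sep: str = "_") -> List[str]:
    """Hypothetical helper: merge predefined multi-word expressions into single tokens."""
    # Drop empty phrases and try longer phrases first,
    # so ("New", "York", "City") wins over ("New", "York").
    phrases = sorted({p for p in mwes if p}, key=len, reverse=True)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for phrase in phrases:
            n = len(phrase)
            if tuple(tokens[i:i + n]) == phrase:
                out.append(sep.join(phrase))  # keep the phrase as one token
                i += n
                break
        else:
            # No known phrase starts here, so keep the single word.
            out.append(tokens[i])
            i += 1
    return out


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Hypothetical helper: every contiguous n-token window, relevant or not."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "I moved to New York last year".split()
print(merge_mwes(tokens, [("New", "York")]))
# ['I', 'moved', 'to', 'New_York', 'last', 'year']
print(ngrams(tokens, 2))
# [('I', 'moved'), ('moved', 'to'), ('to', 'New'), ('New', 'York'), ('York', 'last'), ('last', 'year')]
```

Only one of the six bigrams is the phrase of interest; the other five are exactly the irrelevant combinations you would have to filter out afterwards, whereas the MWE approach produces the phrase directly because you told it what to look for.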

Posted by: yoyo