
Date: 2024-11-26 17:10:35
You can often access the tokenizer directly from the pipeline via `pipe.tokenizer` and call it with your string to get the attention mask:

>>> pipe.tokenizer("Blah blah blah.")
{'input_ids': [101, 27984, 27984, 27984, 1012, 102], 'attention_mask': [1, 1, 1, 1, 1, 1]}

>>> pipe.tokenizer("Blah blah blah.")['attention_mask']
[1, 1, 1, 1, 1, 1]

But even if that's not an option, it looks like you have access to the tokenizer at initialization, so you can simply keep a reference to it and call it yourself.
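A minimal sketch of that second approach, assuming a standard `transformers` setup (the model name and task here are illustrative, not from your code):

```python
from transformers import AutoTokenizer, pipeline

# Load the tokenizer yourself so you keep a reference to it.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Pass the same tokenizer into the pipeline at initialization.
pipe = pipeline("text-classification", model="bert-base-uncased", tokenizer=tokenizer)

# The tokenizer object is still in scope, so call it directly
# whenever you need the attention mask (or input_ids, etc.).
encoding = tokenizer("Blah blah blah.")
print(encoding["attention_mask"])  # one entry per token, 1 = attended
```

Either way you end up calling the same tokenizer object; the only difference is whether you reach it through `pipe.tokenizer` or through your own variable.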

Posted by: Joseph Catrambone