Apparently the pretrained weights loaded with from_preset() cover only the backbone transformer, so the MLM head still has to be trained. At least, that worked...