Can someone please guide me on how to convert a PyTorch Lightning .ckpt
checkpoint to a Hugging Face-supported format, so that I can load it with the standard pre-trained model classes?
The model I'm trying to convert was trained using PyTorch Lightning, and you can find it here:
🔗 hydroxai/pii_model_longtransfomer_version
I need to use this model with the following GitHub repository for testing:
🔗 HydroXai/pii-masker
I tried using a Hugging Face Space to convert the model to the .safetensors
format. However, the converted model produces poor results, and loading it triggers several warnings.
These are the warnings I'm seeing:
Some weights of the model checkpoint at /content/pii-masker/pii-masker/output_model/deberta3base_1024 were not used when initializing DebertaV2ForTokenClassification: ['deberta.head.lstm.bias_hh_l0', 'deberta.head.lstm.bias_hh_l0_reverse', 'deberta.head.lstm.bias_ih_l0', 'deberta.head.lstm.bias_ih_l0_reverse', 'deberta.head.lstm.weight_hh_l0', 'deberta.head.lstm.weight_hh_l0_reverse', 'deberta.head.lstm.weight_ih_l0', 'deberta.head.lstm.weight_ih_l0_reverse', 'deberta.output.bias', 'deberta.output.weight', 'deberta.transformers_model.embeddings.LayerNorm.bias', 'deberta.transformers_model.embeddings.LayerNorm.weight', 'deberta.transformers_model.embeddings.token_type_embeddings.weight', 'deberta.transformers_model.embeddings.word_embeddings.weight', 'deberta.transformers_model.encoder.layer.0.attention.output.LayerNorm.bias', 'deberta.transformers_model.encoder.layer.0.attention.output.LayerNorm.weight', 'deberta.transformers_model.encoder.layer.0.attention.output.dense.bias', 'deberta.transformers_model.encoder.layer.0.attention.output.dense.weight', 'deberta.transformers_model.encoder.layer.0.attention.self.key.bias', 'deberta.transformers_model.encoder.layer.0.attention.self.key.weight', 'deberta.transformers_model.encoder.layer.0.attention.self.key_global.bias', 'deberta.transformers_model.encoder.layer.0.attention.self.key_global.weight', 'deberta.transformers_model.encoder.layer.0.attention.self.query.bias', 'deberta.transformers_model.encoder.layer.0.attention.self.query.weight', 'deberta.transformers_model.encoder.layer.0.attention.self.query_global.bias', 'deberta.transformers_model.encoder.layer.0.attention.self.query_global.weight', 'deberta.transformers_model.encoder.layer.0.attention.self.value.bias', 'deberta.transformers_model.encoder.layer.0.attention.self.value.weight', 'deberta.transformers_model.encoder.layer.0.attention.self.value_global.bias', 'deberta.transformers_model.encoder.layer.0.attention.self.value_global.weight', 'deberta.transformers_model.encoder.layer.0.intermediate.dense.bias', 'deberta.transformers_model.encoder.layer.0.intermediate.dense.weight', 'deberta.transformers_model.encoder.layer.0.output.LayerNorm.bias', 'deberta.transformers_model.encoder.layer.0.output.LayerNorm.weight', 'deberta.transformers_model.encoder.layer.0.output.dense.bias', 'deberta.transformers_model.encoder.layer.0.output.dense.weight', [... the same set of attention/intermediate/output parameter names repeats for 'deberta.transformers_model.encoder.layer.1' through 'layer.11' ...] 'deberta.encoder.layer.9.output.dense.bias', 'deberta.encoder.layer.9.output.dense.weight', 'deberta.encoder.rel_embeddings.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
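Reading the key names in the warning, it looks like none of the checkpoint weights match what DebertaV2ForTokenClassification expects: the backbone sits under deberta.transformers_model.* (plus a custom LSTM head under deberta.head.*), while the HF class looks for keys under deberta.* directly. This is the kind of key remapping I am considering; the target prefix is only inferred from the warning (not confirmed against the repo), and remap_key is my own helper name:

```python
# Sketch of the key remapping implied by the warning: move the backbone keys
# from "deberta.transformers_model.*" to "deberta.*". The custom LSTM head
# ("deberta.head.*" / "deberta.output.*") has no counterpart in
# DebertaV2ForTokenClassification, so those keys are dropped here
# (assumption: the pii-masker repo handles the head separately).
def remap_key(key):
    prefix = "deberta.transformers_model."
    if key.startswith(prefix):
        return "deberta." + key[len(prefix):]
    if key.startswith(("deberta.head.", "deberta.output.")):
        return None  # custom head, not part of the HF model class
    return key

for k in [
    "deberta.transformers_model.embeddings.word_embeddings.weight",
    "deberta.transformers_model.encoder.layer.0.attention.self.query.weight",
    "deberta.head.lstm.weight_ih_l0",
]:
    print(k, "->", remap_key(k))
```

Applying this over the whole state_dict before saving would be the idea, but I am not sure it is the right fix. Is this the recommended way to do the conversion, or is there a proper export path for Lightning checkpoints?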