If you have already tried GPT for the same task and are still facing the same issue, then no other pre-trained model is likely to work either, because GPT is currently far superior to most other pre-trained models.
Also, most BERT-based NER models are not fine-tuned on Indian datasets, and that is why you are seeing incorrect results.
The second reason is insufficient context. The example you provided does not carry enough context; not even a human could tell what it is about. If you rewrite the example so that it is meaningful and pass the modified text to the same model, you will get better results. Having proper context is a prerequisite, because most of these deep learning models make predictions based on the surrounding context words (see the illustrative rewrite below).
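As a purely illustrative sketch of what "adding context" means here (the interpretation of each field is my guess, and in practice the rewrite would come from your own parsing rules):

raw_text = "UPI DR 400874707203 BENGALORE 08 JAN 2024 14:38:56 MEDICAL LTD HDFC 50200"

# Hypothetical hand-written rewrite into a natural sentence; the meaning
# assigned to each field is an assumption made for illustration only
enriched_text = (
    "A UPI debit was made to MEDICAL LTD in Bengaluru from an "
    "HDFC account on 08 Jan 2024 at 14:38:56."
)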
I am adding some code below which might work for you, because the model it uses was fine-tuned on an Indian dataset.
Code:
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
# Checking if GPU is available
device = 0 if torch.cuda.is_available() else -1 # GPU: 0, CPU: -1
# Load pre-trained tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("Venkatesh4342/NER-Indian-xlm-roberta")
model = AutoModelForTokenClassification.from_pretrained("Venkatesh4342/NER-Indian-xlm-roberta")
# Initialize the NER pipeline
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer, device=device)
# Example data
input_text = "UPI DR 400874707203 BENGALORE 08 JAN 2024 14:38:56 MEDICAL LTD HDFC 50200"
# Run the NER pipeline on the example
results = ner_pipeline(input_text)
# Display the output
for entity in results:
    print(f"Entity: {entity['word']}, Label: {entity['entity']}, Confidence: {entity['score']:.2f}")
Final Output:
Entity: ▁U, Label: organization, Confidence: 0.99
Entity: PI, Label: organization, Confidence: 0.99
Entity: ▁DR, Label: organization, Confidence: 0.99
Entity: ▁BEN, Label: location, Confidence: 0.76
Entity: GAL, Label: location, Confidence: 0.83
Entity: ORE, Label: location, Confidence: 0.79
Entity: ▁MEDI, Label: organization, Confidence: 1.00
Entity: CAL, Label: organization, Confidence: 0.99
Entity: ▁LTD, Label: organization, Confidence: 1.00
Entity: ▁HD, Label: organization, Confidence: 0.99
Entity: FC, Label: organization, Confidence: 0.98
The final output is a little better, and with some post-processing logic you can combine the broken sub-word pieces back together as well.
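Before writing custom merging logic, note that the transformers pipeline can usually do this merging itself via its aggregation_strategy argument (a real parameter of the "ner" pipeline, though exact behaviour can vary between library versions). A sketch reusing the model and tokenizer loaded above; note the result key changes from 'entity' to 'entity_group':

ner_pipeline = pipeline(
    "ner",
    model=model,
    tokenizer=tokenizer,
    device=device,
    aggregation_strategy="simple",  # merge sub-word pieces into whole entities
)
for entity in ner_pipeline(input_text):
    print(f"Entity: {entity['word']}, Label: {entity['entity_group']}, Confidence: {entity['score']:.2f}")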
Final note: I would suggest fine-tuning an NER model on your own data, and try to provide meaningful context. A minimal sketch of what that fine-tuning could look like is below.
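This sketch uses the Hugging Face Trainer for token classification and assumes you have already annotated some of your own transaction strings; the label set, example data, and output directory are placeholders to replace with your own:

from datasets import Dataset
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

# Placeholder label set -- replace with the entity types you care about
labels = ["O", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]
id2label = {i: l for i, l in enumerate(labels)}

# Placeholder training data -- replace with your annotated transactions
dataset = Dataset.from_dict({
    "tokens": [["UPI", "DR", "MEDICAL", "LTD", "HDFC"]],
    "ner_tags": [[0, 0, 1, 2, 1]],
})

model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=len(labels), id2label=id2label
)

def tokenize_and_align(batch):
    tokenized = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    aligned = []
    for i, tags in enumerate(batch["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        prev, label_ids = None, []
        for wid in word_ids:
            # Label only the first sub-word of each word; mask the rest with -100
            label_ids.append(-100 if wid is None or wid == prev else tags[wid])
            prev = wid
        aligned.append(label_ids)
    tokenized["labels"] = aligned
    return tokenized

tokenized_ds = dataset.map(tokenize_and_align, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ner-indian-finetuned", num_train_epochs=3),
    train_dataset=tokenized_ds,
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()

The key detail is the label alignment step: sub-word pieces after the first one in each word get the label -100 so the loss ignores them, which is the standard way these models are fine-tuned for NER.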