The root problem lies in the get_candidates_vectorized function. rapidfuzz compares strings case-sensitively, so make names that differ only in case score low and every row of central_df can end up filtered out. Change the function to lowercase both sides before scoring (add .lower() to both bank_make and x):
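from rapidfuzz import fuzz

def get_candidates_vectorized(bank_make, central_df, threshold=60):
    # Fuzzy-match make names, lowercasing both sides so case differences don't lower the score
    make_scores = central_df['make_name'].apply(
        lambda x: fuzz.token_set_ratio(bank_make.lower(), x.lower())
    )
    # Keep the index labels of rows scoring above the threshold
    return central_df[make_scores > threshold].index.tolist()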