79665141

Date: 2025-06-13 16:32:46
Score: 1

No negative impact: your approach is correct and commonly used.

Applying custom weights to the TF-IDF matrix (built with norm=None) and then normalizing each row with sklearn.preprocessing.normalize produces unit vectors, just as norm='l2' in TfidfVectorizer does. Every non-zero row ends up with L2 norm 1, so the dot product of two rows is their cosine similarity, now computed on the weighted features.
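A minimal sketch of that workflow, assuming a toy corpus and an arbitrary weight of 3 on the term "apple" (both are made up for illustration and are not from the question):

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

# Hypothetical toy corpus, for illustration only.
docs = [
    "red apple green apple",
    "green banana",
    "red banana apple",
]

# norm=None keeps the raw TF-IDF values unnormalized.
vectorizer = TfidfVectorizer(norm=None)
X = vectorizer.fit_transform(docs)  # sparse, shape (n_docs, n_terms)

# One weight per vocabulary term; boosting "apple" by 3 is an assumed choice.
weights = np.ones(X.shape[1])
weights[vectorizer.vocabulary_["apple"]] = 3.0

# Scale each column by its weight, then L2-normalize each row.
X_weighted = X @ sparse.diags(weights)
X_final = normalize(X_weighted, norm="l2", axis=1)

# Check the row lengths after normalization.
row_norms = np.sqrt(np.asarray(X_final.multiply(X_final).sum(axis=1)).ravel())
print(row_norms)
```

Multiplying by a diagonal matrix keeps everything sparse, which matters for a realistic vocabulary size.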

Key point: The order matters. Weighting first, then normalizing, gives you control over the influence of features before projecting vectors onto the unit sphere.

If you normalized before weighting, the weighted vectors would generally no longer be unit length.

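The manual normalization is the same operation the vectorizer applies with norm='l2'. Because scaling a row by a positive constant does not change its direction, it also makes no difference whether the rows were already normalized before weighting, as long as a normalization is the last step.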
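The points about ordering can be checked numerically. This is a rough sketch, assuming a made-up corpus and arbitrary per-term weights:

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

# Hypothetical corpus, for illustration only.
docs = ["red apple green apple", "green banana", "red banana apple"]

raw = TfidfVectorizer(norm=None).fit_transform(docs)
builtin = TfidfVectorizer(norm="l2").fit_transform(docs)

# Manual row normalization reproduces the vectorizer's built-in norm="l2".
same_as_builtin = np.allclose(normalize(raw, norm="l2").toarray(), builtin.toarray())

# Assumed per-term weights (one per column), chosen arbitrarily for the demo.
W = sparse.diags(np.linspace(1.0, 2.0, raw.shape[1]))

# Weight, then normalize: rows are unit length.
weight_then_norm = normalize(raw @ W, norm="l2").toarray()

# Normalize, then weight: rows are no longer unit length.
norm_then_weight = (builtin @ W).toarray()
norms_after_late_weighting = np.linalg.norm(norm_then_weight, axis=1)

# Normalizing again after weighting recovers the same result.
renormalized = normalize(builtin @ W, norm="l2").toarray()
same_after_renormalizing = np.allclose(weight_then_norm, renormalized)

print(same_as_builtin, norms_after_late_weighting, same_after_renormalizing)
```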

Reasons:
  • Long answer (-0.5):
  • No code block (0.5):
  • Low reputation (1):
Posted by: Daniel