79770251

Date: 2025-09-20 11:18:46
Score: 1
Natty:
Report link

Looking at this data, it is quite messy and very dirty. Cleaning it with a single "magic" function is probably overreaching; I don't see you achieving your intended aims with a one-size-fits-all cleaning function. I would recommend the following approach instead:

1. Preprocessing

2. Language detection plus translation: unify all of the entries into one language.

3. String matching: build a large dictionary of all official degree names, e.g. Bachelor, Master, PhD.

4. Train a text classifier.
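
As a rough illustration of steps 1 and 3, here is a minimal sketch using only the Python standard library. The names `DEGREE_ALIASES`, `preprocess` and `match_degree` are hypothetical, and the alias dictionary is a tiny made-up sample, not a real list of official degree names. Language detection and the classifier are left out; entries that return `None` are the ones you would pass on to those steps.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny dictionary; a real one would be far larger.
DEGREE_ALIASES = {
    "Bachelor": ["bachelor", "bachelors", "bsc", "ba"],
    "Master": ["master", "masters", "msc", "ma", "mba"],
    "PhD": ["phd", "doctorate", "doctor of philosophy"],
}


def preprocess(text):
    """Strip accents, lowercase, drop punctuation and collapse whitespace."""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Remove dots and apostrophes so "B.Sc." -> "bsc" and "Master's" -> "masters".
    text = re.sub(r"[.'\u2019]", "", text)
    # Any other run of non-alphanumeric characters becomes a single space.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return text.strip()


def match_degree(raw):
    """Return the first canonical degree whose alias appears as a whole word, else None."""
    cleaned = preprocess(raw)
    for canonical, aliases in DEGREE_ALIASES.items():
        for alias in aliases:
            if re.search(r"\b" + re.escape(alias) + r"\b", cleaned):
                return canonical
    return None  # no dictionary hit: candidate for translation or a classifier


for entry in ["B.Sc. Computer Science", "Master\u2019s in \u00c9conomie", "high school"]:
    print(repr(entry), "->", match_degree(entry))
```

Matching on whole words keeps short aliases such as "ma" from firing inside unrelated words. If an entry mentions several degrees, this sketch simply returns the first one in dictionary order, which you may want to change for your data.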

Reasons:
  • Long answer (-0.5):
  • No code block (0.5):
  • Low reputation (1):
Posted by: user30818063