Reports

This data looks very messy, and trying to clean it with a single "magic" function is probably overreaching. I don’t think one one-size-fits-all cleaning function will get you what you want. I would recommend the following approach instead:

1. Preprocessing: Start by lowercasing all text and removing punctuation, then normalize all abbreviations (e.g. map "BSc" and "B.Sc." to the same form)

2. Language detection plus translation: Detect the language of each entry and translate everything into one language, so the later steps only have to deal with a single language

3. String matching: Build a large dictionary of all official degree names, e.g. Bachelor, Master, PhD, and match the cleaned strings against it

4. Train a text classifier
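
To make steps 1 and 3 a bit more concrete, here is a minimal sketch using only the Python standard library. The abbreviation map, the degree list, the function names and the 0.8 similarity cutoff are all assumptions for illustration, not something taken from your data, so you would need to adapt them. It lowercases the text, strips ASCII punctuation, expands known abbreviations, and then looks each token up in the dictionary, falling back to fuzzy matching with difflib for near misses.

```python
import difflib
import string
from typing import List, Optional

# Hypothetical abbreviation map; extend it with the variants seen in your data.
ABBREVIATIONS = {
    "bsc": "bachelor",
    "ba": "bachelor",
    "msc": "master",
    "ma": "master",
    "mba": "master",
}

# Hypothetical dictionary of canonical degree names (step 3).
DEGREES = ["bachelor", "master", "phd"]

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> List[str]:
    """Lowercase, strip ASCII punctuation, and expand known abbreviations."""
    cleaned = text.lower().translate(_PUNCT_TABLE)
    return [ABBREVIATIONS.get(token, token) for token in cleaned.split()]


def match_degree(text: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the first canonical degree found in text, or None."""
    for token in preprocess(text):
        if token in DEGREES:
            return token
        # Fuzzy fallback for plurals and small typos, e.g. "masters".
        close = difflib.get_close_matches(token, DEGREES, n=1, cutoff=cutoff)
        if close:
            return close[0]
    return None


if __name__ == "__main__":
    for raw in ["B.Sc. Computer Science", "Masters of Arts", "High school diploma"]:
        print(raw, "->", match_degree(raw))
```

Entries that come back as None are the ones you would hand over to the translation step or to the classifier from step 4. Keep in mind that very short abbreviations such as "ma" can collide with ordinary words in other languages, so it is safer to run this after the language step.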
