Looking at this data, it’s very messy. Trying to clean it with a single catch-all function would be overreach, and I don’t see you achieving your intended aims that way. I would recommend the following steps instead:
1. Normalization: Start by lowercasing all text and removing punctuation, then normalize all abbreviations
2. Language detection plus translation: Detect the language of each entry and translate everything into one common language
3. String matching: Build a large dictionary of all official degree names, e.g., Bachelor, Master, PhD, and match each cleaned entry against it
4. Train a text classifier
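
As a rough illustration of steps 1 and 3, here is a minimal sketch using only the Python standard library. The `ABBREVIATIONS` map, the `DEGREES` list, and the function names `normalize` and `match_degree` are all hypothetical placeholders; you would need to fill them in from the variants that actually appear in your data. The fuzzy fallback via `difflib.get_close_matches` is my own assumption for catching small misspellings, not something your data necessarily requires. Language detection, translation, and the classifier are left out because they depend on third-party libraries.

```python
import re
import difflib

# Hypothetical abbreviation map: extend it with the variants seen in your data.
ABBREVIATIONS = {
    "bsc": "bachelor",
    "ba": "bachelor",
    "msc": "master",
    "ma": "master",
    "mba": "master",
    "dphil": "phd",
}

# Hypothetical dictionary of canonical degree names (step 3).
DEGREES = ["bachelor", "master", "phd"]


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and expand known abbreviations (step 1)."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)  # "B.Sc." becomes "bsc"
    tokens = text.split()
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


def match_degree(text: str, cutoff: float = 0.8):
    """Return the canonical degree found in a raw string, or None."""
    for token in normalize(text).split():
        if token in DEGREES:
            return token
        # Fuzzy fallback for near misses such as plurals or small typos.
        close = difflib.get_close_matches(token, DEGREES, n=1, cutoff=cutoff)
        if close:
            return close[0]
    return None


if __name__ == "__main__":
    for raw in ["B.Sc. in Computer Science", "Masters of Arts", "Ph.D", "High school diploma"]:
        print(raw, "->", match_degree(raw))
```

Entries that come back as `None` are the ones worth passing on to the translation step or the classifier. Note that short abbreviations such as "ma" or "ba" can collide with ordinary words in some languages, so check the map against a sample of your data before trusting it.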