If I understand your problem correctly, you have the "superclass" but not the subclasses.
I don't know how you are currently informing your model of this underlying structure, but techniques like Deep Sets might be useful (https://arxiv.org/abs/1703.06114). It can embed the "nearness" of a label to the other labels from its superclass.
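
To make the Deep Sets idea a bit more concrete, here is a minimal sketch of its core form: encode every element of a set with the same function, sum the encodings, then decode the pooled vector. The weights are random and untrained, and the names `phi`, `rho`, `deep_set` and the toy `group` are made up for illustration. How you would feed the members of a superclass in as the set is an assumption on my side.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    # Random, untrained weights: this only illustrates the structure.
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

def apply_layer(weights, x):
    # One dense layer with a tanh non-linearity.
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]

rng = random.Random(0)
phi = make_layer(2, 4, rng)  # per-element encoder, shared by all elements
rho = make_layer(4, 3, rng)  # decoder applied to the pooled vector

def deep_set(elements):
    encoded = [apply_layer(phi, x) for x in elements]
    # Sum pooling: the result does not depend on the order of the elements
    # (up to floating-point rounding).
    pooled = [sum(col) for col in zip(*encoded)]
    return apply_layer(rho, pooled)

# Hypothetical set: feature vectors of items sharing one superclass.
group = [[0.1, 0.5], [0.9, 0.2], [0.4, 0.4]]
out_a = deep_set(group)
out_b = deep_set(list(reversed(group)))  # same set, different order
```

Both calls give the same embedding (up to rounding), which is the property that lets the model treat the group as a set rather than a sequence.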
You also mention imbalance. Do you mean class imbalance? If so, oversampling with SMOTE tends to do well.
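
In case it helps, this is roughly what SMOTE does: it picks a minority sample, picks one of its k nearest minority neighbours, and creates a new point somewhere on the line between the two. Below is a simplified pure-Python sketch of that idea, not a reference implementation. The function name `smote_oversample` and the toy data are made up for illustration. In practice you would more likely use the `SMOTE` class from the imbalanced-learn package.

```python
import math
import random

def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Other minority points, sorted by distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy minority class with two features (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_oversample(minority, n_new=6, k=2)
```

Only the training split should be oversampled; the validation and test data should keep the original class distribution so the scores stay honest.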