Maybe the following approach will help: use an ordinary general-purpose LLM (e.g. Llama, Qwen, DeepSeek). List all possible categories in the prompt and ask the model to pick every category that fits a given text :)
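Below is a rough sketch of the prompt side of this idea. It only builds the prompt and parses the model's reply. The actual model call is left out because it depends on how you run the model (local server, hosted API, etc.). The function names and the prompt wording are my own, not from any particular library. The parser keeps only names from your category list, so anything the model invents is dropped.

```python
from typing import List


def build_prompt(text: str, categories: List[str]) -> str:
    """Build a multi-label classification prompt (wording is illustrative)."""
    options = "\n".join(f"- {c}" for c in categories)
    return (
        "Here is a list of categories:\n"
        f"{options}\n\n"
        "Pick every category that fits the text below. "
        "Answer only with the category names, separated by commas.\n\n"
        f"Text: {text}"
    )


def parse_categories(reply: str, categories: List[str]) -> List[str]:
    """Map the model's comma-separated reply back onto the allowed categories."""
    allowed = {c.lower(): c for c in categories}  # case-insensitive lookup
    picked: List[str] = []
    for part in reply.split(","):
        key = part.strip().strip(".").lower()
        # Ignore anything the model made up that is not in the list.
        if key in allowed and allowed[key] not in picked:
            picked.append(allowed[key])
    return picked


if __name__ == "__main__":
    cats = ["Sports", "Politics", "Finance"]
    print(build_prompt("The minister attended the cup final.", cats))
    # Pretend this string came back from the LLM:
    print(parse_categories("Sports, politics, Weather.", cats))
```

Restricting the answer to a fixed list and validating it afterwards matters in practice. Models sometimes change capitalization or add categories that were never offered.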
You can also compute embeddings for all texts and feed them into a multi-cluster algorithm, i.e. one that can assign a single text to more than one cluster. Here's an example of such an algorithm: https://www.researchgate.net/figure/Proposed-K-multicluster-algorithm_fig2_346086809
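The sketch below is a simpler stand-in and **not** the K-multicluster algorithm from the linked figure. It runs plain k-means with cosine similarity on the embeddings. Then it assigns each text to every cluster whose centroid similarity is within a `margin` of its best match, so a text can end up in several clusters. The 2-D vectors are toy data. Real embeddings would come from whatever sentence-embedding model you use. The initialization (first `k` points) and the `margin` value are arbitrary choices for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[List[float]]:
    """Plain k-means using cosine similarity; initialized with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            best = max(range(k), key=lambda c: cosine(p, centroids[c]))
            groups[best].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def multi_assign(points: List[Vector], centroids: List[List[float]],
                 margin: float = 0.2) -> List[List[int]]:
    """Give each point every cluster whose similarity is within `margin` of its best."""
    labels = []
    for p in points:
        sims = [cosine(p, c) for c in centroids]
        top = max(sims)
        labels.append([i for i, s in enumerate(sims) if s >= top - margin])
    return labels


if __name__ == "__main__":
    # Toy "embeddings": two clear groups plus one text that sits between them.
    emb = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
    cents = kmeans(emb, k=2)
    print(multi_assign(emb, cents))
```

A larger `margin` puts more texts into several clusters. A `margin` of 0 reduces to ordinary single-cluster k-means.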