I'd suggest using an LLM such as Gemini Flash to build the initial dataset: write a detailed prompt, send it through the model's API, and generate a few thousand sentences. Then train a language model (not a large language model; these are much smaller and generally free to use) such as BERT for your task.
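
Here is a rough sketch of the dataset-building step, in case it helps. It only covers the plumbing around the LLM: building the prompt, cleaning the response, and removing duplicates until you have enough sentences. The `generate` argument is a placeholder for whatever function you write to call the LLM API and return its text; the function names, the prompt wording, and the batch sizes are all my own assumptions, so adapt them to your task.

```python
import json
import re
from typing import Callable, List


def build_prompt(task_description: str, n: int) -> str:
    """Compose a prompt asking for n sentences, one per line (wording is illustrative)."""
    return (
        f"Task: {task_description}\n"
        f"Write {n} distinct example sentences for this task.\n"
        "Output one sentence per line, with no numbering or commentary."
    )


def parse_sentences(response_text: str) -> List[str]:
    """Split a raw response into sentences, dropping blank lines and list markers."""
    sentences = []
    for line in response_text.splitlines():
        # Strip leading markers such as "1. ", "2) ", "- " or "* ".
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s+", "", line).strip()
        if cleaned:
            sentences.append(cleaned)
    return sentences


def collect_dataset(
    generate: Callable[[str], str],  # hypothetical wrapper around your LLM API call
    task_description: str,
    target_size: int,
    batch_size: int = 50,
    max_calls: int = 200,
) -> List[str]:
    """Call generate() repeatedly until target_size unique sentences are collected."""
    seen = set()
    dataset: List[str] = []
    for _ in range(max_calls):
        if len(dataset) >= target_size:
            break
        response = generate(build_prompt(task_description, batch_size))
        for sentence in parse_sentences(response):
            key = sentence.lower()  # case-insensitive duplicate check
            if key not in seen:
                seen.add(key)
                dataset.append(sentence)
    return dataset[:target_size]


def to_jsonl(sentences: List[str]) -> str:
    """Serialize sentences as JSON Lines, a common input format for training scripts."""
    return "\n".join(json.dumps({"text": s}) for s in sentences)
```

You would call it as `collect_dataset(my_llm_call, "describe your task here", target_size=3000)` and save the `to_jsonl(...)` output to a file for training the smaller model. The `max_calls` cap is there so the loop stops even if the LLM keeps returning duplicates.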