The relationship between model size and training data size isn't always direct. In my language detection neural network, for example, the model size is primarily determined by the network's architecture, not the amount of training data.
Specifically:
- The input layer's size adapts to the length of the longest sentence in the dataset.
- The hidden layer has a fixed size (10 neurons in my case).
- The output layer's size is determined by the number of languages being classified.
Therefore, whether I train on 10 sentences or 1 million, the model size stays the same, provided the length of the longest sentence and the number of languages remain unchanged.
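To make that concrete, here's a minimal sketch (not taken from the linked repo) of how the parameter count of a single-hidden-layer network like this follows from the architecture alone. The function name and the example figures (a 50-unit longest sentence, 5 languages) are illustrative assumptions:

```python
def parameter_count(max_sentence_len: int, hidden_size: int, num_languages: int) -> int:
    """Weights plus biases of a fully connected input -> hidden -> output network."""
    input_to_hidden = max_sentence_len * hidden_size + hidden_size    # weight matrix + hidden biases
    hidden_to_output = hidden_size * num_languages + num_languages    # weight matrix + output biases
    return input_to_hidden + hidden_to_output

# The total depends only on the three architectural quantities;
# the number of training sentences never appears:
print(parameter_count(max_sentence_len=50, hidden_size=10, num_languages=5))  # 565
```

Feeding the network more training examples changes the learned weight values, but not how many weights there are.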
You can see this implementation in action here: https://github.com/cikay/language-detection