ChatGPT suggested that from_pretrained
might be the cause, so I checked the packages responsible for loading the models.
It's caused by `from_pretrained`, called inside the constructor of `M3Embedder` from the FlagEmbedding package. This method performs a network fetch every time it runs. Surprisingly, it does use the cache (otherwise the cost would not be a few seconds but the much longer time of re-downloading the model), yet it never skips the fetch step — apparently it still makes a round trip to the Hugging Face Hub to check whether the cached files are up to date.
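As far as I can tell, this matches how transformers behaves in general: even with a warm cache, `from_pretrained` contacts the Hub before loading from disk. A minimal way to observe it, using `BAAI/bge-m3` (the model `M3Embedder` wraps):

```python
from transformers import AutoModel

# Even when "BAAI/bge-m3" is already in the local HF cache, this call
# still makes a round trip to the Hub to compare revisions before it
# loads the cached weights -- that round trip is the few-second delay.
model = AutoModel.from_pretrained("BAAI/bge-m3")
```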
By pointing the model at a local path and explicitly passing `local_files_only=True` in the arguments to `from_pretrained`, I eventually managed to bypass this fetch.
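A minimal sketch of the fix at the transformers level — `/models/bge-m3` is a placeholder path; substitute the directory that actually holds the downloaded model files:

```python
from transformers import AutoModel, AutoTokenizer

# Placeholder path: a local directory containing the bge-m3 files
# (config.json, model weights, tokenizer files).
MODEL_DIR = "/models/bge-m3"

# local_files_only=True makes transformers resolve everything from disk
# and skip the Hub round trip entirely.
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, local_files_only=True)
model = AutoModel.from_pretrained(MODEL_DIR, local_files_only=True)
```

Whether you can thread `local_files_only` through to `from_pretrained` depends on your FlagEmbedding version; if the wrapper doesn't forward keyword arguments, setting the `HF_HUB_OFFLINE=1` environment variable disables the Hub round trip globally.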