ChatGPT suggested that from_pretrained
might be the cause, so I checked the packages responsible for loading the models.
It's caused by `from_pretrained`, called inside the constructor of `M3Embedder` from the FlagEmbedding package. This method performs a network fetch every time it runs. Surprisingly, it does use the cache (otherwise the cost would not be a few seconds but the much longer time of re-downloading the model), yet it never skips the fetch step — apparently it still makes a round trip to the Hugging Face Hub to check whether the cached files are up to date.
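As far as I can tell, this matches how transformers behaves in general: even with a warm cache, `from_pretrained` contacts the Hub before loading from disk. A minimal way to observe it, using `BAAI/bge-m3` (the model `M3Embedder` wraps):

```python
from transformers import AutoModel

# Even when "BAAI/bge-m3" is already in the local HF cache, this call
# still makes a round trip to the Hub to compare revisions before it
# loads the cached weights -- that round trip is the few-second delay.
model = AutoModel.from_pretrained("BAAI/bge-m3")
```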
By pointing the model at a local path and explicitly passing `local_files_only=True` in the arguments to `from_pretrained`, I eventually managed to bypass this fetch.
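A minimal sketch of the fix at the transformers level — `/models/bge-m3` is a placeholder path; substitute the directory that actually holds the downloaded model files:

```python
from transformers import AutoModel, AutoTokenizer

# Placeholder path: a local directory containing the bge-m3 files
# (config.json, model weights, tokenizer files).
MODEL_DIR = "/models/bge-m3"

# local_files_only=True makes transformers resolve everything from disk
# and skip the Hub round trip entirely.
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, local_files_only=True)
model = AutoModel.from_pretrained(MODEL_DIR, local_files_only=True)
```

Whether you can thread `local_files_only` through to `from_pretrained` depends on your FlagEmbedding version; if the wrapper doesn't forward keyword arguments, setting the `HF_HUB_OFFLINE=1` environment variable disables the Hub round trip globally.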