I finally figured out a fix; hopefully this helps somebody.
The problem occurs because get_all_post_dates is called many times in process_ticker_attention (thousands of times per ticker). The fix is to create one mongo_client per subprocess and reuse it for every call inside that subprocess (do NOT share a client across processes).
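
    import pymongo

    def init_process(ticker, finish_num, lock):
        """Initialize one MongoDB client for this worker process, then run the job."""
        global mongo_p
        # Create the client once per process; code running in this process
        # (e.g. get_all_post_dates) can then reuse mongo_p instead of reconnecting.
        mongo_p = pymongo.MongoClient("mongodb://localhost:27017/", connectTimeoutMS=600000)
        process_ticker_attention(ticker, finish_num, lock)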
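
If you would rather not rely on a bare global, the same idea can be sketched as a small per-process cache. This is only an illustration of the "one client per process" pattern, not the original code: get_client, factory, _client and _client_pid are hypothetical names, and the commented-out usage at the bottom assumes a database and collection that are not from the question.

```python
import os

_client = None      # the client owned by the current process (hypothetical name)
_client_pid = None  # PID of the process that created _client


def get_client(factory):
    """Return this process's client, creating it on first use.

    `factory` is any zero-argument callable that builds a client, e.g.
    lambda: pymongo.MongoClient("mongodb://localhost:27017/").
    The PID check makes a forked child build its own client instead of
    reusing the one it inherited from its parent.
    """
    global _client, _client_pid
    if _client is None or _client_pid != os.getpid():
        _client = factory()
        _client_pid = os.getpid()
    return _client


# Hypothetical usage inside get_all_post_dates (database/collection names are assumptions):
#
#   def make_mongo_client():
#       return pymongo.MongoClient("mongodb://localhost:27017/", connectTimeoutMS=600000)
#
#   def get_all_post_dates(ticker):
#       client = get_client(make_mongo_client)   # reused on every call in this process
#       return list(client["mydb"]["posts"].find({"ticker": ticker}, {"date": 1}))
```

With this, the thousands of get_all_post_dates calls per ticker all go through the same client within a process, and each subprocess still gets its own.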