79310235

Date: 2024-12-26 19:03:17
Score: 4
Natty:

For those of us who are newer to Scrapy and not looking to do the amount of work/customization involved in defining custom log extensions, this can be achieved with custom stats plus the periodic log stats extension.

In the parse method of each spider class, I set self.crawler.stats.set_value('spider_name', self.name), and then in settings.py set "PERIODIC_LOG_STATS": {"include": ["spider_name"]} (plus whatever else you want the periodic log stats to output). I also define a separate CrawlerProcess for each spider.
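
A minimal sketch of what that looks like, assuming Scrapy >= 2.11 (where the PeriodicLog extension ships); the spider name, URL, and interval are placeholders:

    # settings.py -- enable the PeriodicLog extension and tell it which
    # custom stats to include in its periodic output
    EXTENSIONS = {
        "scrapy.extensions.periodic_log.PeriodicLog": 0,
    }
    PERIODIC_LOG_STATS = {"include": ["spider_name"]}
    LOGSTATS_INTERVAL = 60.0  # seconds between periodic stat dumps

    # myspider.py
    import scrapy

    class MySpider(scrapy.Spider):
        name = "my_spider"
        start_urls = ["https://example.com"]  # placeholder

        def parse(self, response):
            # Tag this crawl's stats with the spider's name so each
            # periodic log line can be attributed to the right spider.
            self.crawler.stats.set_value("spider_name", self.name)
            ...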

This might be too hacky, but it has been working for me, and it lets me stay within the Scrapy-defined log classes and extensions while running multiple spiders via an API (see the sketch below). If anyone sees a reason why this is unacceptable, please let me know; as mentioned, I'm new to Scrapy :)
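
For the "separate CrawlerProcess for each spider" part, here is a hedged sketch of one way to do it, assuming one OS process per spider (the spider classes are placeholders; Twisted's reactor cannot be restarted within a single Python process, which is why each spider gets its own):

    from multiprocessing import Process

    import scrapy
    from scrapy.crawler import CrawlerProcess


    class SpiderA(scrapy.Spider):  # placeholder spider
        name = "spider_a"
        start_urls = ["https://example.com"]

        def parse(self, response):
            self.crawler.stats.set_value("spider_name", self.name)


    class SpiderB(scrapy.Spider):  # placeholder spider
        name = "spider_b"
        start_urls = ["https://example.org"]

        def parse(self, response):
            self.crawler.stats.set_value("spider_name", self.name)


    def run_spider(spider_cls):
        # One CrawlerProcess per OS process, each with its own
        # periodic-log settings.
        process = CrawlerProcess(settings={
            "EXTENSIONS": {"scrapy.extensions.periodic_log.PeriodicLog": 0},
            "PERIODIC_LOG_STATS": {"include": ["spider_name"]},
        })
        process.crawl(spider_cls)
        process.start()  # blocks until the crawl finishes


    if __name__ == "__main__":
        procs = [Process(target=run_spider, args=(cls,))
                 for cls in (SpiderA, SpiderB)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()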

Reasons:
  • RegEx Blacklisted phrase (2.5): please let me know
  • RegEx Blacklisted phrase (1.5): I'm new
  • Long answer (-0.5):
  • Has code block (-0.5):
  • Low reputation (1):
Posted by: isa