Ticket #152 (closed enhancement: fixed)

Opened 5 months ago

Last modified 4 months ago

Spider constructor arguments

Reported by: pablo
Owned by: pablo
Priority: major
Milestone: 0.9
Component: code
Version:
Keywords:
Cc: daniel pablo

Description

Spiders in Scrapy are currently used for three main things:

  1. to define how each site will be crawled and parsed through the logic of its methods and callbacks
  2. to keep a context of each site being scraped (by using instance attributes to maintain some state during the crawling session)
  3. to define custom (per-spider) settings that are used by framework components (like middlewares and extensions). For example: the download_delay attribute, which overrides the DOWNLOAD_DELAY setting, per spider.

To formalize support for (3) without affecting (1) or (2), I propose we add support for passing generic keyword arguments to the BaseSpider constructor, which would then set them as instance attributes.
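A minimal sketch of the proposed constructor behaviour, assuming a simplified stand-in class rather than the real BaseSpider (the `name` parameter and the class body here are illustrative, not the code committed for this ticket):

```python
class BaseSpider(object):
    """Simplified stand-in for Scrapy's BaseSpider (illustration only)."""

    def __init__(self, name=None, **kwargs):
        if name is not None:
            self.name = name
        # Every extra keyword argument becomes an instance attribute.
        self.__dict__.update(kwargs)


spider = BaseSpider("example.com", download_delay=2, category_to_scrape="1234")
```

With this approach a subclass does not need to declare each per-spider setting up front; anything passed at construction time is available as `spider.<name>`.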

This would become the canonical way to configure per-spider settings. Whether a given setting can be overridden per spider would then depend on whether the component that uses it reads the corresponding spider attribute.
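As a rough illustration of what "supporting the spider attribute" could look like inside a component, here is a sketch that prefers the spider's attribute and falls back to the global value. The `Spider` class, the module-level `DOWNLOAD_DELAY` constant, and the `delay_for` helper are all hypothetical stand-ins, not Scrapy's actual middleware code:

```python
DOWNLOAD_DELAY = 0  # stand-in for the project-wide DOWNLOAD_DELAY setting


class Spider(object):
    """Hypothetical spider that stores keyword arguments as attributes."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def delay_for(spider, default=DOWNLOAD_DELAY):
    """Return the spider's own download_delay if set, else the global default."""
    return getattr(spider, "download_delay", default)
```

A component written this way honours the per-spider value when present and is unaffected by spiders that do not set it.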

Also, we'd want to add a scrapy-ctl option to pass spider arguments. Maybe --spiderarg / -A ?

For example:

scrapy-ctl.py crawl example.com -A download_delay=2 -A category_to_scrape=1234
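One way such an option could be parsed into constructor keyword arguments is sketched below. It uses the standard library's argparse; the `parse_spider_args` function is hypothetical, the `-A` / `--spiderarg` names are just the tentative ones suggested above, and this is not how scrapy-ctl itself is implemented:

```python
import argparse


def parse_spider_args(argv):
    """Split crawl arguments into a target and a dict of spider kwargs."""
    parser = argparse.ArgumentParser(prog="scrapy-ctl.py crawl")
    parser.add_argument("target")
    # Repeated -A options accumulate into a list of NAME=VALUE strings.
    parser.add_argument("-A", "--spiderarg", action="append", default=[],
                        metavar="NAME=VALUE")
    opts = parser.parse_args(argv)

    spider_kwargs = {}
    for item in opts.spiderarg:
        name, sep, value = item.partition("=")
        if not sep or not name:
            parser.error("expected NAME=VALUE, got %r" % item)
        spider_kwargs[name] = value  # values are left as strings
    return opts.target, spider_kwargs
```

Note that values parsed this way arrive as strings, so a component reading `download_delay` would need to convert it to a number itself.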

Change History

Changed 4 months ago by pablo

  • status changed from new to closed
  • resolution set to fixed

Done in r1978.
