Ticket #152 (closed enhancement: fixed)
Spider constructor arguments
| Reported by: | pablo | Owned by: | pablo |
|---|---|---|---|
| Priority: | major | Milestone: | 0.9 |
| Component: | code | Version: | |
| Keywords: | | Cc: | daniel pablo |
Description
Spiders in Scrapy are currently used for three main things:
1. to define how each site will be crawled and parsed, through the logic of its methods and callbacks
2. to keep a context for each site being scraped (by using instance attributes to maintain some state during the crawling session)
3. to define custom (per-spider) settings that are used by framework components (like middlewares and extensions), for example the download_delay attribute, which overrides the DOWNLOAD_DELAY setting on a per-spider basis
To formalize support for (3) without affecting (1) or (2), I propose we add support for passing generic keyword arguments to the BaseSpider constructor, which would then set them as instance attributes.
This would become the canonical way to configure per-spider settings. Whether a given per-spider setting takes effect would then depend on whether the relevant component honors the corresponding spider attribute.
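A minimal sketch of the proposed behaviour, assuming a simplified stand-in for BaseSpider rather than Scrapy's actual class. The get_download_delay helper and the module-level DOWNLOAD_DELAY value are hypothetical, included only to show how a component could fall back to the global setting when the spider does not define the attribute:

```python
# Hypothetical stand-in for the global DOWNLOAD_DELAY setting.
DOWNLOAD_DELAY = 0.5


class BaseSpider:
    """Simplified stand-in for Scrapy's BaseSpider (illustration only)."""

    def __init__(self, name=None, **kwargs):
        self.name = name
        # Every extra keyword argument becomes an instance attribute.
        self.__dict__.update(kwargs)


def get_download_delay(spider, default=DOWNLOAD_DELAY):
    """Hypothetical component helper: prefer the spider attribute if present."""
    return getattr(spider, "download_delay", default)


configured = BaseSpider("example.com", download_delay=2, category_to_scrape="1234")
plain = BaseSpider("example.org")
```

Here get_download_delay(configured) uses the spider's own value, while get_download_delay(plain) falls back to the global default. A component that never looks up the attribute would simply ignore it, which is the per-component dependency described above.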
We'd also want to add a scrapy-ctl option for passing spider arguments, perhaps --spiderarg / -A.
For example:
scrapy-ctl.py crawl example.com -A download_delay=2 -A category_to_scrape=1234
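One way the repeated -A name=value pairs could be collected into keyword arguments is sketched below. It uses the standard library's argparse purely for illustration and is not scrapy-ctl's actual option handling; parse_spider_args and build_parser are hypothetical names:

```python
import argparse


def parse_spider_args(pairs):
    """Turn ['name=value', ...] into a dict; values stay strings."""
    result = {}
    for pair in pairs:
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"expected NAME=VALUE, got {pair!r}")
        result[name] = value
    return result


def build_parser():
    """Hypothetical parser for the proposed 'crawl' command line."""
    parser = argparse.ArgumentParser(prog="scrapy-ctl.py crawl")
    parser.add_argument("target")
    # action="append" collects every occurrence of -A into a list.
    parser.add_argument("-A", "--spiderarg", action="append", default=[],
                        metavar="NAME=VALUE")
    return parser


opts = build_parser().parse_args(
    ["example.com", "-A", "download_delay=2", "-A", "category_to_scrape=1234"]
)
spider_args = parse_spider_args(opts.spiderarg)
```

The resulting spider_args dict could then be passed to the spider constructor as keyword arguments. Note that values parsed this way arrive as strings, so a component reading download_delay would need to convert it to a number itself.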
