Scrapy 0.9 Changes
This section of the Scrapy wiki documents all new features and backwards-incompatible changes in Scrapy 0.9 since the 0.8 release.
Contents
- New features and improvements
- Added SMTP-AUTH support to scrapy.mail
- Added new scrapy-ctl view command
- Added web service for controlling Scrapy process
- Support for running Scrapy as a service, for production systems
- Added wrapper induction library
- Several improvements to response encoding support
- Added LOG_ENCODING setting
- Added RANDOMIZE_DOWNLOAD_DELAY setting
- MailSender is no longer IO-blocking
- Link extractors and the new CrawlSpider now handle relative base tag URLs
- Several improvements to Item Loaders and processors
- Added support for adding variables to telnet console
- Support for requests without callbacks
- API changes
- Change Spider.domain_name to Spider.name (SEP-012)
- Response.encoding is now the detected encoding
- HttpErrorMiddleware now returns None or raises an exception
- scrapy.command modules relocation
- Added ExecutionQueue for feeding spiders to scrape
- Removed ExecutionEngine singleton
- Ported S3ImagesStore (images pipeline) to use boto and threads
- Moved module: {{scrapy.management.telnet}} to {{scrapy.telnet}}
- Changes to default settings
New features and improvements
Added SMTP-AUTH support to scrapy.mail
New settings added: MAIL_USER, MAIL_PASS
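What SMTP-AUTH adds is a login step before the message is sent. The sketch below illustrates that with Python's standard smtplib rather than scrapy.mail itself; send_mail and its parameters are hypothetical, with user and password standing in for the MAIL_USER and MAIL_PASS settings. The smtp_factory parameter only exists so the sketch can be exercised without a real server.

```python
import smtplib
from email.mime.text import MIMEText


def send_mail(host, port, mailfrom, to, subject, body,
              user=None, password=None, smtp_factory=smtplib.SMTP):
    """Send a plain-text email, authenticating only when credentials are set.

    Hypothetical helper: user/password play the role of MAIL_USER/MAIL_PASS.
    """
    msg = MIMEText(body)
    msg["From"] = mailfrom
    msg["To"] = ", ".join(to)
    msg["Subject"] = subject

    server = smtp_factory(host, port)
    try:
        if user and password:
            # SMTP-AUTH: log in before handing the message to the server.
            server.login(user, password)
        server.sendmail(mailfrom, to, msg.as_string())
    finally:
        server.quit()
    return msg
```

Without credentials the login step is skipped, so servers that do not require authentication keep working as before.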
Added new scrapy-ctl view command
Opens a URL in the browser, showing the page as Scrapy sees it.
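The idea behind view can be sketched as: take the body exactly as it was downloaded, write it to a temporary file, and open that file in the default browser. This is a hypothetical stdlib sketch, not the command's implementation; downloading is left out, and open_body_in_browser is an illustrative name.

```python
import tempfile
import webbrowser
from pathlib import Path


def open_body_in_browser(body, suffix=".html", opener=webbrowser.open):
    """Save downloaded bytes to a temporary file and open it in a browser."""
    # delete=False keeps the file on disk after closing, so the browser can read it.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as f:
        f.write(body)
        path = f.name
    opener(Path(path).as_uri())  # e.g. file:///tmp/tmpXXXX.html
    return path
```

Because the browser renders the saved bytes rather than fetching the URL itself, what you see reflects what the crawler actually received.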
Added web service for controlling Scrapy process
This also deprecates the web console.
Support for running Scrapy as a service, for production systems
r1988, r2054, r2055, r2056, r2057 | #168
Added wrapper induction library
A new wrapper induction library was added; for now, it is documented only in its source code.
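Wrapper induction means learning extraction rules from annotated example pages instead of writing them by hand. As a rough illustration of the concept only, here is a toy left/right-delimiter (LR) wrapper: it learns the text surrounding one annotated value and reuses it on other pages. This is not the algorithm used by Scrapy's library; learn_wrapper and apply_wrapper are hypothetical names.

```python
def learn_wrapper(page, value, context=10):
    """Learn (left, right) delimiters around one annotated value in page."""
    start = page.index(value)
    end = start + len(value)
    left = page[max(0, start - context):start]
    right = page[end:end + context]
    if not left or not right:
        raise ValueError("value needs surrounding text to learn delimiters")
    return left, right


def apply_wrapper(wrapper, page):
    """Extract every value that sits between the learned delimiters."""
    left, right = wrapper
    results, pos = [], 0
    while True:
        i = page.find(left, pos)
        if i == -1:
            break
        i += len(left)
        j = page.find(right, i)
        if j == -1:
            break
        results.append(page[i:j])
        pos = j + len(right)
    return results
```

For example, annotating "Alice" in `<li><b class="name">Alice</b></li>` yields delimiters that extract "Bob" and "Carol" from a longer list with the same markup. Real wrapper induction systems generalize over several examples and tolerate layout variation, which this toy does not.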
Several improvements to response encoding support
Simplified and improved response encoding support.
Added LOG_ENCODING setting
r1956 | http://doc.scrapy.org/0.9/topics/settings.html#setting-LOG_ENCODING
Added RANDOMIZE_DOWNLOAD_DELAY setting
This setting is enabled by default.
r1923 | http://doc.scrapy.org/0.9/topics/settings.html#setting-RANDOMIZE_DOWNLOAD_DELAY
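Randomizing the delay makes the crawl's request timing less regular. A minimal sketch, assuming the behaviour described in the settings documentation linked above (a random wait between 0.5 and 1.5 times DOWNLOAD_DELAY); next_delay is a hypothetical helper, not Scrapy code.

```python
import random


def next_delay(download_delay, randomize=True, rng=random):
    """Seconds to wait before the next request.

    Assumption: with randomization on, the wait is drawn uniformly from
    [0.5 * download_delay, 1.5 * download_delay].
    """
    if not randomize:
        return download_delay
    return rng.uniform(0.5 * download_delay, 1.5 * download_delay)
```

With DOWNLOAD_DELAY set to 2 seconds, each wait would fall somewhere between 1 and 3 seconds, averaging about 2.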
MailSender is no longer IO-blocking
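The point of a non-blocking mail sender is that the crawler keeps working while a message is being delivered. Scrapy itself is built on Twisted; the sketch below swaps in Python's asyncio purely to illustrate the idea, running a hypothetical blocking send in a worker thread so the event loop stays free.

```python
import asyncio
import time


def blocking_send(msg):
    """Hypothetical stand-in for a slow, blocking SMTP conversation."""
    time.sleep(0.3)
    return "sent: " + msg


async def send_without_blocking(msg):
    # Run the blocking call in a worker thread; the event loop keeps running.
    return await asyncio.to_thread(blocking_send, msg)


async def main():
    events = []

    async def send():
        events.append(await send_without_blocking("hello"))

    async def heartbeat():
        # Other work (e.g. crawling) that must not be stalled by the send.
        for i in range(3):
            events.append(f"tick {i}")
            await asyncio.sleep(0.01)

    await asyncio.gather(send(), heartbeat())
    return events
```

Running `asyncio.run(main())` shows all three heartbeat ticks completing before the mail is reported as sent, because the send never blocks the loop.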
Link extractors and the new CrawlSpider now handle relative base tag URLs
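A relative base URL, such as `<base href="/docs/">`, must first be resolved against the response URL before it can be used to resolve the page's links. A minimal stdlib sketch of that two-step resolution; it is not Scrapy's link extractor code, and extract_links is a hypothetical name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect the first <base href> and every <a href> in a page."""

    def __init__(self):
        super().__init__()
        self.base = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and self.base is None and attrs.get("href"):
            self.base = attrs["href"]
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def extract_links(response_url, html):
    parser = _LinkCollector()
    parser.feed(html)
    # A relative <base href> is itself resolved against the response URL...
    base_url = urljoin(response_url, parser.base) if parser.base else response_url
    # ...and every link is then resolved against that absolute base.
    return [urljoin(base_url, href) for href in parser.links]
```

For a page at http://example.com/a/b.html with `<base href="/docs/">`, a link to intro.html resolves to http://example.com/docs/intro.html rather than http://example.com/a/intro.html.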
Several improvements to Item Loaders and processors
Thanks to Ping Yin for contributing them!
r2022, r2023, r2024, r2025, r2026, r2027, r2028, r2029, r2030
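Item Loaders pass extracted values through input and output processors. To illustrate the idea, here are simplified, hypothetical re-implementations of two common processor styles: a MapCompose-like chain that applies functions to every value, and a TakeFirst-like selector. This is not the code of Scrapy's own processors, whose exact 0.9 behaviour may differ.

```python
def map_compose(*functions):
    """Apply each function to every value in turn.

    None results are dropped and list/tuple results are flattened.
    """
    def processor(values):
        for func in functions:
            next_values = []
            for value in values:
                result = func(value)
                if result is None:
                    continue
                if isinstance(result, (list, tuple)):
                    next_values.extend(result)
                else:
                    next_values.append(result)
            values = next_values
        return values
    return processor


def take_first(values):
    """Return the first value that is neither None nor an empty string."""
    for value in values:
        if value is not None and value != "":
            return value
    return None
```

A typical pairing uses the chain as an input processor to clean raw strings and the selector as an output processor to collapse them into a single field value.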
Added support for adding variables to telnet console
Support for requests without callbacks
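A minimal sketch of how a request without a callback can still be handled, assuming (as in later Scrapy releases) that such responses go to the spider's parse() method. ToyRequest, ToySpider and dispatch are hypothetical stand-ins, not Scrapy classes.

```python
class ToyRequest:
    """Hypothetical minimal request: the callback is optional."""

    def __init__(self, url, callback=None):
        self.url = url
        self.callback = callback


class ToySpider:
    name = "example"

    def parse(self, response):
        return ["parsed " + response]


def dispatch(spider, request, response):
    # Assumption: a request without a callback falls back to spider.parse.
    callback = request.callback or spider.parse
    return callback(response)
```

An explicit callback still takes precedence; the fallback only applies when none was given.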
API changes
Change Spider.domain_name to Spider.name (SEP-012)
Old attributes were deprecated.
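One common way to deprecate a renamed attribute is to keep the old name as a read-only alias that warns. The sketch below covers only code that reads the old attribute; ToySpider and ExampleSpider are hypothetical and not Scrapy's actual deprecation code.

```python
import warnings


class ToySpider:
    """Hypothetical spider base keeping domain_name as a deprecated alias."""

    name = None

    @property
    def domain_name(self):
        warnings.warn("Spider.domain_name is deprecated, use Spider.name instead",
                      DeprecationWarning, stacklevel=2)
        return self.name


class ExampleSpider(ToySpider):
    name = "example.com"  # 0.9 style: set name, not domain_name
```

Old code that reads spider.domain_name keeps working but emits a DeprecationWarning, giving users time to switch to spider.name.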
Response.encoding is now the detected encoding
It used to be the original encoding declared in the HTTP headers or the HTML meta tag. That declared encoding can now be accessed through the protected Response._declared_encoding() method.
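The difference between a declared and a detected encoding can be sketched as follows: the declared encoding (checked here in the HTTP header first, then in an HTML meta tag) is only a candidate, and the reported encoding is the first candidate that actually decodes the body. The helper names, the check order, the fallback list and the regular expressions are assumptions for illustration, not Scrapy's detection algorithm.

```python
import re

_HEADER_CHARSET = re.compile(r'charset=["\']?([\w-]+)', re.I)
_META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.I)


def declared_encoding(content_type, body):
    """Encoding declared by the Content-Type header, else by a meta tag."""
    match = _HEADER_CHARSET.search(content_type or "")
    if match:
        return match.group(1).lower()
    match = _META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return None


def detected_encoding(content_type, body, fallbacks=("utf-8", "cp1252")):
    """First candidate encoding that actually decodes the body."""
    for enc in (declared_encoding(content_type, body), *fallbacks):
        if not enc:
            continue
        try:
            body.decode(enc)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown name or wrong declaration: try the next one
        return enc
    return "latin-1"  # decodes any byte sequence
```

For instance, a UTF-8 body served with `charset=ascii` is declared as ascii but detected as utf-8, which is the kind of mismatch the new Response.encoding behaviour addresses.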
HttpErrorMiddleware now returns None or raises an exception
Returning an iterable is no longer allowed.
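The new contract is binary: either let the response through by returning None, or filter it by raising. A minimal sketch of that rule; check_response_status, HttpErrorSketch and allowed_statuses are hypothetical names, and the real middleware receives a response and a spider rather than a bare status code.

```python
class HttpErrorSketch(Exception):
    """Hypothetical exception raised for filtered responses."""


def check_response_status(status, allowed_statuses=()):
    """Return None to pass the response on, or raise to filter it.

    Under the 0.9 contract an iterable is never returned here.
    """
    if 200 <= status < 300 or status in allowed_statuses:
        return None
    raise HttpErrorSketch("Ignoring non-2xx response (status %d)" % status)
```

Code that previously relied on the middleware returning an iterable of results needs to be changed to handle the raised exception instead.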
scrapy.command modules relocation
The goal of these changes is to flatten the module hierarchy.
- scrapy.command.cmdline moved to scrapy.cmdline
- scrapy.command.models moved to scrapy.command
- scrapy.command.commands moved to scrapy.commands
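Code that has to run against both 0.8 and 0.9 can try the new module path first and fall back to the old one. A minimal sketch; import_first is a hypothetical helper, and only the module paths listed above come from this page.

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in module_names that exists."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as exc:
            # Only "module missing" is swallowed; real import bugs still surface.
            errors.append("%s: %s" % (name, exc))
    raise ImportError("none of the modules could be imported: " + "; ".join(errors))


# Prefer the 0.9 location, fall back to the 0.8 one:
# cmdline = import_first("scrapy.cmdline", "scrapy.command.cmdline")
```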
Added ExecutionQueue for feeding spiders to scrape
Removed ExecutionEngine singleton
scrapy.core.engine.scrapyengine no longer exists; use scrapy.core.manager.scrapymanager.engine to access the engine instead.
Ported S3ImagesStore (images pipeline) to use boto and threads
Moved module: {{scrapy.management.telnet}} to {{scrapy.telnet}}
Changes to default settings
Changed default SCHEDULER_ORDER to 'DFO'
This reduces memory usage, since depth-first order ('DFO') typically keeps fewer requests pending than breadth-first order ('BFO').
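Why depth-first order needs less memory can be illustrated with a toy model: every page links to a fixed number of new pages, and the scheduler either pops the newest request (LIFO, like DFO) or the oldest (FIFO, like BFO). The model is an assumption (a full link tree with no filtering or duplicate removal), and max_pending is a hypothetical helper, not Scrapy's scheduler.

```python
from collections import deque


def max_pending(order, branching=3, depth=4):
    """Largest number of queued requests while crawling a full link tree.

    Every page shallower than `depth` links to `branching` new pages.
    'DFO' pops the newest request (stack); anything else pops the oldest (queue).
    """
    pending = deque([0])  # each entry is the depth of a queued page
    peak = 1
    while pending:
        level = pending.pop() if order == "DFO" else pending.popleft()
        if level < depth:
            pending.extend([level + 1] * branching)
        peak = max(peak, len(pending))
    return peak
```

With 3 links per page and 4 levels, DFO never holds more than 9 pending requests, while BFO ends up queuing all 81 pages of the deepest level at once; in general the DFO peak grows with depth times branching, and the BFO peak with the width of the widest level.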
