Scrapy 0.9 Changes

This section of the Scrapy wiki documents all new features and backwards-incompatible changes in Scrapy 0.9 since the 0.8 release.

Contents

  1. New features and improvements
    1. Added SMTP-AUTH support to scrapy.mail
    2. Added new scrapy-ctl view command
    3. Added web service for controlling Scrapy process
    4. Support for running Scrapy as a service, for production systems
    5. Added wrapper induction library
    6. Several improvements to response encoding support
    7. Added LOG_ENCODING setting
    8. Added RANDOMIZE_DOWNLOAD_DELAY setting
    9. MailSender is no longer IO-blocking
    10. Linkextractors and new Crawlspider now handle relative base tag urls
    11. Several improvements to Item Loaders and processors
    12. Added support for adding variables to telnet console
    13. Support for requests without callbacks
  2. API changes
    1. Change Spider.domain_name to Spider.name (SEP-012)
    2. Response.encoding is now the detected encoding
    3. HttpErrorMiddleware now returns None or raises an exception
    4. scrapy.command modules relocation
    5. Added ExecutionQueue for feeding spiders to scrape
    6. Removed ExecutionEngine singleton
    7. Ported S3ImagesStore (images pipeline) to use boto and threads
    8. Moved module: scrapy.management.telnet to scrapy.telnet
  3. Changes to default settings
    1. Changed default SCHEDULER_ORDER to 'DFO'

New features and improvements

Added SMTP-AUTH support to scrapy.mail

New settings added: MAIL_USER, MAIL_PASS
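A minimal settings sketch, assuming these go in the project settings module like any other setting. Both values are placeholders, not defaults.

```python
# settings.py (sketch): credentials used for SMTP authentication.
# The values below are placeholders chosen for illustration only.
MAIL_USER = "scrapy-bot"
MAIL_PASS = "change-me"
```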

r2065 | #149

Added new scrapy-ctl view command

Opens a URL in the browser, as seen by Scrapy.

r2039

Added web service for controlling Scrapy process

This also deprecates the web console.

r2053 | #167

Support for running Scrapy as a service, for production systems

r1988, r2054, r2055, r2056, r2057 | #168

Added wrapper induction library

Added a new wrapper induction library (for now, documentation is available only in the source code).

r2011

Several improvements to response encoding support

Simplified and improved response encoding support.

r1961, r1969

Added LOG_ENCODING setting

r1956 | http://doc.scrapy.org/0.9/topics/settings.html#setting-LOG_ENCODING

Added RANDOMIZE_DOWNLOAD_DELAY setting

This setting is enabled by default.
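According to the setting's documentation, the wait between requests becomes a random value between 0.5 and 1.5 times DOWNLOAD_DELAY. The sketch below only illustrates that range; randomized_delay is a hypothetical helper and not Scrapy's own code.

```python
import random


def randomized_delay(download_delay, rng=random):
    """Return a wait time drawn uniformly from [0.5, 1.5] * download_delay.

    Hypothetical helper for illustration; Scrapy's downloader is not
    required to compute the delay exactly this way.
    """
    return rng.uniform(0.5 * download_delay, 1.5 * download_delay)


if __name__ == "__main__":
    # With DOWNLOAD_DELAY = 2.0 every value falls between 1.0 and 3.0 seconds.
    print([round(randomized_delay(2.0), 2) for _ in range(5)])
```

Setting RANDOMIZE_DOWNLOAD_DELAY to False restores a fixed delay.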

r1923 | http://doc.scrapy.org/0.9/topics/settings.html#setting-RANDOMIZE_DOWNLOAD_DELAY

MailSender is no longer IO-blocking

r1955 | #146

Linkextractors and new Crawlspider now handle relative base tag urls

r1960 | #148

Several improvements to Item Loaders and processors

Thanks to Ping Yin for contributing them!

r2022, r2023, r2024, r2025, r2026, r2027, r2028, r2029, r2030

Added support for adding variables to telnet console

r2047 | #165

Support for requests without callbacks

r2050 | #166

API changes

Change Spider.domain_name to Spider.name (SEP-012)

Old attributes were deprecated.
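The sketch below shows one way a deprecated alias of this kind can work: the old attribute still resolves, but reading it emits a warning. SpiderSketch and ExampleSpider are hypothetical stand-ins written without importing Scrapy, so this is not the actual spider base class.

```python
import warnings


class SpiderSketch(object):
    """Stand-in for a spider base class; not Scrapy's implementation."""

    name = None

    @property
    def domain_name(self):
        # Old attribute kept as a read-only alias that warns on access.
        warnings.warn("domain_name is deprecated, use name instead",
                      DeprecationWarning, stacklevel=2)
        return self.name


class ExampleSpider(SpiderSketch):
    name = "example.com"  # new style: identify the spider by name


if __name__ == "__main__":
    spider = ExampleSpider()
    print(spider.name)         # preferred attribute
    print(spider.domain_name)  # still works, but emits a DeprecationWarning
```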

r1975

Response.encoding is now the detected encoding

It used to be the original encoding declared in the HTTP headers or the HTML meta tag. The declared encoding can now be accessed through the protected method Response._declared_encoding().
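The difference between a declared and a detected encoding can be illustrated without Scrapy. The sketch below is a simplification under stated assumptions: declared_encoding and detected_encoding are hypothetical helpers, only the Content-Type header is inspected, and the fallback list is arbitrary. Scrapy's real detection logic is more involved.

```python
def declared_encoding(headers):
    """Return the charset named in a Content-Type header value, or None."""
    content_type = headers.get("Content-Type", "")
    for part in content_type.split(";")[1:]:
        key, _, value = part.strip().partition("=")
        if key.lower() == "charset" and value:
            return value.strip('"').lower()
    return None


def detected_encoding(body, declared, fallbacks=("utf-8", "cp1252")):
    """Return the first candidate encoding that decodes body without errors."""
    candidates = ([declared] if declared else []) + list(fallbacks)
    for encoding in candidates:
        try:
            body.decode(encoding)
            return encoding
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding, try the next candidate
    return None


if __name__ == "__main__":
    headers = {"Content-Type": "text/html; charset=ascii"}
    body = u"caf\u00e9".encode("utf-8")  # the server mislabels this body
    declared = declared_encoding(headers)
    print(declared, detected_encoding(body, declared))
```

Here the header claims ASCII, but the body only decodes as UTF-8, so the two values differ.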

r1961

HttpErrorMiddleware now returns None or raises an exception

Returning an iterable is no longer allowed.
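A minimal sketch of this contract, assuming the hook in question is the spider middleware method process_spider_input. HttpErrorSketch, HttpErrorMiddlewareSketch and FakeResponse are hypothetical names for illustration; the exception type that Scrapy actually raises is not stated here.

```python
from collections import namedtuple

# Hypothetical stand-in for a response object; only the status is needed.
FakeResponse = namedtuple("FakeResponse", "status")


class HttpErrorSketch(Exception):
    """Hypothetical exception raised for responses that should be dropped."""


class HttpErrorMiddlewareSketch(object):
    def process_spider_input(self, response, spider):
        if 200 <= response.status < 300:
            return None  # success: let the response reach the spider
        # Anything else is signalled with an exception, never an iterable.
        raise HttpErrorSketch("Ignoring response with status %d" % response.status)


if __name__ == "__main__":
    middleware = HttpErrorMiddlewareSketch()
    print(middleware.process_spider_input(FakeResponse(200), spider=None))
    try:
        middleware.process_spider_input(FakeResponse(404), spider=None)
    except HttpErrorSketch as exc:
        print("dropped:", exc)
```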

r2006 | #157

scrapy.command modules relocation

The goal of these changes is to flatten the module hierarchy.

  • scrapy.command.cmdline moved to scrapy.cmdline
  • scrapy.command.models moved to scrapy.command
  • scrapy.command.commands moved to scrapy.commands
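When updating imports in existing projects, the relocations above can be expressed as a lookup table. The mapping comes from the list above; new_module_path is a hypothetical helper and not part of Scrapy.

```python
# Old (0.8) module path -> new (0.9) module path, as listed above.
RELOCATED_MODULES = {
    "scrapy.command.cmdline": "scrapy.cmdline",
    "scrapy.command.models": "scrapy.command",
    "scrapy.command.commands": "scrapy.commands",
}


def new_module_path(dotted_path):
    """Translate a 0.8 dotted path to its 0.9 location.

    Paths that were not relocated are returned unchanged.
    """
    # Check longer prefixes first so the most specific entry wins.
    for old in sorted(RELOCATED_MODULES, key=len, reverse=True):
        if dotted_path == old or dotted_path.startswith(old + "."):
            return RELOCATED_MODULES[old] + dotted_path[len(old):]
    return dotted_path


if __name__ == "__main__":
    for path in ("scrapy.command.cmdline", "scrapy.command.commands.crawl"):
        print(path, "->", new_module_path(path))
```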

r2035, r2036, r2037

Added ExecutionQueue for feeding spiders to scrape

r2034

Removed ExecutionEngine singleton

scrapy.core.engine.scrapyengine no longer exists; use scrapy.core.manager.scrapymanager.engine to access the engine instead.

r2039

Ported S3ImagesStore (images pipeline) to use boto and threads

r2033

Moved module: scrapy.management.telnet to scrapy.telnet

r2047

Changes to default settings

Changed default SCHEDULER_ORDER to 'DFO'

This change minimizes memory issues, since DFO (depth-first order) typically consumes less memory.
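The memory argument can be illustrated with a toy model: crawl a synthetic link tree and record how many requests are pending at the worst moment. This is a sketch under simplifying assumptions (a uniform tree, no duplicate links). peak_pending is a hypothetical function and not Scrapy's scheduler; "BFO" is modelled as a FIFO queue and "DFO" as a LIFO stack.

```python
from collections import deque


def peak_pending(branching, depth, order):
    """Crawl a synthetic link tree and return the largest pending-request count.

    Every page above the last level links to `branching` new pages.
    order is "BFO" (first in, first out) or "DFO" (last in, first out).
    """
    pending = deque([0])  # each entry is the depth of a pending page
    peak = 1
    while pending:
        level = pending.popleft() if order == "BFO" else pending.pop()
        if level < depth:
            pending.extend([level + 1] * branching)
        peak = max(peak, len(pending))
    return peak


if __name__ == "__main__":
    for order in ("BFO", "DFO"):
        print(order, peak_pending(branching=3, depth=4, order=order))
```

For a tree with branching factor 3 and depth 4, this sketch reports a peak of 81 pending requests for BFO and 9 for DFO.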

r1939