Ticket #138 (closed defect: fixed)
Invalid header encoding crashes Scrapy
| Reported by: | darkrho | Owned by: | pablo |
|---|---|---|---|
| Priority: | minor | Milestone: | 0.9 |
| Component: | code | Version: | 0.8 |
| Keywords: | response encoding | Cc: | daniel pablo |
Description
Scrapy uses the encoding declared in the HTTP header without verifying that it is a valid encoding:
project>python scrapy-ctl.py shell http://www.gsmarena.com/nokia-phones-1.php
[...]
2010-02-23 18:24:18-0500 [-] Unhandled Error
Traceback (most recent call last):
Failure: scrapy.core.exceptions.IgnoreRequest: unknown encoding: None
HTTP header:
$ curl -I http://www.gsmarena.com/nokia-phones-1.php
HTTP/1.1 200 OK
Date: Wed, 24 Feb 2010 01:27:51 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: PHP/5.1.6
Connection: close
Content-Type: text/html; charset=None
Scrapy sets "None" as response.encoding.
The attached patch adds an encoding lookup test to the method ResponseText.headers_encoding().
The body_encoding() method does not require an additional check because UnicodeDammit already looks up the possible encodings.
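The kind of lookup test described above could look roughly like the following. This is only a sketch, not the attached patch: the helper is_known_encoding(), the regular expression, and the standalone headers_encoding() function taking a raw Content-Type value are all assumptions made for illustration. The only library call relied on is codecs.lookup() from the Python standard library, which raises LookupError for an unknown codec name.

```python
import codecs
import re

# Hypothetical pattern for pulling the charset out of a Content-Type value.
_CHARSET_RE = re.compile(r"charset=([\w-]+)", re.I | re.A)


def is_known_encoding(name):
    """Return True only if Python has a codec registered under this name."""
    if not name:
        return False
    try:
        codecs.lookup(name)
    except LookupError:
        # Raised for names such as "None" that no codec claims.
        return False
    return True


def headers_encoding(content_type):
    """Return the declared charset if it is usable, otherwise None.

    Illustrative stand-in for the method named in the ticket; it takes
    the raw header value instead of a response object.
    """
    match = _CHARSET_RE.search(content_type or "")
    if match:
        encoding = match.group(1)
        if is_known_encoding(encoding):
            return encoding
    return None


if __name__ == "__main__":
    print(headers_encoding("text/html; charset=None"))
    print(headers_encoding("text/html; charset=utf-8"))
```

With a check like this, the header from the report yields no header encoding at all, so the caller can fall back to detecting the encoding from the body instead of failing on an unknown codec name.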
