Ticket #138 (closed defect: fixed)

Opened 5 months ago

Last modified 5 months ago

invalid header encoding crash scrapy

Reported by: darkrho Owned by: pablo
Priority: minor Milestone: 0.9
Component: code Version: 0.8
Keywords: response encoding Cc: daniel pablo

Description

Scrapy uses http header encoding without verify that is valid encoding:

project>python scrapy-ctl.py shell http://www.gsmarena.com/nokia-phones-1.php
[...]
2010-02-23 18:24:18-0500 [-] Unhandled Error
        Traceback (most recent call last):
        Failure: scrapy.core.exceptions.IgnoreRequest: unknown encoding: None

http header:

$ curl -I http://www.gsmarena.com/nokia-phones-1.php
HTTP/1.1 200 OK
Date: Wed, 24 Feb 2010 01:27:51 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: PHP/5.1.6
Connection: close
Content-Type: text/html; charset=None

Scrapy set "None" as response.encoding

The attached patch adds encoding lookup test to the method ResponseText?.headers_encoding()

The body_encoding() method does not require aditional check because UnicodeDammit? already lookup the possible encoding.

Attachments

test_invalid_header_encoding.patch (1.3 kB) - added by darkrho 5 months ago.
tests
lookup_header_encoding.patch (0.9 kB) - added by darkrho 5 months ago.
fix headers_encoding()

Change History

Changed 5 months ago by darkrho

tests

Changed 5 months ago by darkrho

fix headers_encoding()

Changed 5 months ago by pablo

  • status changed from new to closed
  • resolution set to fixed

Fixed in r1936.

Changed 5 months ago by pablo

  • milestone set to 0.9
Note: See TracTickets for help on using tickets.