Ticket #133 (assigned defect)

Opened 7 months ago

Last modified 3 weeks ago

Patch for def canonicalize_url

Reported by: sdogi
Owned by: daniel
Priority: major
Milestone:
Component: code
Version: 0.8
Keywords:
Cc: daniel pablo

Description

Current behavior of Scrapy when finding links like:
/fclick.php?variable

is to canonicalize them to:
/fclick.php?variable=

However, this makes Scrapy follow an incorrect link, which causes an error page to load. This is really the fault of web script programmers who use variables without a value, but for the sake of robustness Scrapy should follow the correct links.
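The behavior described above can be reproduced with the standard library alone. The sketch below is not Scrapy's code: it assumes that canonicalize_url parses the query string into key/value pairs and re-encodes them (the names keyvals and query in the patch suggest this), and it uses Python 3's urllib.parse as a stand-in. The helper name reencode_query is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode


def reencode_query(query):
    """Hypothetical helper: parse a query string and re-encode it."""
    # keep_blank_values=True keeps keys that have no value, as ('key', '').
    keyvals = parse_qsl(query, keep_blank_values=True)
    keyvals.sort()
    # urlencode always writes "key=value", so an empty value becomes "key=".
    return urlencode(keyvals)


print(reencode_query("variable"))  # variable=
```

Once the query has been parsed, a bare key and a key with an empty value look identical, which is why the trailing = appears on output.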

I made a small patch for this. When it encounters a variable with a zero-length value, it strips the trailing = from that variable.

Attachments

url.py (5.5 kB) - added by sdogi 7 months ago.
Patched url.py

Change History

Changed 7 months ago by sdogi

Patched url.py

Changed 7 months ago by sdogi

Diff:

154a155,157

for pair in keyvals:
    if len(pair[1]) == 0:
        query = query.replace(pair[0] + "=", pair[0])
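One caveat with a plain string replace is that it matches the key name anywhere in the query, so a key that is a suffix of another key can be affected too. A possible alternative, sketched here under the assumption that the query is already available as a list of key/value pairs, is to build the query string pair by pair. This uses Python 3's urllib.parse rather than Scrapy's own code, and encode_query_bare_keys is a hypothetical name.

```python
from urllib.parse import parse_qsl, quote_plus

# The pitfall: "a=" also matches inside "ba=1".
print("a=&ba=1".replace("a=", "a"))  # a&ba1


def encode_query_bare_keys(query):
    """Hypothetical helper: re-encode a query, writing empty values as bare keys."""
    keyvals = sorted(parse_qsl(query, keep_blank_values=True))
    parts = []
    for key, value in keyvals:
        if value == "":
            # No value: emit the key alone, without a trailing "=".
            parts.append(quote_plus(key))
        else:
            parts.append(quote_plus(key) + "=" + quote_plus(value))
    return "&".join(parts)


print(encode_query_bare_keys("variable"))  # variable
print(encode_query_bare_keys("ba=1&a"))    # a&ba=1
```

Note that this approach also turns an explicit "a=" into "a", because the two forms cannot be told apart after parsing.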

Changed 7 months ago by sdogi

Just in case there is more than one url.py in the package, the one I'm talking about is:
scrapy/utils/url.py

Changed 4 months ago by pablo

  • owner changed from pablo to daniel
  • status changed from new to assigned

Changed 3 months ago by pablo

  • milestone changed from 0.9 to 0.10

Changed 3 weeks ago by daniel

  • milestone deleted