Ticket #106 (closed enhancement: wontfix)

Opened 12 months ago

Last modified 2 weeks ago

Merge parse and crawl commands

Reported by: pablo Owned by: pablo
Priority: major Milestone: 0.10
Component: code Version:
Keywords: Cc: dan pablo

Description (last modified by pablo)

There is a somewhat hacky parse command to fetch specific pages, parse them and show the results.

The problem with this parse command is that it runs outside the crawl loop, and thus it could return different results. For example, if a spider callback returns requests, the parse command won't follow them.
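To illustrate the difference described above, here is a minimal sketch in plain Python. It does not use the project's real engine or API: the PAGES table and the callback, parse_once and crawl functions are all hypothetical names made up for this example. It only shows why calling a callback once can give different results than running the same callback inside a loop that follows the requests it returns.

```python
from collections import deque

# Hypothetical in-memory "site": URL -> (item found on the page, links on the page).
PAGES = {
    "/": ("home", ["/a", "/b"]),
    "/a": ("page-a", ["/b"]),
    "/b": ("page-b", []),
}


def callback(url):
    """Toy spider callback: returns the page's items and the requests to follow."""
    item, links = PAGES[url]
    return [item], links


def parse_once(url):
    """Parse-style run: call the callback for one page and do not follow its requests."""
    items, requests = callback(url)
    return items, requests


def crawl(start_url):
    """Crawl-style run: keep scheduling the requests that callbacks return."""
    seen = {start_url}
    queue = deque([start_url])
    items = []
    while queue:
        url = queue.popleft()
        new_items, requests = callback(url)
        items.extend(new_items)
        for request in requests:
            if request not in seen:  # skip pages that were already scheduled
                seen.add(request)
                queue.append(request)
    return items


parse_items, unfollowed = parse_once("/")
crawl_items = crawl("/")
print(parse_items, unfollowed)
print(crawl_items)
```

In this toy setup the parse-style run stops after the first page and leaves the returned requests unfollowed, while the crawl-style run also collects the items from the pages those requests point to.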

We need to better integrate the parse command into the engine's crawling loop, so that its results are closer to those obtained by crawling with the crawl command.

Change History

Changed 10 months ago by pablo

  • type changed from defect to enhancement

Changed 10 months ago by pablo

  • milestone changed from 0.8 to 0.9

Changed 3 months ago by pablo

  • milestone changed from 0.9 to 0.10

This will probably be done as part of the spider middleware refactoring planned for 0.10.

Changed 2 months ago by anibal

See also #173.

Changed 3 weeks ago by pablo

  • description modified

Changed 2 weeks ago by pablo

  • status changed from new to closed
  • resolution set to wontfix

We won't be merging crawl and parse for now. After the parse command refactoring introduced in r2196, the parse command should be easier to maintain.
