Scrapy wiki and bug-report system
Welcome! This is the wiki of Scrapy, an open source web crawling and screen scraping framework for Python.
Due to spam protection measures, you need to register to report bugs or modify the content of this wiki. Registration is very quick (just enter a username and password). Please register and contribute to the Scrapy community!
If you're new to Scrapy, start by reading Scrapy at a glance. The information on this wiki should be considered complementary to the official documentation.
Guides and HOWTOs for Scrapy users
- Scraping AJAX sites
- Using Parsley - Using Parsley and Parselets with Scrapy
- Deploying a Scrapy crawler on Amazon EC2
- PyQt4 and scrapy connect - Using PyQt4 as a frontend for Scrapy applications
- Run Scrapy crawler in a thread - to prevent blocking, so it can be used from scripts or other software
Other User Resources
- Companies using Scrapy - list of companies and projects using Scrapy
- Community Spiders - Spiders for different sites contributed by the community (useful examples)
- Scrapy Recipes - Some code snippets for non-trivial tasks
- Official APT repositories - For deploying Scrapy on Ubuntu
- Scrapy 0.10 Changes - Comprehensive list of changes in Scrapy 0.10
- Scrapy 0.9 Changes - Comprehensive list of changes in Scrapy 0.9
- Scrapy 0.8 Changes - Comprehensive list of changes in Scrapy 0.8
Developer Resources
Project tracking and source code
- Trac Code Browser: Browse the source code using the Trac web interface
- Project Timeline: View (and keep track of) recent changes to code, wiki and tickets
- Mercurial Web Interface: Browse the source code and changesets on the official Mercurial repo
- Scrapy release procedure
Scrapy Enhancement Proposals
Major framework enhancements are typically discussed and proposed as a Scrapy Enhancement Proposal (SEP). Here's a list of all SEPs written so far, grouped by their status:
- Archived / Implemented:
- Active / Draft / In Progress:
- Obsolete:
Getting the code
Scrapy uses Mercurial (hg) for managing its code.
Assuming you have Mercurial installed, the following command in a terminal will fetch the most recent code for you:
hg clone http://hg.scrapy.org/scrapy
You can also clone or fork:
- the GitHub mirror: http://github.com/insophia/scrapy/
- the Bitbucket mirror: http://bitbucket.org/insophia/scrapy/
Getting involved
If you want to get involved, start by reading Contributing to Scrapy.
Feel free to check out the code and start playing with it (you can also fork Scrapy on Bitbucket). If you want to talk with other Scrapy developers, take a look at the community resources. We welcome any kind of feedback and discussion on Scrapy improvements.
