SEP-008 - Item Loaders
Contents
| SEP: | 8 |
| Title: | Item Parsers |
| Author: | Pablo Hoffman |
| Created: | 2009-08-11 |
| Status | Final (implemented with variations) |
| Obsoletes | SEP-001, SEP-002, SEP-003, SEP-005 |
Introduction
Item Parser is the final API proposed to implement Item Builders/Loader proposed in SEP-001.
NOTE: This is the API that was finally implemented with the name "Item Loaders", instead of "Item Parsers" along with some other minor fine tuning to the API methods and semantics.
Dataflow
- ItemParser.add_value()
- input_parser
- store
- ItemParser.add_xpath() (only available in XPathItemLoader)
- selector.extract()
- input_parser
- store
- ItemParser.populate_item() (ex. get_item)
- output_parser
- assign field
Modules and classes
- scrapy.contrib.itemparser.ItemParser
- scrapy.contrib.itemparser.XPathItemParser
- scrapy.contrib.itemparser.parsers.MapConcat (ex. TreeExpander)
- scrapy.contrib.itemparser.parsers.TakeFirst
- scrapy.contrib.itemparser.parsers.Join
- scrapy.contrib.itemparser.parsers.Identity
Public API
- ItemParser.add_value()
- ItemParser.replace_value()
- ItemParser.populate_item() (returns item populated)
- ItemParser.get_collected_values() (note the 's' in values)
- ItemParser.parse_field()
- ItemParser.get_input_parser()
- ItemParser.get_output_parser()
- ItemParser.context
- ItemParser.default_item_class
- ItemParser.default_input_parser
- ItemParser.default_output_parser
- ItemParser.field_in
- ItemParser.field_out
Alternative Public API Proposal
- ItemLoader.add_value()
- ItemLoader.replace_value()
- ItemLoader.load_item() (returns loaded item)
- ItemLoader.get_stored_values() or ItemLoader.get_values() (returns the ItemLoader values)
- ItemLoader.get_output_value()
- ItemLoader.get_input_processor() or ItemLoader.get_in_processor() (short version)
- ItemLoader.get_output_processor() or ItemLoader.get_out_processor() (short version)
- ItemLoader.context
- ItemLoader.default_item_class
- ItemLoader.default_input_processor or ItemLoader.default_in_processor (short version)
- ItemLoader.default_output_processor or ItemLoader.default_out_processor (short version)
- ItemLoader.field_in
- ItemLoader.field_out
Usage example: declaring Item Parsers
from scrapy.contrib.itemparser import XPathItemParser, parsers class ProductParser(XPathItemParser): name_in = parsers.MapConcat(removetags, filterx) price_in = parsers.MapConcat(...) price_out = parsers.TakeFirst()
Usage example: declaring parsers in Fields
class Product(Item): name = Field(output_parser=parsers.Join(), ...) price = Field(output_parser=parsers.TakeFirst(), ...) description = Field(input_parser=parsers.MapConcat(removetags))
