Changeset 1849:1d0ac164cf62

Timestamp:
11/14/09 20:28:59
Author:
Pablo Hoffman <pablo@…>
Branch:
default
Message:

StatsCollector: ported methods to receive spider instances (closes #113), removed list_domains() method, added iter_spider_stats() method

Files:
18 modified

  • docs/topics/stats.rst

    r1822 r1849  
    99 
    1010Scrapy provides a convenient service for collecting stats in the form of 
    11 key/values, both globally and per spider/domain. It's called the Stats 
    12 Collector, and it's a singleton which can be imported and used quickly, as 
    13 illustrated by the examples in the :ref:`topics-stats-usecases` section below. 
     11key/values, both globally and per spider. It's called the Stats Collector, and 
     12it's a singleton which can be imported and used quickly, as illustrated by the 
     13examples in the :ref:`topics-stats-usecases` section below. 
    1414 
    1515The stats collection is enabled by default but can be disabled through the 
     
    2727enabled) and extremely efficient (almost unnoticeable) when disabled. 
    2828 
    29 The Stats Collector keeps one stats table per open spider/domain and one global 
    30 stats table. You can't set or get stats from a closed domain, but the 
    31 domain-specific stats table is automatically opened when the spider is opened, 
    32 and closed when the spider is closed. 
     29The Stats Collector keeps one stats table per open spider and one global stats 
     30table. You can't set or get stats from a closed spider, but the spider-specific 
     31stats table is automatically opened when the spider is opened, and closed when 
     32the spider is closed. 
    3333 
    3434.. _topics-stats-usecases: 
     
    6262    8 
    6363 
    64 Get all global stats from a given domain:: 
     64Get all global stats (ie. not particular to any spider):: 
    6565 
    6666    >>> stats.get_stats() 
    6767    {'hostname': 'localhost', 'spiders_crawled': 8} 
    6868 
    69 Set domain/spider specific stat value (domains must be opened first, but this 
     69Set spider specific stat value (spider stats must be opened first, but this 
    7070task is handled automatically by the Scrapy engine):: 
    7171 
    72     stats.set_value('start_time', datetime.now(), domain='example.com') 
    73  
    74 Increment domain-specific stat value:: 
    75  
    76     stats.inc_value('pages_crawled', domain='example.com') 
    77  
    78 Set domain-specific stat value only if greater than previous:: 
    79  
    80     stats.max_value('max_items_scraped', value, domain='example.com') 
    81  
    82 Set domain-specific stat value only if lower than previous:: 
    83  
    84     stats.min_value('min_free_memory_percent', value, domain='example.com') 
    85  
    86 Get domain-specific stat value:: 
    87  
    88     >>> stats.get_value('pages_crawled', domain='example.com') 
     72    stats.set_value('start_time', datetime.now(), spider=some_spider) 
     73 
     74Where ``some_spider`` is a :class:`~scrapy.spider.BaseSpider` object. 
     75 
     76Increment spider-specific stat value:: 
     77 
     78    stats.inc_value('pages_crawled', spider=some_spider) 
     79 
     80Set spider-specific stat value only if greater than previous:: 
     81 
     82    stats.max_value('max_items_scraped', value, spider=some_spider) 
     83 
     84Set spider-specific stat value only if lower than previous:: 
     85 
     86    stats.min_value('min_free_memory_percent', value, spider=some_spider) 
     87 
     88Get spider-specific stat value:: 
     89 
     90    >>> stats.get_value('pages_crawled', spider=some_spider) 
    8991    1238 
    9092 
    91 Get all stats from a given domain:: 
    92  
    93     >>> stats.get_stats('pages_crawled', domain='example.com') 
     93Get all stats from a given spider:: 
     94 
     95    >>> stats.get_stats('pages_crawled', spider=some_spider) 
    9496    {'pages_crawled': 1238, 'start_time': datetime.datetime(2009, 7, 14, 21, 47, 28, 977139)} 
    9597 
     
    109111.. class:: StatsCollector 
    110112     
    111     .. method:: get_value(key, default=None, domain=None) 
     113    .. method:: get_value(key, default=None, spider=None) 
    112114  
    113115        Return the value for the given stats key or default if it doesn't exist. 
    114         If domain is ``None`` the global stats table is consulted, other the 
    115         domain specific one is. If the domain is not yet opened a ``KeyError`` 
     116        If spider is ``None`` the global stats table is consulted, otherwise the 
     117        spider specific one is. If the spider is not yet opened a ``KeyError`` 
    116118        exception is raised. 
    117119 
    118     .. method:: get_stats(domain=None) 
    119  
    120         Get all stats from the given domain/spider (if domain is given) or all 
    121         global stats otherwise, as a dict. If domain is not opened ``KeyError`` 
    122         is raied. 
    123  
    124     .. method:: set_value(key, value, domain=None) 
     120    .. method:: get_stats(spider=None) 
     121 
     122        Get all stats from the given spider (if spider is given) or all global 
     123        stats otherwise, as a dict. If spider is not opened ``KeyError`` is 
     124        raied. 
     125 
     126    .. method:: set_value(key, value, spider=None) 
    125127 
    126128        Set the given value for the given stats key on the global stats (if 
    127         domain is not given) or the domain-specific stats (if domain is given), 
     129        spider is not given) or the spider-specific stats (if spider is given), 
    128130        which must be opened or a ``KeyError`` will be raised. 
    129131 
    130     .. method:: set_stats(stats, domain=None) 
    131  
    132         Set the given stats (as a dict) for the given domain. If the domain is 
     132    .. method:: set_stats(stats, spider=None) 
     133 
     134        Set the given stats (as a dict) for the given spider. If the spider is 
    133135        not opened a ``KeyError`` will be raised. 
    134136 
    135     .. method:: inc_value(key, count=1, start=0, domain=None) 
     137    .. method:: inc_value(key, count=1, start=0, spider=None) 
    136138 
    137139        Increment the value of the given stats key, by the given count, 
    138         assuming the start value given (when it's not set). If domain is not 
    139         given the global stats table is used, otherwise the domain-specific 
     140        assuming the start value given (when it's not set). If spider is not 
     141        given the global stats table is used, otherwise the spider-specific 
    140142        stats table is used, which must be opened or a ``KeyError`` will be 
    141143        raised. 
    142144 
    143     .. method:: max_value(key, value, domain=None) 
     145    .. method:: max_value(key, value, spider=None) 
    144146 
    145147        Set the given value for the given key only if current value for the 
    146148        same key is lower than value. If there is no current value for the 
    147         given key, the value is always set. If domain is not given the global 
    148         stats table is used, otherwise the domain-specific stats table is used, 
     149        given key, the value is always set. If spider is not given the global 
     150        stats table is used, otherwise the spider-specific stats table is used, 
    149151        which must be opened or a KeyError will be raised. 
    150152 
    151     .. method:: min_value(key, value, domain=None) 
     153    .. method:: min_value(key, value, spider=None) 
    152154 
    153155        Set the given value for the given key only if current value for the 
    154156        same key is greater than value. If there is no current value for the 
    155         given key, the value is always set. If domain is not given the global 
    156         stats table is used, otherwise the domain-specific stats table is used, 
     157        given key, the value is always set. If spider is not given the global 
     158        stats table is used, otherwise the spider-specific stats table is used, 
    157159        which must be opened or a KeyError will be raised. 
    158160 
    159     .. method:: clear_stats(domain=None) 
    160  
    161         Clear all global stats (if domain is not given) or all domain-specific 
    162         stats if domain is given, in which case it must be opened or a 
     161    .. method:: clear_stats(spider=None) 
     162 
     163        Clear all global stats (if spider is not given) or all spider-specific 
     164        stats if spider is given, in which case it must be opened or a 
    163165        ``KeyError`` will be raised. 
    164166 
    165     .. method:: list_domains() 
    166  
    167         Return a list of all opened domains. 
    168  
    169     .. method:: open_domain(domain) 
    170  
    171         Open the given domain for stats collection. This method must be called 
    172         prior to working with any stats specific to that domain, but this task 
     167    .. method:: iter_spider_stats() 
     168 
     169        Return a iterator over ``(spider, spider_stats)`` for each open spider 
     170        currently tracked by the stats collector, where ``spider_stats`` is the 
     171        dict containing all spider-specific stats. 
     172 
     173        Global stats are not included in the iterator. If you want to get 
     174        those, use :meth:`get_stats` method. 
     175 
     176    .. method:: open_spider(spider) 
     177 
     178        Open the given spider for stats collection. This method must be called 
     179        prior to working with any stats specific to that spider, but this task 
    173180        is handled automatically by the Scrapy engine. 
    174181 
    175     .. method:: close_domain(domain) 
    176  
    177         Close the given domain. After this is called, no more specific stats 
    178         for this domain can be accessed. This method is called automatically on 
     182    .. method:: close_spider(spider) 
     183 
     184        Close the given spider. After this is called, no more specific stats 
     185        for this spider can be accessed. This method is called automatically on 
    179186        the :signal:`spider_closed` signal. 
    180187 
     
    197204 
    198205    A simple stats collector that keeps the stats of the last scraping run (for 
    199     each domain) in memory, which can be accessed through the ``domain_stats`` 
    200     attribute 
     206    each spider) in memory, after they're closed. The stats can be accessed 
     207    through the :attr:`domain_stats` attribute, which is a dict keyed by spider 
     208    domain name. 
    201209 
    202210    This is the default Stats Collector used in Scrapy. 
     
    204212    .. attribute:: domain_stats 
    205213 
    206        A dict of dicts (keyed by domain) containing the stats of the last 
    207        scraping run for each domain. 
     214       A dict of dicts (keyed by spider domain name) containing the stats of 
     215       the last scraping run for each domain. 
    208216 
    209217DummyStatsCollector 
     
    284292   :synopsis: Stats Collector signals 
    285293 
    286 .. signal:: stats_domain_opened 
    287 .. function:: stats_domain_opened(domain) 
    288  
    289     Sent right after the stats domain is opened. You can use this signal to add 
    290     startup stats for domain (example: start time). 
    291  
    292     :param domain: the stats domain just opened 
    293     :type domain: str 
    294  
    295 .. signal:: stats_domain_closing 
    296 .. function:: stats_domain_closing(domain, reason) 
    297  
    298     Sent just before the stats domain is closed. You can use this signal to add 
     294.. signal:: stats_spider_opened 
     295.. function:: stats_spider_opened(spider) 
     296 
     297    Sent right after the stats spider is opened. You can use this signal to add 
     298    startup stats for spider (example: start time). 
     299 
     300    :param spider: the stats spider just opened 
     301    :type spider: str 
     302 
     303.. signal:: stats_spider_closing 
     304.. function:: stats_spider_closing(spider, reason) 
     305 
     306    Sent just before the stats spider is closed. You can use this signal to add 
    299307    some closing stats (example: finish time). 
    300308 
    301     :param domain: the stats domain about to be closed 
    302     :type domain: str 
    303  
    304     :param reason: the reason why the domain is being closed. See 
     309    :param spider: the stats spider about to be closed 
     310    :type spider: str 
     311 
     312    :param reason: the reason why the spider is being closed. See 
    305313        :signal:`spider_closed` signal for more info. 
    306314    :type reason: str 
    307315 
    308 .. signal:: stats_domain_closed 
    309 .. function:: stats_domain_closed(domain, reason, domain_stats) 
    310  
    311     Sent right after the stats domain is closed. You can use this signal to 
    312     collect resources, but not to add any more stats as the stats domain has 
    313     already been close (use :signal:`stats_domain_closing` for that instead). 
    314  
    315     :param domain: the stats domain just closed 
    316     :type domain: str 
    317  
    318     :param reason: the reason why the domain was closed. See 
     316.. signal:: stats_spider_closed 
     317.. function:: stats_spider_closed(spider, reason, spider_stats) 
     318 
     319    Sent right after the stats spider is closed. You can use this signal to 
     320    collect resources, but not to add any more stats as the stats spider has 
     321    already been close (use :signal:`stats_spider_closing` for that instead). 
     322 
     323    :param spider: the stats spider just closed 
     324    :type spider: str 
     325 
     326    :param reason: the reason why the spider was closed. See 
    319327        :signal:`spider_closed` signal for more info. 
    320328    :type reason: str 
    321329 
    322     :param domain_stats: the stats of the domain just closed. 
     330    :param spider_stats: the stats of the spider just closed. 
    323331    :type reason: dict 
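
The documentation change above can be summarised with a small, self-contained sketch. It is not Scrapy's collector: ``SketchStatsCollector`` and ``FakeSpider`` are hypothetical stand-ins, and keeping the global table under the key ``None`` is an assumption inferred from the ``spider=None`` defaults in this changeset. It only illustrates the behaviour the ported docs describe: stats tables keyed by spider object, a ``KeyError`` for spiders that are not open, and ``iter_spider_stats()`` in place of ``list_domains()``.

```python
class SketchStatsCollector:
    """Toy stand-in for the ported collector; not Scrapy's actual class."""

    def __init__(self):
        # Assumption: the global stats table is stored under the key None.
        self._stats = {None: {}}

    def open_spider(self, spider):
        # Create an empty per-spider table, keyed by the spider object itself.
        self._stats[spider] = {}

    def close_spider(self, spider):
        # Drop the per-spider table and hand its contents back to the caller.
        return self._stats.pop(spider)

    def get_value(self, key, default=None, spider=None):
        # Raises KeyError if the spider's table is not open.
        return self._stats[spider].get(key, default)

    def inc_value(self, key, count=1, start=0, spider=None):
        d = self._stats[spider]
        d[key] = d.setdefault(key, start) + count

    def max_value(self, key, value, spider=None):
        d = self._stats[spider]
        d[key] = max(d.setdefault(key, value), value)

    def iter_spider_stats(self):
        # Yield (spider, stats) pairs, skipping the global table.
        return ((s, st) for s, st in self._stats.items() if s is not None)


class FakeSpider:
    """Hypothetical spider; it only needs to be hashable."""

    def __init__(self, domain_name):
        self.domain_name = domain_name


stats = SketchStatsCollector()
spider = FakeSpider("example.com")
stats.open_spider(spider)

stats.inc_value("pages_crawled", spider=spider)
stats.inc_value("pages_crawled", spider=spider)
stats.inc_value("pages_crawled")  # no spider given: global table
stats.max_value("max_items_scraped", 5, spider=spider)
stats.max_value("max_items_scraped", 3, spider=spider)  # lower, so ignored

per_spider = dict((s.domain_name, st) for s, st in stats.iter_spider_stats())
closed = stats.close_spider(spider)

try:
    stats.get_value("pages_crawled", spider=spider)
    raised = False
except KeyError:
    # The spider's table no longer exists once the spider is closed.
    raised = True
```

Callers that previously passed ``domain=spider.domain_name`` now pass ``spider=spider``, and code that needs the domain name for display reads it from the spider object.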
  • scrapy/contrib/corestats.py

    r1645 r1849  
    1111from scrapy.core import signals 
    1212from scrapy.stats import stats 
    13 from scrapy.stats.signals import stats_domain_opened, stats_domain_closing 
     13from scrapy.stats.signals import stats_spider_opened, stats_spider_closing 
    1414from scrapy.conf import settings 
    1515 
     
    2323        stats.set_value('envinfo/pid', os.getpid()) 
    2424 
    25         dispatcher.connect(self.stats_domain_opened, signal=stats_domain_opened) 
    26         dispatcher.connect(self.stats_domain_closing, signal=stats_domain_closing) 
     25        dispatcher.connect(self.stats_spider_opened, signal=stats_spider_opened) 
     26        dispatcher.connect(self.stats_spider_closing, signal=stats_spider_closing) 
    2727        dispatcher.connect(self.item_scraped, signal=signals.item_scraped) 
    2828        dispatcher.connect(self.item_passed, signal=signals.item_passed) 
    2929        dispatcher.connect(self.item_dropped, signal=signals.item_dropped) 
    3030 
    31     def stats_domain_opened(self, domain): 
    32         stats.set_value('start_time', datetime.datetime.utcnow(), domain=domain) 
    33         stats.set_value('envinfo/host', stats.get_value('envinfo/host'), domain=domain) 
    34         stats.inc_value('domain_count/opened') 
     31    def stats_spider_opened(self, spider): 
     32        stats.set_value('start_time', datetime.datetime.utcnow(), spider=spider) 
     33        stats.set_value('envinfo/host', stats.get_value('envinfo/host'), spider=spider) 
     34        stats.inc_value('spider_count/opened') 
    3535 
    36     def stats_domain_closing(self, domain, reason): 
    37         stats.set_value('finish_time', datetime.datetime.utcnow(), domain=domain) 
    38         stats.set_value('finish_status', 'OK' if reason == 'finished' else reason, domain=domain) 
    39         stats.inc_value('domain_count/%s' % reason, domain=domain) 
     36    def stats_spider_closing(self, spider, reason): 
     37        stats.set_value('finish_time', datetime.datetime.utcnow(), spider=spider) 
     38        stats.set_value('finish_status', 'OK' if reason == 'finished' else reason, spider=spider) 
     39        stats.inc_value('spider_count/%s' % reason, spider=spider) 
    4040 
    4141    def item_scraped(self, item, spider): 
    42         stats.inc_value('item_scraped_count', domain=spider.domain_name) 
     42        stats.inc_value('item_scraped_count', spider=spider) 
    4343        stats.inc_value('item_scraped_count') 
    4444 
    4545    def item_passed(self, item, spider): 
    46         stats.inc_value('item_passed_count', domain=spider.domain_name) 
     46        stats.inc_value('item_passed_count', spider=spider) 
    4747        stats.inc_value('item_passed_count') 
    4848 
    4949    def item_dropped(self, item, spider, exception): 
    5050        reason = exception.__class__.__name__ 
    51         stats.inc_value('item_dropped_count', domain=spider.domain_name) 
    52         stats.inc_value('item_dropped_reasons_count/%s' % reason, domain=spider.domain_name) 
     51        stats.inc_value('item_dropped_count', spider=spider) 
     52        stats.inc_value('item_dropped_reasons_count/%s' % reason, spider=spider) 
    5353        stats.inc_value('item_dropped_count') 
  • scrapy/contrib/downloadermiddleware/stats.py

    r1345 r1849  
    66 
    77class DownloaderStats(object): 
    8     """DownloaderStats store stats of all requests, responses and 
    9     exceptions that pass through it. 
    10  
    11     To use this middleware you must enable the DOWNLOADER_STATS setting. 
    12     """ 
    138 
    149    def __init__(self): 
     
    1712 
    1813    def process_request(self, request, spider): 
    19         domain = spider.domain_name 
    2014        stats.inc_value('downloader/request_count') 
    21         stats.inc_value('downloader/request_count', domain=domain) 
    22         stats.inc_value('downloader/request_method_count/%s' % request.method, domain=domain) 
     15        stats.inc_value('downloader/request_count', spider=spider) 
     16        stats.inc_value('downloader/request_method_count/%s' % request.method, spider=spider) 
    2317        reqlen = len(request_httprepr(request)) 
    24         stats.inc_value('downloader/request_bytes', reqlen, domain=domain) 
     18        stats.inc_value('downloader/request_bytes', reqlen, spider=spider) 
    2519        stats.inc_value('downloader/request_bytes', reqlen) 
    2620 
    2721    def process_response(self, request, response, spider): 
    28         domain = spider.domain_name 
    2922        stats.inc_value('downloader/response_count') 
    30         stats.inc_value('downloader/response_count', domain=domain) 
    31         stats.inc_value('downloader/response_status_count/%s' % response.status, domain=domain) 
     23        stats.inc_value('downloader/response_count', spider=spider) 
     24        stats.inc_value('downloader/response_status_count/%s' % response.status, spider=spider) 
    3225        reslen = len(response_httprepr(response)) 
    33         stats.inc_value('downloader/response_bytes', reslen, domain=domain) 
     26        stats.inc_value('downloader/response_bytes', reslen, spider=spider) 
    3427        stats.inc_value('downloader/response_bytes', reslen) 
    3528        return response 
     
    3831        ex_class = "%s.%s" % (exception.__class__.__module__, exception.__class__.__name__) 
    3932        stats.inc_value('downloader/exception_count') 
    40         stats.inc_value('downloader/exception_count', domain=spider.domain_name) 
    41         stats.inc_value('downloader/exception_type_count/%s' % ex_class, domain=spider.domain_name) 
     33        stats.inc_value('downloader/exception_count', spider=spider) 
     34        stats.inc_value('downloader/exception_type_count/%s' % ex_class, spider=spider) 
  • scrapy/contrib/itemsampler.py

    r1828 r1849  
    5454 
    5555    def process_item(self, spider, item): 
    56         sampled = stats.get_value("items_sampled", 0, domain=spider.domain_name) 
     56        sampled = stats.get_value("items_sampled", 0, spider=spider) 
    5757        if sampled < items_per_spider: 
    5858            self.items[item.guid] = item 
    5959            sampled += 1 
    60             stats.set_value("items_sampled", sampled, domain=spider.domain_name) 
     60            stats.set_value("items_sampled", sampled, spider=spider) 
    6161            log.msg("Sampled %s" % item, spider=spider, level=log.INFO) 
    6262            if close_spider and sampled == items_per_spider: 
     
    7272 
    7373    def spider_closed(self, spider, reason): 
    74         if reason == 'finished' and not stats.get_value("items_sampled", domain=spider.domain_name): 
     74        if reason == 'finished' and not stats.get_value("items_sampled", spider=spider): 
    7575            self.empty_domains.add(spider.domain_name) 
    7676        self.spiders_count += 1 
     
    8888 
    8989    def process_spider_input(self, response, spider): 
    90         if stats.get_value("items_sampled", domain=spider.domain_name) >= items_per_spider: 
     90        if stats.get_value("items_sampled", spider=spider) >= items_per_spider: 
    9191            return [] 
    9292        elif max_response_size and max_response_size > len(response_httprepr(response)):   
     
    101101                items.append(r) 
    102102 
    103         if stats.get_value("items_sampled", domain=spider.domain_name) >= items_per_spider: 
     103        if stats.get_value("items_sampled", spider=spider) >= items_per_spider: 
    104104            return [] 
    105105        else: 
  • scrapy/contrib/pipeline/images.py

    r1829 r1849  
    220220                (status, request, referer) 
    221221        log.msg(msg, level=log.DEBUG, spider=info.spider) 
    222         self.inc_stats(info.spider.domain_name, status) 
     222        self.inc_stats(info.spider, status) 
    223223 
    224224        try: 
     
    259259            log.msg('Image (uptodate): Downloaded %s from <%s> referred in <%s>' % \ 
    260260                    (self.MEDIA_NAME, request.url, referer), level=log.DEBUG, spider=info.spider) 
    261             self.inc_stats(info.spider.domain_name, 'uptodate') 
     261            self.inc_stats(info.spider, 'uptodate') 
    262262 
    263263            checksum = result.get('checksum', None) 
     
    296296            yield thumb_key, thumb_image, thumb_buf 
    297297 
    298     def inc_stats(self, domain, status): 
    299         stats.inc_value('image_count', domain=domain) 
    300         stats.inc_value('image_status_count/%s' % status, domain=domain) 
     298    def inc_stats(self, spider, status): 
     299        stats.inc_value('image_count', spider=spider) 
     300        stats.inc_value('image_status_count/%s' % status, spider=spider) 
    301301 
    302302    def convert_image(self, image, size=None): 
  • scrapy/contrib/spidermiddleware/depth.py

    r1835 r1849  
    1919 
    2020    def process_spider_output(self, response, result, spider): 
    21         domain = spider.domain_name 
    2221        def _filter(request): 
    2322            if isinstance(request, Request): 
     
    2928                    return False 
    3029                elif self.stats: 
    31                     stats.inc_value('request_depth_count/%s' % depth, domain=domain) 
    32                     if depth > stats.get_value('request_depth_max', 0, domain=domain): 
    33                         stats.set_value('request_depth_max', depth, domain=domain) 
     30                    stats.inc_value('request_depth_count/%s' % depth, spider=spider) 
     31                    if depth > stats.get_value('request_depth_max', 0, spider=spider): 
     32                        stats.set_value('request_depth_max', depth, spider=spider) 
    3433            return True 
    3534 
     
    3736        if self.stats and 'depth' not in response.request.meta:  
    3837            response.request.meta['depth'] = 0 
    39             stats.inc_value('request_depth_count/0', domain=domain) 
     38            stats.inc_value('request_depth_count/0', spider=spider) 
    4039 
    4140        return (r for r in result or () if _filter(r)) 
  • scrapy/contrib/statsmailer.py

    r1364 r1849  
    11""" 
    2 StatsMailer extension sends an email when a domain finishes scraping. 
     2StatsMailer extension sends an email when a spider finishes scraping. 
    33 
    44Use STATSMAILER_RCPTS setting to enable and give the recipient mail address 
     
    1818        if not self.recipients: 
    1919            raise NotConfigured 
    20         dispatcher.connect(self.stats_domain_closed, signal=signals.stats_domain_closed) 
     20        dispatcher.connect(self.stats_spider_closed, signal=signals.stats_spider_closed) 
    2121         
    22     def stats_domain_closed(self, domain, domain_stats): 
     22    def stats_spider_closed(self, spider, spider_stats): 
    2323        mail = MailSender() 
    2424        body = "Global stats\n\n" 
    2525        body += "\n".join("%-50s : %s" % i for i in stats.get_stats().items()) 
    26         body += "\n\n%s stats\n\n" % domain 
    27         body += "\n".join("%-50s : %s" % i for i in domain_stats.items()) 
    28         mail.send(self.recipients, "Scrapy stats for: %s" % domain, body) 
     26        body += "\n\n%s stats\n\n" % spider.domain_name 
     27        body += "\n".join("%-50s : %s" % i for i in spider_stats.items()) 
     28        mail.send(self.recipients, "Scrapy stats for: %s" % spider.domain_name, body) 
  • scrapy/contrib/webconsole/stats.py

    r1358 r1849  
    2323        s += "<h3>Global stats</h3>\n" 
    2424        s += stats_html_table(stats.get_stats()) 
    25         for domain in stats.list_domains(): 
    26             s += "<h3>%s</h3>\n" % domain 
    27             s += stats_html_table(stats.get_stats(domain)) 
     25        for spider, spider_stats in stats.iter_spider_stats(): 
     26            s += "<h3>%s</h3>\n" % spider.domain_name 
     27            s += stats_html_table(spider_stats) 
    2828        s += "</body>\n" 
    2929        s += "</html>\n" 
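
The web console change above replaces a ``list_domains()`` loop followed by per-domain ``get_stats()`` lookups with a single pass over ``(spider, spider_stats)`` pairs. A minimal sketch of that rendering pattern follows. ``stats_html_table`` here is a hypothetical reimplementation (the real helper is not shown in this changeset), ``render_stats`` is an illustrative name, and the ``Spider`` namedtuple stands in for a real spider object.

```python
from collections import namedtuple
from html import escape

# Hypothetical stand-in for a spider; only domain_name is used here.
Spider = namedtuple("Spider", "domain_name")


def stats_html_table(stats):
    """Hypothetical helper: render a stats dict as a two-column HTML table."""
    rows = "".join(
        "<tr><td>%s</td><td>%s</td></tr>\n" % (escape(str(k)), escape(str(v)))
        for k, v in sorted(stats.items())
    )
    return "<table>\n%s</table>\n" % rows


def render_stats(global_stats, spider_stats_pairs):
    """Render global stats first, then one section per (spider, stats) pair."""
    s = "<h3>Global stats</h3>\n"
    s += stats_html_table(global_stats)
    for spider, spider_stats in spider_stats_pairs:
        # The section title comes from the spider object, not a domain string.
        s += "<h3>%s</h3>\n" % escape(spider.domain_name)
        s += stats_html_table(spider_stats)
    return s


page = render_stats(
    {"spiders_crawled": 8},
    [(Spider("example.com"), {"pages_crawled": 2})],
)
```

Because the iterator yields the stats dict together with the spider, the loop needs no second lookup per spider.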
  • scrapy/contrib_exp/spiderprofiler.py

    r1641 r1849  
    4646            mafter = self._memusage() 
    4747            ct = time() - tbefore 
    48             domain = spider.domain_name 
    49             tcc = stats.get_value('profiling/total_callback_time', 0, domain=domain) 
    50             sct = stats.get_value('profiling/slowest_callback_time', 0, domain=domain) 
    51             stats.set_value('profiling/total_callback_time' % spider.domain_name, \ 
    52                 tcc+ct, domain=domain) 
     48            tcc = stats.get_value('profiling/total_callback_time', 0, spider=spider) 
     49            sct = stats.get_value('profiling/slowest_callback_time', 0, spider=spider) 
     50            stats.set_value('profiling/total_callback_time', tcc+ct, spider=spider) 
    5351            if ct > sct: 
    54                 stats.set_value('profiling/slowest_callback_time', ct, domain=domain) 
     52                stats.set_value('profiling/slowest_callback_time', ct, spider=spider) 
    5553                stats.set_value('profiling/slowest_callback_name', function.__name__, \ 
    56                     domain=domain) 
     54                    spider=spider) 
    5755                stats.set_value('profiling/slowest_callback_url', args[0].url, \ 
    58                     domain=domain) 
     56                    spider=spider) 
    5957            if self._memusage: 
    6058                stats.inc_value('profiling/total_mem_allocated_in_callbacks', \ 
    61                     count=mafter-mbefore, domain=domain) 
     59                    count=mafter-mbefore, spider=spider) 
    6260            return r 
    6361        return new_callback 
  • scrapy/core/engine.py

    r1822 r1849  
    245245        self.downloader.open_spider(spider) 
    246246        self.scraper.open_spider(spider) 
    247         stats.open_domain(spider.domain_name) 
     247        stats.open_spider(spider) 
    248248 
    249249        send_catch_log(signals.spider_opened, sender=self.__class__, spider=spider) 
     
    306306        send_catch_log(signal=signals.spider_closed, sender=self.__class__, \ 
    307307            spider=spider, reason=reason) 
    308         stats.close_domain(spider.domain_name, reason=reason) 
     308        stats.close_spider(spider, reason=reason) 
    309309        dfd = defer.maybeDeferred(spiders.close_spider, spider) 
    310310        dfd.addBoth(log.msg, "Spider closed (%s)" % reason, spider=spider) 
  • scrapy/core/scraper.py

    r1835 r1849  
    8888        site = self.sites[spider] 
    8989        dfd = site.add_response_request(response, request) 
    90         # FIXME: this can't be called here because the stats domain may be 
     90        # FIXME: this can't be called here because the stats spider may be 
    9191        # already closed 
    9292        #stats.max_value('scraper/max_active_size', site.active_size, \ 
    93         #    domain=spider.domain_name) 
     93        #    spider=spider) 
    9494        def finish_scraping(_): 
    9595            site.finish_response(response) 
     
    9898        dfd.addBoth(finish_scraping) 
    9999        dfd.addErrback(log.err, 'Scraper bug processing %s' % request, \ 
    100             domain=spider.domain_name) 
     100            spider=spider) 
    101101        self._scrape_next(spider, site) 
    102102        return dfd 
     
    139139        log.msg(msg, log.ERROR, spider=spider) 
    140140        stats.inc_value("spider_exceptions/%s" % _failure.value.__class__.__name__, \ 
    141             domain=spider.domain_name) 
     141            spider=spider) 
    142142 
    143143    def handle_spider_output(self, result, request, response, spider): 
     
    153153        """ 
    154154        # TODO: keep closing state internally instead of checking engine 
    155         domain = spider.domain_name 
    156155        if spider in self.engine.closing: 
    157156            return 
     
    166165                item=output, spider=spider, response=response) 
    167166            self.sites[spider].itemproc_size += 1 
    168             # FIXME: this can't be called here because the stats domain may be 
     167            # FIXME: this can't be called here because the stats spider may be 
    169168            # already closed 
    170169            #stats.max_value('scraper/max_itemproc_size', \ 
    171             #        self.sites[domain].itemproc_size, domain=domain) 
     170            #        self.sites[spider].itemproc_size, spider=spider) 
    172171            dfd = self.itemproc.process_item(output, spider) 
    173172            dfd.addBoth(self._itemproc_finished, output, spider) 
     
    196195        """ItemProcessor finished for the given ``item`` and returned ``output`` 
    197196        """ 
    198         domain = spider.domain_name 
    199197        self.sites[spider].itemproc_size -= 1 
    200198        if isinstance(output, Failure): 
  • scrapy/stats/collector/__init__.py

    r1613 r1849  
    66from scrapy.xlib.pydispatch import dispatcher 
    77 
    8 from scrapy.stats.signals import stats_domain_opened, stats_domain_closing, \ 
    9     stats_domain_closed 
     8from scrapy.stats.signals import stats_spider_opened, stats_spider_closing, \ 
     9    stats_spider_closed 
    1010from scrapy.utils.signal import send_catch_log 
    1111from scrapy.core import signals 
     
    2020        dispatcher.connect(self._engine_stopped, signal=signals.engine_stopped) 
    2121 
    22     def get_value(self, key, default=None, domain=None): 
    23         return self._stats[domain].get(key, default) 
     22    def get_value(self, key, default=None, spider=None): 
     23        return self._stats[spider].get(key, default) 
    2424 
    25     def get_stats(self, domain=None): 
    26         return self._stats[domain] 
     25    def get_stats(self, spider=None): 
     26        return self._stats[spider] 
    2727 
    28     def set_value(self, key, value, domain=None): 
    29         self._stats[domain][key] = value 
     28    def set_value(self, key, value, spider=None): 
     29        self._stats[spider][key] = value 
    3030 
    31     def set_stats(self, stats, domain=None): 
    32         self._stats[domain] = stats 
     31    def set_stats(self, stats, spider=None): 
     32        self._stats[spider] = stats 
    3333 
    34     def inc_value(self, key, count=1, start=0, domain=None): 
    35         d = self._stats[domain] 
     34    def inc_value(self, key, count=1, start=0, spider=None): 
     35        d = self._stats[spider] 
    3636        d[key] = d.setdefault(key, start) + count 
    3737 
    38     def max_value(self, key, value, domain=None): 
    39         d = self._stats[domain] 
     38    def max_value(self, key, value, spider=None): 
     39        d = self._stats[spider] 
    4040        d[key] = max(d.setdefault(key, value), value) 
    4141 
    42     def min_value(self, key, value, domain=None): 
    43         d = self._stats[domain] 
     42    def min_value(self, key, value, spider=None): 
     43        d = self._stats[spider] 
    4444        d[key] = min(d.setdefault(key, value), value) 
    4545 
    46     def clear_stats(self, domain=None): 
    47         self._stats[domain].clear() 
     46    def clear_stats(self, spider=None): 
     47        self._stats[spider].clear() 
    4848 
    49     def list_domains(self): 
    50         return [d for d in self._stats.keys() if d is not None] 
     49    def iter_spider_stats(self): 
     50        return [x for x in self._stats.iteritems() if x[0]] 
    5151 
    52     def open_domain(self, domain): 
    53         self._stats[domain] = {} 
    54         send_catch_log(stats_domain_opened, domain=domain) 
     52    def open_spider(self, spider): 
     53        self._stats[spider] = {} 
     54        send_catch_log(stats_spider_opened, spider=spider) 
    5555 
    56     def close_domain(self, domain, reason): 
    57         send_catch_log(stats_domain_closing, domain=domain, reason=reason) 
    58         stats = self._stats.pop(domain) 
    59         send_catch_log(stats_domain_closed, domain=domain, reason=reason, \ 
    60             domain_stats=stats) 
     56    def close_spider(self, spider, reason): 
     57        send_catch_log(stats_spider_closing, spider=spider, reason=reason) 
     58        stats = self._stats.pop(spider) 
     59        send_catch_log(stats_spider_closed, spider=spider, reason=reason, \ 
     60            spider_stats=stats) 
    6161        if self._dump: 
    62             log.msg("Dumping domain stats:\n" + pprint.pformat(stats), \ 
    63                 domain=domain) 
    64         self._persist_stats(stats, domain) 
     62            log.msg("Dumping spider stats:\n" + pprint.pformat(stats), \ 
     63                spider=spider) 
     64        self._persist_stats(stats, spider) 
    6565 
    6666    def _engine_stopped(self): 
     
    6868        if self._dump: 
    6969            log.msg("Dumping global stats:\n" + pprint.pformat(stats)) 
    70         self._persist_stats(stats, domain=None) 
     70        self._persist_stats(stats, spider=None) 
    7171 
    72     def _persist_stats(self, stats, domain=None): 
     72    def _persist_stats(self, stats, spider=None): 
    7373        pass 
    7474 
     
    7878        super(MemoryStatsCollector, self).__init__() 
    7979        self.domain_stats = {} 
    80          
    81     def _persist_stats(self, stats, domain=None): 
    82         self.domain_stats[domain] = stats 
     80 
     81    def _persist_stats(self, stats, spider=None): 
     82        if spider is not None: 
     83            self.domain_stats[spider.domain_name] = stats 
    8384 
    8485 
    8586class DummyStatsCollector(StatsCollector): 
    8687 
    87     def get_value(self, key, default=None, domain=None): 
     88    def get_value(self, key, default=None, spider=None): 
    8889        return default 
    8990 
    90     def set_value(self, key, value, domain=None): 
     91    def set_value(self, key, value, spider=None): 
    9192        pass 
    9293 
    93     def set_stats(self, stats, domain=None): 
     94    def set_stats(self, stats, spider=None): 
    9495        pass 
    9596 
    96     def inc_value(self, key, count=1, start=0, domain=None): 
     97    def inc_value(self, key, count=1, start=0, spider=None): 
    9798        pass 
    9899 
    99     def max_value(self, key, value, domain=None): 
     100    def max_value(self, key, value, spider=None): 
    100101        pass 
    101102 
    102     def min_value(self, key, value, domain=None): 
     103    def min_value(self, key, value, spider=None): 
    103104        pass 
    104105 
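
Note on the collector/__init__.py change above: the stats tables are now keyed by spider instance rather than by domain-name string. The following is a minimal standalone sketch of that idea, not Scrapy's implementation. The `Spider` and `SketchStatsCollector` classes are hypothetical stand-ins, and the assumption that the global table lives under the `None` key is inferred from the `spider=None` defaults in the diff.

```python
class Spider:
    """Hypothetical stand-in for a spider; only object identity matters here."""

    def __init__(self, name):
        self.name = name


class SketchStatsCollector:
    """One stats dict per open spider, plus a global dict (assumed keyed by None)."""

    def __init__(self):
        self._stats = {None: {}}  # assumption: None keys the global stats table

    def open_spider(self, spider):
        self._stats[spider] = {}

    def close_spider(self, spider):
        # Remove and return the spider's table; later lookups raise KeyError
        return self._stats.pop(spider)

    def inc_value(self, key, count=1, start=0, spider=None):
        d = self._stats[spider]
        d[key] = d.setdefault(key, start) + count

    def max_value(self, key, value, spider=None):
        d = self._stats[spider]
        d[key] = max(d.setdefault(key, value), value)

    def get_value(self, key, default=None, spider=None):
        return self._stats[spider].get(key, default)

    def iter_spider_stats(self):
        # Skip the global table so only (spider, stats) pairs are returned
        return [(s, d) for s, d in self._stats.items() if s is not None]


spider = Spider("example")
stats = SketchStatsCollector()
stats.open_spider(spider)
stats.inc_value("pages", spider=spider)
stats.inc_value("pages", spider=spider)
stats.max_value("depth", 3, spider=spider)
stats.max_value("depth", 1, spider=spider)  # smaller value leaves the max unchanged
stats.inc_value("spiders_crawled")          # no spider given: goes to the global table
```

Because plain objects hash by identity, two spiders with the same domain name would get separate tables in this sketch, which a string key could not express.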
  • scrapy/stats/collector/mysql.py

    r1680 r1849  
    1717        self._mysql_conn = mysql_connect(mysqluri, use_unicode=False) if mysqluri else None 
    1818         
    19     def _persist_stats(self, stats, domain=None): 
    20         if domain is None: # only store domain-specific stats 
     19    def _persist_stats(self, stats, spider=None): 
     20        if spider is None: # only store spider-specific stats 
    2121            return 
    2222        if self._mysql_conn is None: 
     
    2828        c = self._mysql_conn.cursor() 
    2929        c.execute("INSERT INTO %s (domain,stored,data) VALUES (%%s,%%s,%%s)" % table, \ 
    30             (domain, stored, datas)) 
     30            (spider.domain_name, stored, datas)) 
    3131        self._mysql_conn.commit() 
  • scrapy/stats/collector/simpledb.py

    r1624 r1849  
    2323        connect_sdb().create_domain(self._sdbdomain) 
    2424 
    25     def _persist_stats(self, stats, domain=None): 
    26         if domain is None: # only store domain-specific stats 
     25    def _persist_stats(self, stats, spider=None): 
     26        if spider is None: # only store spider-specific stats 
    2727            return 
    2828        if not self._sdbdomain: 
    2929            return 
    3030        if self._async: 
    31             dfd = threads.deferToThread(self._persist_to_sdb, domain, stats.copy()) 
     31            dfd = threads.deferToThread(self._persist_to_sdb, spider, stats.copy()) 
    3232            dfd.addErrback(log.err, 'Error uploading stats to SimpleDB', \ 
    33                 domain=domain) 
     33                spider=spider) 
    3434        else: 
    35             self._persist_to_sdb(domain, stats) 
     35            self._persist_to_sdb(spider, stats) 
    3636 
    37     def _persist_to_sdb(self, domain, stats): 
    38         ts = self._get_timestamp(domain).isoformat() 
    39         sdb_item_id = "%s_%s" % (domain, ts) 
     37    def _persist_to_sdb(self, spider, stats): 
     38        ts = self._get_timestamp(spider).isoformat() 
     39        sdb_item_id = "%s_%s" % (spider.domain_name, ts) 
    4040        sdb_item = dict((k, self._to_sdb_value(v, k)) for k, v in stats.iteritems()) 
    41         sdb_item['domain'] = domain 
     41        sdb_item['domain'] = spider.domain_name 
    4242        sdb_item['timestamp'] = self._to_sdb_value(ts) 
    4343        connect_sdb().put_attributes(self._sdbdomain, sdb_item_id, sdb_item) 
    4444 
    45     def _get_timestamp(self, domain): 
     45    def _get_timestamp(self, spider): 
    4646        return datetime.utcnow() 
    4747 
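
Note on the `_persist_stats` changes above: each backend now receives the spider object (or `None` for global stats) and derives the storage key from `spider.domain_name`. A rough sketch of that hook pattern follows. `BaseCollector`, `InMemoryCollector`, `engine_stopped` and the `Spider` stub are hypothetical names used only for illustration; only the `_persist_stats(stats, spider=None)` signature and the `spider is None` guard are taken from the diff.

```python
class Spider:
    """Hypothetical stand-in exposing only the attribute used below."""

    def __init__(self, domain_name):
        self.domain_name = domain_name


class BaseCollector:
    """Hypothetical base class that hands final stats to a persistence hook."""

    def close_spider(self, spider, stats):
        self._persist_stats(stats, spider)

    def engine_stopped(self, global_stats):
        # Global stats are reported with spider=None
        self._persist_stats(global_stats, spider=None)

    def _persist_stats(self, stats, spider=None):
        pass  # default backend stores nothing


class InMemoryCollector(BaseCollector):
    """Keeps per-spider stats in a dict keyed by the spider's domain name."""

    def __init__(self):
        self.domain_stats = {}

    def _persist_stats(self, stats, spider=None):
        if spider is None:  # only store spider-specific stats
            return
        self.domain_stats[spider.domain_name] = stats


collector = InMemoryCollector()
collector.engine_stopped({"spiders_crawled": 1})              # ignored: global stats
collector.close_spider(Spider("example.com"), {"pages": 5})  # stored under the domain
```

Keeping the key derivation inside the backend means the storage schema (for example the `domain` column in the MySQL backend) can stay unchanged while callers pass spider objects.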
  • scrapy/stats/signals.py

    r1297 r1849  
    1 stats_domain_opened = object() 
    2 stats_domain_closing = object() 
    3 stats_domain_closed = object() 
     1stats_spider_opened = object() 
     2stats_spider_closing = object() 
     3stats_spider_closed = object() 
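
Note on the renamed signals above: each signal is a bare `object()` sentinel, and receivers now get a `spider` argument (plus `reason` and `spider_stats` on close) instead of `domain`. The sketch below illustrates this with a tiny hypothetical registry (`connect`, `send`, `_receivers`) in place of pydispatch. Unlike this simplification, which passes every keyword argument to every receiver, the real dispatcher may handle arguments differently.

```python
# Signals are plain sentinel objects, distinguished by identity
stats_spider_opened = object()
stats_spider_closing = object()
stats_spider_closed = object()

_receivers = {}  # hypothetical registry: signal -> list of callables


def connect(receiver, signal):
    """Register a callable to be invoked when the signal is sent."""
    _receivers.setdefault(signal, []).append(receiver)


def send(signal, **kwargs):
    """Call every receiver connected to the signal with the given keywords."""
    for receiver in _receivers.get(signal, []):
        receiver(**kwargs)


class Spider:
    """Hypothetical stand-in for a spider object."""

    def __init__(self, domain_name):
        self.domain_name = domain_name


closed = []


def on_spider_closed(spider, reason, spider_stats):
    # Receivers get the spider instance itself, not a domain string
    closed.append((spider.domain_name, reason, spider_stats))


connect(on_spider_closed, signal=stats_spider_closed)

spider = Spider("example.com")
send(stats_spider_opened, spider=spider)  # no receiver connected, so nothing happens
send(stats_spider_closed, spider=spider, reason="finished",
     spider_stats={"pages": 5})
```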
  • scrapy/tests/test_downloadermiddleware_stats.py

    r1684 r1849  
    11from unittest import TestCase 
    22 
    3 from scrapy.conf import settings 
    43from scrapy.contrib.downloadermiddleware.stats import DownloaderStats 
    54from scrapy.http import Request, Response 
     
    1110 
    1211    def setUp(self): 
    13         self.spider = BaseSpider() 
    14         self.spider.domain_name = 'scrapytest.org' 
     12        self.spider = BaseSpider('scrapytest.org') 
    1513        self.mw = DownloaderStats() 
    1614 
    17         stats.open_domain(self.spider.domain_name) 
     15        stats.open_spider(self.spider) 
    1816 
    1917        self.req = Request('scrapytest.org') 
     
    2321        self.mw.process_request(self.req, self.spider) 
    2422        self.assertEqual(stats.get_value('downloader/request_count', \ 
    25             domain=self.spider.domain_name), 1) 
     23            spider=self.spider), 1) 
    2624         
    2725    def test_process_response(self): 
    2826        self.mw.process_response(self.req, self.res, self.spider) 
    2927        self.assertEqual(stats.get_value('downloader/response_count', \ 
    30             domain=self.spider.domain_name), 1) 
     28            spider=self.spider), 1) 
    3129 
    3230    def test_process_exception(self): 
    3331        self.mw.process_exception(self.req, Exception(), self.spider) 
    3432        self.assertEqual(stats.get_value('downloader/exception_count', \ 
    35             domain=self.spider.domain_name), 1) 
     33            spider=self.spider), 1) 
    3634 
    3735    def tearDown(self): 
    38         stats.close_domain(self.spider.domain_name, '') 
     36        stats.close_spider(self.spider, '') 
    3937 
  • scrapy/tests/test_spidermiddleware_depth.py

    r1685 r1849  
    1515        settings.overrides['DEPTH_STATS'] = True 
    1616 
    17         self.spider = BaseSpider() 
    18         self.spider.domain_name = 'scrapytest.org' 
     17        self.spider = BaseSpider('scrapytest.org') 
    1918 
    20         stats.open_domain(self.spider.domain_name) 
     19        stats.open_spider(self.spider) 
    2120 
    2221        self.mw = DepthMiddleware() 
     
    3231        self.assertEquals(out, result) 
    3332 
    34         rdc = stats.get_value('request_depth_count/1', 
    35                               domain=self.spider.domain_name) 
     33        rdc = stats.get_value('request_depth_count/1', spider=self.spider) 
    3634        self.assertEquals(rdc, 1) 
    3735 
     
    4139        self.assertEquals(out2, []) 
    4240 
    43         rdm = stats.get_value('request_depth_max', 
    44                               domain=self.spider.domain_name) 
     41        rdm = stats.get_value('request_depth_max', spider=self.spider) 
    4542        self.assertEquals(rdm, 1) 
    4643  
     
    5047        settings.disabled = True 
    5148 
    52         stats.close_domain(self.spider.domain_name, '') 
     49        stats.close_spider(self.spider, '') 
    5350 
  • scrapy/tests/test_stats.py

    r1821 r1849  
    11import unittest  
    22 
     3from scrapy.spider import BaseSpider 
    34from scrapy.xlib.pydispatch import dispatcher 
    45from scrapy.stats.collector import StatsCollector, DummyStatsCollector 
    5 from scrapy.stats.signals import stats_domain_opened, stats_domain_closing, \ 
    6     stats_domain_closed 
     6from scrapy.stats.signals import stats_spider_opened, stats_spider_closing, \ 
     7    stats_spider_closed 
    78 
    89class StatsCollectorTest(unittest.TestCase): 
     10 
     11    def setUp(self): 
     12        self.spider = BaseSpider() 
    913 
    1014    def test_collector(self): 
     
    4448        stats.max_value('v2', 100) 
    4549        stats.min_value('v3', 100) 
    46         stats.open_domain('a') 
    47         stats.set_value('test', 'value', domain='a') 
     50        stats.open_spider('a') 
     51        stats.set_value('test', 'value', spider=self.spider) 
    4852        self.assertEqual(stats.get_stats(), {}) 
    4953        self.assertEqual(stats.get_stats('a'), {}) 
     
    5256        signals_catched = set() 
    5357 
    54         def domain_opened(domain): 
    55             assert domain == 'example.com' 
    56             signals_catched.add(stats_domain_opened) 
     58        def spider_opened(spider): 
     59            assert spider is self.spider 
     60            signals_catched.add(stats_spider_opened) 
    5761 
    58         def domain_closing(domain, reason): 
    59             assert domain == 'example.com' 
     62        def spider_closing(spider, reason): 
     63            assert spider is self.spider 
    6064            assert reason == 'testing' 
    61             signals_catched.add(stats_domain_closing) 
     65            signals_catched.add(stats_spider_closing) 
    6266 
    63         def domain_closed(domain, reason, domain_stats): 
    64             assert domain == 'example.com' 
     67        def spider_closed(spider, reason, spider_stats): 
     68            assert spider is self.spider 
    6569            assert reason == 'testing' 
    66             assert domain_stats == {'test': 1} 
    67             signals_catched.add(stats_domain_closed) 
     70            assert spider_stats == {'test': 1} 
     71            signals_catched.add(stats_spider_closed) 
    6872 
    69         dispatcher.connect(domain_opened, signal=stats_domain_opened) 
    70         dispatcher.connect(domain_closing, signal=stats_domain_closing) 
    71         dispatcher.connect(domain_closed, signal=stats_domain_closed) 
     73        dispatcher.connect(spider_opened, signal=stats_spider_opened) 
     74        dispatcher.connect(spider_closing, signal=stats_spider_closing) 
     75        dispatcher.connect(spider_closed, signal=stats_spider_closed) 
    7276 
    7377        stats = StatsCollector() 
    74         stats.open_domain('example.com') 
    75         stats.set_value('test', 1, domain='example.com') 
    76         stats.close_domain('example.com', 'testing') 
    77         assert stats_domain_opened in signals_catched 
    78         assert stats_domain_closing in signals_catched 
    79         assert stats_domain_closed in signals_catched 
     78        stats.open_spider(self.spider) 
     79        stats.set_value('test', 1, spider=self.spider) 
     80        self.assertEqual([(self.spider, {'test': 1})], list(stats.iter_spider_stats())) 
     81        stats.close_spider(self.spider, 'testing') 
     82        assert stats_spider_opened in signals_catched 
     83        assert stats_spider_closing in signals_catched 
     84        assert stats_spider_closed in signals_catched 
    8085 
    81         dispatcher.disconnect(domain_opened, signal=stats_domain_opened) 
    82         dispatcher.disconnect(domain_closing, signal=stats_domain_closing) 
    83         dispatcher.disconnect(domain_closed, signal=stats_domain_closed) 
     86        dispatcher.disconnect(spider_opened, signal=stats_spider_opened) 
     87        dispatcher.disconnect(spider_closing, signal=stats_spider_closing) 
     88        dispatcher.disconnect(spider_closed, signal=stats_spider_closed) 
    8489 
    8590if __name__ == "__main__":