python - Yielding multiple requests to the same parse function in Scrapy: what is the order of execution?
Basically, my program has the following structure:

def first_parse(self, response):
    for link in links:
        yield Request(url=link, callback=self.second_parse)

def second_parse(self, response):
    # webdriver.get(url), then crawl the data
I am using a Selenium webdriver to load the URLs yielded from first_parse. For each page, the program needs time to load the page and do its work. For some reason, some of the links get missed.

So I suspect that when the yield operation sends a request to second_parse while second_parse is still working on a previous request, the new request is missed. Is that correct?

If not, what happens when yield sends a request?

For example, if there are 20 links, first_parse sends 20 requests to the second_parse function. If second_parse takes 10 seconds per request, then while the first request is being processed in second_parse, are the others waiting in a queue, or are they gone?
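To see what yielding a request actually does, here is a minimal pure-Python sketch (not Scrapy itself; the Request class and the toy engine loop are illustrative stand-ins): yielding does not invoke the callback immediately, it hands a request object to an engine, which queues it and runs the callbacks one at a time. Nothing is dropped by the queueing itself.

```python
from collections import deque

class Request:
    """Stand-in for scrapy.Request: just a URL and a callback."""
    def __init__(self, url, callback):
        self.url = url
        self.callback = callback

processed = []

def first_parse():
    # yield does NOT call second_parse here; it only produces
    # Request objects for the engine to schedule
    for link in ["page1", "page2", "page3"]:
        yield Request(url=link, callback=second_parse)

def second_parse(url):
    processed.append(url)

# Toy engine: collect the yielded requests into a FIFO queue,
# then run each queued callback one after another
queue = deque(first_parse())
while queue:
    req = queue.popleft()
    req.callback(req.url)

print(processed)  # every link is processed; none are lost
```

In real Scrapy the scheduler and downloader sit between the two steps, but the principle is the same: yielded requests wait in the scheduler's queue until they are dispatched.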
You are reusing the same webdriver instance in second_parse(). I suspect that is what is causing the problems, since the already-instantiated webdriver navigates to a different page before it is done with the current one. You should instantiate and close the webdriver inside second_parse():
from selenium import webdriver

def second_parse(self, response):
    driver = webdriver.Firefox()
    driver.get(response.url)
    # scrape the data from the loaded page here
    driver.close()
Note, though, that this may lead to 20 browsers being active at the same time.
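If having up to 20 browsers open at once is too heavy, one option (my suggestion, not part of the original answer) is to throttle how many responses Scrapy processes in parallel using the project settings, so only a few webdriver instances exist at any moment:

```python
# settings.py -- Scrapy processes at most this many requests
# concurrently, which caps the number of live webdriver instances
CONCURRENT_REQUESTS = 2
```

With CONCURRENT_REQUESTS = 1 the pages are processed strictly one at a time, at the cost of throughput.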