python - Yielding multiple requests to the same parse function: what is the order of execution? (Scrapy)


Basically, the program has the following structure:

def first_parse(self, response):
    for link in links:
        yield Request(url=link, callback=self.second_parse)

def second_parse(self, response):
    # webdriver.get(url), crawl data

I am using a Selenium webdriver to load the URLs yielded from first_parse. Each page needs some time to load and be processed, and for some reason some of the links get missed.
So I suspect that when yield sends a request to second_parse while second_parse is still working on the previous request, the new request gets missed. Is that correct?
If not, what actually happens when yield sends a request?
For example, if there are 20 links, first_parse will send 20 requests to the second_parse function. If second_parse takes 10 seconds per request, then while the first request is running in second_parse, are the others waiting in a queue, or are they gone?
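For what it's worth, Scrapy does not drop yielded requests: they go into the scheduler's queue and are dispatched as concurrency slots free up. The following is a toy sketch of that queueing behaviour, not Scrapy's actual implementation; the link URLs and callback wiring are purely illustrative:

```python
from collections import deque

def first_parse(links):
    # Yielding a request does not execute the callback immediately;
    # it just hands a request object to the scheduler.
    for link in links:
        yield {"url": link, "callback": second_parse}

def second_parse(request):
    # Stand-in for the real callback that would scrape the page.
    return f"scraped {request['url']}"

def run(links):
    # Every yielded request is enqueued...
    queue = deque(first_parse(links))
    results = []
    while queue:
        # ...and waits its turn; none are lost while another is processed.
        request = queue.popleft()
        results.append(request["callback"](request))
    return results
```

In this toy model, `run([f"http://example.com/{i}" for i in range(20)])` returns all 20 results; a slow callback only delays the queue, it does not discard pending requests.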

You are reusing the same webdriver instance in the second_parse() method. I suspect that is what causes the problems, since the single webdriver instance navigates to a different page before it is done with the current one. You should instantiate and close the webdriver inside the second_parse() method:

def second_parse(self, response):
    driver = webdriver.Firefox()
    driver.get(response.url)
    # scrape
    driver.close()

This, though, may lead to 20 browsers being active at the same time.
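If 20 simultaneous browsers is too heavy, you can throttle how many requests Scrapy processes in parallel through its settings. A minimal sketch using real Scrapy setting names (the values are illustrative, pick what your machine can handle):

```python
# In settings.py, or in the spider's custom_settings dict:
CONCURRENT_REQUESTS = 1             # only one request in flight at a time
CONCURRENT_REQUESTS_PER_DOMAIN = 1  # also cap per-domain concurrency
```

With concurrency capped at 1, only one webdriver instance exists at any moment, at the cost of crawling the links sequentially.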

