Multithreading - C# .NET WebBrowser control in a separate thread raising an event on the main thread


I need to scrape a lot of pages in parallel, and the UI thread must not be blocked. I'm creating a thread for each page (URL) and instantiating a WebBrowser control in that thread to execute the JavaScript and get the HTML after that. When the WebBrowser has the HTML, I raise an event on the UI thread to register that this browser has done its job, because I want to know when all the browsers have fetched their HTML so I can merge the data and display it.

1.) The first problem is that many of the threads never raise the event, so I'm stuck waiting.

2.) The second problem is that I can't dispose the browser without causing an external browser to fire up. It's as if I'm always pulling the rug out from under the browser, and it decides to continue opening the page in the user's default browser, I guess. If I don't dispose at all, I run out of RAM.

I've been searching around and found a lot of related material, but I fail to implement it for my use case. Here's my code:

    using System;
    using System.Collections.Generic;
    using System.Threading;
    using System.Windows.Forms;

    [System.Runtime.InteropServices.ComVisibleAttribute(true)]
    public partial class Form1 : Form
    {
        public delegate void ThreadFinishedEventHandler(object source, EventArgs e);
        public event ThreadFinishedEventHandler ThreadFinishedEvent;
        int threadCount = 0;
        int threadReturnedCount = 0;
        List<string> linksGlobal;

        public Form1()
        {
            InitializeComponent();
            ThreadFinishedEvent += new ThreadFinishedEventHandler(OnThreadFinished);
        }

        private void Form1_Load(object sender, EventArgs e)
        {
        }

        private void btnGo_Click(object sender, EventArgs e)
        {
            ScrapeLinksWithBrowsersInSeparateThreads();
        }

        private void ScrapeLinksWithBrowsersInSeparateThreads()
        {
            linksGlobal = GetLinks(); // 11 identical URLs -> https://sports.betway.com
            threadCount = linksGlobal.Count;

            Random rand = new Random(123);
            int waitTime = 0; // trying not to get flagged as a DoS attack or something
            foreach (string url in linksGlobal)
            {
                RunBrowserThread(url, waitTime);
                // cumulative offset: each browser starts navigating 1 to 3.5 s after the previous one
                waitTime += rand.Next(500, 3000) + 500;
            }
        }

        public void RunBrowserThread(string url, int waitTime)
        {
            var th = new Thread(() =>
            {
                try
                {
                    WebBrowserDocumentCompletedEventHandler completed = null;
                    WebBrowser wb = new WebBrowser();

                    completed = (sndr, e) =>
                    {
                        // DocumentCompleted fires for every frame as well as for the
                        // top-level document; e.Url is the document that just finished.
                        if (e.Url.AbsolutePath != (sndr as WebBrowser).Url.AbsolutePath)
                        {
                            wb.DocumentCompleted -= completed;
                            string html = (sndr as WebBrowser).Document.Body.InnerHtml;

                            ThreadFinishedEvent.Raise(this, EventArgs.Empty); // my own EventExtension method

                            //wb.Dispose();             // wherever I put this, it makes an external browser open
                            //Application.ExitThread(); // this seems to stop the event from ever firing, not sure
                        }
                    };

                    wb.DocumentCompleted += completed;
                    wb.ScriptErrorsSuppressed = true;

                    Thread.Sleep(waitTime); // staggered start, see ScrapeLinksWithBrowsersInSeparateThreads
                    wb.Navigate(url);
                    Application.Run(); // message loop so the WebBrowser can raise its events
                }
                catch (Exception)
                {
                    throw; // rethrow without resetting the stack trace
                }
            });
            th.SetApartmentState(ApartmentState.STA); // WebBrowser requires an STA thread
            th.Start();
        }

        private void OnThreadFinished(object source, EventArgs e)
        {
            threadReturnedCount++; // after 3 - 5 of the 11 threads the event stops being raised, no idea why
            if (threadReturnedCount == threadCount)
            {
                // do the work
                // this never happens because many threads never raise the event
            }
        }

        private List<string> GetLinks()
        {
            List<string> links = new List<string>();

            links.Add("https://sports.betway.com");
            links.Add("https://sports.betway.com");
            links.Add("https://sports.betway.com");
            links.Add("https://sports.betway.com");
            links.Add("https://sports.betway.com");
            links.Add("https://sports.betway.com");
            links.Add("https://sports.betway.com");
            links.Add("https://sports.betway.com");
            links.Add("https://sports.betway.com");
            links.Add("https://sports.betway.com");
            links.Add("https://sports.betway.com");

            return links;
        }
    }

P.S. Returning the data from the threads is a separate problem; I haven't implemented it yet because I want to solve this first. I'll use an ObjectFactory that is called from each thread, factory.CreateObject(html), so I'll have to use some kind of locking on the factory, since it lives on the main thread.
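A minimal sketch of that locking, assuming the factory is an ordinary class with no internal synchronization. ObjectFactory and CreateObject come from the question, but their bodies here are placeholders, and LockedFactory is a hypothetical wrapper. Objects are not tied to the thread that created them (WinForms controls excepted); what needs protecting is the factory's shared state, so a lock makes the browser threads take turns inside CreateObject.

```csharp
using System.Collections.Generic;

// Stand-in for the question's ObjectFactory; assumed NOT thread-safe
// because it mutates an ordinary List<T>.
public class ObjectFactory
{
    private readonly List<string> _created = new List<string>();

    public string CreateObject(string html)
    {
        var item = html.Trim(); // placeholder for real parsing
        _created.Add(item);     // List<T>.Add must not be called concurrently
        return item;
    }

    public int Count => _created.Count;
}

// Hypothetical wrapper that serializes every call into the shared factory.
public class LockedFactory
{
    private readonly ObjectFactory _inner;
    private readonly object _gate = new object();

    public LockedFactory(ObjectFactory inner) { _inner = inner; }

    public string CreateObject(string html)
    {
        lock (_gate) // only one thread at a time enters the factory
        {
            return _inner.CreateObject(html);
        }
    }

    public int Count
    {
        get { lock (_gate) { return _inner.Count; } }
    }
}
```

An alternative design is to have each browser thread push its HTML into a ConcurrentQueue<string> and let the UI thread run the factory once everything has arrived, which avoids locking the factory at all.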

I did not manage to find clean solutions to the problems presented in the question. I did try some things and got some results, but not enough. I'll go back over the question and explain what I did in the end to solve the problem.

1.) The first problem is that many of the threads never raise the event, so I'm stuck waiting.

Answer 1: I'm still not sure what was happening here, but it got a lot better after I (kind of) solved the second problem.
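The cause was never pinned down, but two things in the question's code are worth checking. First, a C# event invokes its handlers synchronously on the thread that raises it, so unless the Raise extension marshals the call, OnThreadFinished runs on each browser thread and threadReturnedCount++ is an unsynchronized read-modify-write that can lose increments, in which case the count may never reach threadCount. Second, the DocumentCompleted handler only proceeds when e.Url.AbsolutePath differs from the browser's own Url.AbsolutePath, which looks inverted compared with the common `if (e.Url.AbsolutePath != wb.Url.AbsolutePath) return;` idiom for skipping frame completions. The sketch below addresses only the first point. CompletionTracker is a hypothetical helper: it counts reports with Interlocked.Increment, so exactly one caller sees the final count, and it posts the completion callback to the SynchronizationContext captured when the tracker was created.

```csharp
using System;
using System.Threading;

// Counts finished workers and runs a callback exactly once, on the captured
// context (the UI thread in WinForms), when the last one reports in.
public sealed class CompletionTracker
{
    private readonly int _expected;
    private readonly SynchronizationContext _context;
    private readonly Action _onAllDone;
    private int _finished; // updated only via Interlocked

    // Construct this on the UI thread so SynchronizationContext.Current is the
    // WinForms context; in a console app it is null and the callback runs inline.
    public CompletionTracker(int expected, Action onAllDone)
    {
        _expected = expected;
        _onAllDone = onAllDone;
        _context = SynchronizationContext.Current;
    }

    public int Finished => Volatile.Read(ref _finished);

    // Safe to call from any browser thread.
    public void ReportFinished()
    {
        // Interlocked.Increment returns the new value atomically, so exactly one
        // caller observes _expected even when many threads finish at once.
        if (Interlocked.Increment(ref _finished) != _expected)
            return;

        if (_context != null)
            _context.Post(_ => _onAllDone(), null); // marshal to the UI thread
        else
            _onAllDone();
    }
}
```

In the question's code, one would create the tracker in ScrapeLinksWithBrowsersInSeparateThreads (which runs on the UI thread via btnGo_Click) with linksGlobal.Count and a hypothetical MergeAndDisplay method, then call tracker.ReportFinished() from the DocumentCompleted handler instead of raising ThreadFinishedEvent.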

2.) The second problem is that I can't dispose the browser without causing an external browser to fire up.

Answer 2: This can be done using the ActiveXInstance of the WebBrowser control; you'll need to include the SHDocVw DLL in your project. Check frank_fc's answer here: Detect WebBrowser complete page loading.

There are memory leak problems with the WebBrowser control. Using Google, I found out how to reduce these problems (there is a lot of info on that).

In the end, the whole thing was not stable: memory leaks still happened, along with out-of-memory exceptions, unpredictable behaviour, bad performance (slow page loading), etc. The code was ugly and just seemed... not the right way to do things. Do not use the WebBrowser control if you want to scrape a lot of pages in a short time. Do not instantiate dozens of invisible WebBrowser controls, each in its own thread, and expect them to handle events efficiently.
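Part of the problem above is simply how many browsers run at once. As a hedged illustration of one way to cap that (a swapped-in technique, not what was used here), the sketch below limits concurrent scrapes with SemaphoreSlim. BoundedScraper is a hypothetical name, scrapeOne is a hypothetical delegate for whatever fetches one page (it is not tied to WebBrowser), and maxConcurrent is an assumed tuning parameter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedScraper
{
    // Runs scrapeOne for every url, but never more than maxConcurrent at a time.
    public static async Task<string[]> ScrapeAllAsync(
        IEnumerable<string> urls,
        Func<string, Task<string>> scrapeOne,
        int maxConcurrent)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            var tasks = urls.Select(async url =>
            {
                await gate.WaitAsync();      // wait (without blocking a thread) for a free slot
                try { return await scrapeOne(url); }
                finally { gate.Release(); }  // free the slot even if scrapeOne throws
            }).ToList();                     // materialize so every task is started

            return await Task.WhenAll(tasks); // results keep the input order
        }
    }
}
```

With maxConcurrent = 3, at most three pages are in flight; the remaining URLs wait on the semaphore instead of each holding its own thread and browser instance.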

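What did I do in the end? I had a beer with a friend who showed me a program he made for a college task: a Java program developed in Eclipse using the jsoup package to scrape the web. It has 2 Java functions of 10 - 20 lines each, and it was a 100 times faster, simpler, and better solution than mine. You call getHtml(url) and jsoup fetches the page for you; for this site it didn't matter that the page runs JavaScript, crazy. (jsoup itself does not execute JavaScript; it fetches and parses the HTML the server returns, which evidently contained the data needed here.)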

So now the .NET app fires up the Java app, which writes the HTML into text files on disk, and when it's finished the .NET app collects the data, cycling over and over again.
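A minimal sketch of the .NET side of that hand-off, assuming the Java scraper is packaged as a runnable jar that takes the output directory as its only argument and writes one .html file per page, and that `java` is on the PATH. The jar path, the argument convention, and the JavaScraperBridge name are all hypothetical; the actual setup is not described.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

public static class JavaScraperBridge
{
    // Starts the (hypothetical) scraper jar and blocks until it exits.
    public static int RunScraper(string jarPath, string outputDir)
    {
        var psi = new ProcessStartInfo("java", $"-jar \"{jarPath}\" \"{outputDir}\"")
        {
            UseShellExecute = false, // run java directly, no shell
            CreateNoWindow = true    // no console window for the child process
        };
        using (var process = Process.Start(psi))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }

    // Reads every *.html file the scraper wrote, keyed by file name.
    public static Dictionary<string, string> CollectHtml(string outputDir)
    {
        var pages = new Dictionary<string, string>();
        foreach (var path in Directory.GetFiles(outputDir, "*.html"))
            pages[Path.GetFileName(path)] = File.ReadAllText(path);
        return pages;
    }
}
```

Each cycle would call RunScraper, check that the exit code is 0, and then call CollectHtml on the same directory before merging and displaying the data. Calling this from a background thread or Task keeps WaitForExit from blocking the UI thread.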

I spent 100+ hours fiddling with the WebBrowser control, and in 2 hours I made an immeasurably better solution. Choose your tools wisely! Java + Eclipse + jsoup seems a better way to go for scraping/crawling than .NET.

