NetBSD-Users archive
Re: firefox resource hog
On Sun, Jan 08, 2023 at 05:32:52PM +0100, tlaronde%polynum.com@localhost wrote:
> On Sun, Jan 08, 2023 at 09:53:32PM +0530, Mayuresh wrote:
> > On Sun, Jan 08, 2023 at 04:56:54PM +0100, tlaronde%polynum.com@localhost wrote:
> > > For this, I would use curl(1) (I do use it to automate downloading of
> > > pages when there are no captchas).
> >
> > How I do this is:
> >
> > 1. For some of the most simple scenarios, cookies ok but no js - curl / wget
> >
> > 2. A little more complex, where for some reason wget/curl doesn't work,
> > but still not requiring js - python mechanize
> >
> > 3. Requiring js - firefox, marionette
> >
> > Within 3:
> >
> > 3a. Headless if fully automatable use case, including some captchas which
> > I extract in headless mode and render on a terminal and get interactively
> > from keyboard.
> >
> > 3b. Non-headless when e.g. you want to automate only logging in to a
> > portal and do further things manually
> >
> > While I use all of them, 3b is the one I need the most. For 3a and 3b
> > firefox works the best for me.
>
> When js is only used to verify arguments and put them in canonical
> form before sending them to a page over HTTP or HTTPS, with GET or
> POST, I first inspect the traffic, for example with the developer
> tools in Firefox.
>
> Then, knowing what the "API" is, I script it with curl...
Sometimes I'm pleasantly surprised how much webscraping one can get
done with just shell & curl ;-)
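
A rough sketch of that approach, in case it helps anyone. The endpoint
URL, the field name "q" and the canonicalisation rule (trim blanks,
lowercase) are all made up for illustration; the real ones have to be
read off the browser's network inspector for the site in question.

```sh
#!/bin/sh
# Hypothetical endpoint; replace with what the network inspector shows.
URL="https://example.org/search"

# Mimic what the page's js is assumed to do before submitting:
# strip leading/trailing blanks and lowercase the argument.
canon() {
    printf '%s' "$1" |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
        tr '[:upper:]' '[:lower:]'
}

# GET request with the argument URL-encoded into the query string.
# Cookies are read from and written back to a jar between calls.
fetch() {
    curl -sS -G -b cookies.txt -c cookies.txt \
        --data-urlencode "q=$(canon "$1")" "$URL"
}

# Only touch the network when an argument is given.
if [ "$#" -gt 0 ]; then
    fetch "$1"
fi
```

If the site expects POST instead, dropping -G should be enough, since
curl then sends the --data-urlencode fields in the request body.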
Kind regards,
Alex.
--
"Opportunity is missed by most people because it is dressed in overalls and
looks like work." -- Thomas A. Edison