pkgsrc-Users archive
Re: Setting up bulkbuild
* On 2023-09-03 at 21:00 BST, Jan-Benedict Glaw wrote:
> It's now in the "Scanning..." phase and manages to "scan" about 250
> packages per day, with a total of nearly 20k. So it'll be scanning for
> about three months, give or take. Are there ways to speed this up?
The general approaches to speeding up the scan phase, regardless of
operating system, are:
1. Run multiple pbulk-scan processes:
There is no native support for this as none of my proposed patches
have been accepted, but as a quick hack you can literally just run
more copies of the "pbulk-scan -c ..." process, ideally up to as many
CPUs as you have.
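A very rough shell sketch of that hack follows; the pbulk-scan arguments
are whatever your setup already uses, so the command line and the client
count below are only placeholders:

    # start extra copies of the existing scan client; fill in the real
    # "pbulk-scan -c ..." arguments from your own configuration
    SCAN_CMD="pbulk-scan -c ..."
    NCLIENTS=4      # ideally up to the number of CPUs
    i=1
    while [ "$i" -le "$NCLIENTS" ]; do
        $SCAN_CMD &
        i=$((i + 1))
    done
    wait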
If you only have a single CPU then you may still find that running two
processes makes things faster; otherwise, since you are running virtual
machines, you can simply spin up more VMs.
If you want to go the whole way and support chrooted scans and builds
then my full patchset is available here:
https://github.com/NetBSD/pkgsrc/compare/trunk...TritonDataCenter:pkgsrc:feature/pbulk/trunk
2. Enable options cache:
This one is very straightforward: set PBULK_CACHE_DIRECTORY=/var/tmp
or similar in your mk.conf and pbulk will re-use any package options that
have already been calculated, as computing them is quite expensive.
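For example, a minimal mk.conf fragment using the location above:

    # cache computed package options between pbulk-index runs
    PBULK_CACHE_DIRECTORY=  /var/tmp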
3. Enable reuse_scan_results:
If you are never going to change any configuration for your builds
then it may be safe to set reuse_scan_results=yes in pbulk.conf and
any subsequent scans will be faster. However, this is not suitable if
you make any changes, so I generally leave it turned off.
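For completeness, if your configuration really is frozen, the pbulk.conf
line is simply:

    # only safe while the build configuration never changes
    reuse_scan_results=yes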
4. Reduce forked commands:
The biggest impact on scan speed is the fact that running 'bmake
pbulk-index' in every single pkgsrc directory recomputes a whole bunch
of variables that require running external commands. This is seen
most clearly by running e.g. dtrace while a pbulk-scan is happening:
all you see is the same commands being executed over and over and over
again.
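For instance, a standard dtrace one-liner such as the following makes the
churn obvious while a scan is running:

    # print the arguments of every command that is successfully exec'd
    dtrace -n 'proc:::exec-success { trace(curpsinfo->pr_psargs); }'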
I have a few approaches to cutting down on these forked commands:
4a. Pre-compute builtin variables:
If you are using the same OS for each build, then you can pre-compute
the results of builtin variables to avoid having to recalculate them
each time. For example, for my SmartOS builds I have the following
file included into my mk.conf:
https://github.com/TritonDataCenter/pkgbuild/blob/master/include/varcache/20210826.mk
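A heavily trimmed, purely illustrative sketch of what such a cache
contains (do not copy these values; generate the real ones on your own
target OS):

    # pre-computed builtin results that would otherwise be recalculated
    # with external commands for every package
    IS_BUILTIN.zlib=        yes
    BUILTIN_PKG.zlib=       zlib-1.2.11
    USE_BUILTIN.zlib=       yes
    IS_BUILTIN.dl=          yes
    USE_BUILTIN.dl=         yes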
Note that this is somewhat similar to the existing bsd.makevars.mk;
however, that is not used for scans as there is no work area, and an
interesting fact is that bsd.makevars.mk actually makes things slower
due to its use of shell! A stark warning to avoid forks if ever there
was one.
4b. Hardcode system variables:
This one requires modifying mk/bsd.prefs.mk as the variables must be
set early, but for example I use this on my macOS builds to avoid
things like uname being called for every single 'make' invocation:
https://github.com/TritonDataCenter/pkgsrc/commit/0084220e1b283401093db7efc4cdab08453dfa47
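The shape of it is roughly the following, with illustrative values for a
hypothetical macOS host (the linked commit pins these in mk/bsd.prefs.mk
so they take effect before anything else is evaluated):

    # normally derived by running uname on every make invocation
    OPSYS=          Darwin
    OS_VERSION=     22.6.0
    LOWER_OPSYS=    darwin
    MACHINE_ARCH=   aarch64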
Obviously things like this need careful updating every time you change
anything on the host OS.
4c. Avoid expensive GCC calculations:
One of my major objections to the whole GCC_REQD thing is that it
slows everything down, due to having to run 'pkg_admin pmatch' for
every single version listed in GCC_REQD for every package, along with
pointless 'gcc -dumpversion' commands to get version strings we don't
use.
I avoid some of this by hardcoding _GCC_REQD in my SmartOS version of
the bsd.prefs.mk patch:
https://github.com/TritonDataCenter/pkgsrc/commit/702ba199aadc56f5f58b7b3cb6880e430dd59266
but there's definitely more to do here. For example we don't even
bother using MAKEFLAGS for some of the computed variables in gcc.mk!
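The hardcoding itself is tiny; roughly this, where the version is
illustrative and has to match whatever GCC your bootstrap actually
provides:

    # pin the answer so 'pkg_admin pmatch' and 'gcc -dumpversion' are
    # not re-run for every single package
    _GCC_REQD=      10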
As a guide, with these approaches in place, most of my scans take around
10 minutes, though of course that's on modern hardware, not a VAX.
--
Jonathan Perkin - mnx.io - pkgsrc.smartos.org
Open Source Complete Cloud www.tritondatacenter.com