pkgsrc-Users archive
Re: BLAS: Does cblas interface work with BLAS types other than netlib
On Mon, Sep 26, 2022 at 12:29:52PM +0530, Mayuresh wrote:
> Finally, I find that the choice at the time of compiling dlib-cpp doesn't
> matter, as long as the build process knows CBLAS exists. The one at the
> time of compiling the final executable gives me MT and the speed up
> required.
For those interested in BLAS, dlib, or just multi-threaded (MT) performance,
here are some observations.

On a NetBSD 9.3 amd64 VPS with 4 cores, these are the timings (real/user/sys,
in seconds) for a certain deep learning job.
With openblas_pthread:
  NUM_THREADS=1
    14666.77 real  14661.62 user      2.86 sys
  NUM_THREADS=2
    13104.87 real  14918.68 user   8052.21 sys
  NUM_THREADS=4
    12417.81 real  15674.54 user  24096.69 sys
With openblas_openmp:
  OMP_NUM_THREADS=1
    14548.69 real  14539.60 user      3.65 sys
  OMP_NUM_THREADS=2
    13785.67 real  15064.01 user    302.84 sys
  OMP_NUM_THREADS=4
    13621.12 real  15465.07 user   1320.44 sys
With the stock BLAS in dlib (i.e. without openblas), which is single-threaded:
    12261 real
So:
- openblas with pthreads seems to perform better than openblas with OpenMP
  for MT (12417.81 vs. 13621.12 real at 4 threads), but spends far more sys
  time (24096.69 vs. 1320.44)
- the single-threaded stock BLAS of dlib (12261 real) seems to perform better
  than openblas in either mode, whether single- or multi-threaded
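
To put numbers on the comparison, here is a minimal Python sketch that computes speedup and parallel efficiency from the wall-clock ("real") readings above. The figures are copied from this post. The names readings, speedup and efficiency are only illustrative, and the definitions used (speedup = 1-thread time / N-thread time, efficiency = speedup / N) are the usual textbook ones, not anything specific to openblas or dlib.

```python
# Wall-clock ("real") seconds copied from the readings in this post,
# keyed by thread count. Names here are illustrative, not from any library.
readings = {
    "openblas_pthread": {1: 14666.77, 2: 13104.87, 4: 12417.81},
    "openblas_openmp": {1: 14548.69, 2: 13785.67, 4: 13621.12},
}

# Single-threaded stock BLAS in dlib, as reported above.
STOCK_DLIB_BLAS_REAL = 12261.0


def speedup(times, threads):
    """Speedup relative to the 1-thread run of the same BLAS build."""
    return times[1] / times[threads]


def efficiency(times, threads):
    """Parallel efficiency: speedup divided by the number of threads."""
    return speedup(times, threads) / threads


if __name__ == "__main__":
    for name, times in readings.items():
        for n in sorted(times):
            print(
                f"{name:18s} threads={n} "
                f"speedup={speedup(times, n):.2f} "
                f"efficiency={efficiency(times, n):.2f} "
                f"vs_stock={times[n] / STOCK_DLIB_BLAS_REAL:.2f}x"
            )
```

A vs_stock value above 1.00 means that run took longer than the single-threaded stock BLAS run.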
Workload details:
- A convolutional neural network with window size 5x5, 1.5 million
samples, 6 dense layers, 10000 iterations
--
Mayuresh