Subject: parallel computing, SMP, and threading
To: Tim & Alethea Larson <thelarsons3@cox.net>
From: Erik E. Fair <fair@netbsd.org>
List: port-mac68k
Date: 04/30/2004 13:00:07
At 12:43 -0500 4/30/04, Tim & Alethea Larson wrote:
> Yes, I thought good threading was something of a prerequisite
> to SMP. What does threading do for us on a non-MP system? Can the
> kernel scheduler get things done more efficiently with threaded apps?
-----
To be perfectly clear, you don't need any kind of thread support for
an MP or SMP system to be useful. The utility is in having more than
one processor to pick processes off the run queue to run in parallel.
Since UNIX loves to spawn processes, this wins for throughput right
away even if any particular application doesn't run any faster than
it did on a uniprocessor system with the same speed CPU. Imagine what
an SMP does for E-mail processing on an SMTP server when the MTA
spawns a new process for each SMTP client that has contacted it. Each
SMTP connection is independent, and can be run in parallel. Add
processors, speed things up (until you run into some other limit,
like disk or RAM bandwidth).
Thread support isn't even required to speed up your application, if
your application can spawn additional processes to divide the work.
Take compiling a program with "make -j N" for N CPUs, for example.
Make knows which parts of the building process can be done in
parallel (i.e. that do not depend on each other), and which parts
must be serialized (do one before starting the other, e.g. running
lex(1) or yacc(1) to generate a ".c" file before running cc(1) to
compile). Make essentially performs data flow analysis on program
compilation:
http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?query=data+flow+analysis
However, you'll note that make doesn't require shared memory for its
work - merely a shared filesystem. So, if you tell make(1) how many
processors you have, it will spawn as many parallel compiles, etc.,
as it can, up to the min() of the number of jobs you allow (as
specified by "-j") and the number of compiles that can actually run
in parallel (as determined by the structure of the Makefile).
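As a toy illustration (file names invented), a Makefile like this lets
"make -j 2" compile main.o while yacc is still generating parser.c,
but the dependency edges force yacc to finish before parser.c is
compiled:

```make
# prog needs both objects; with -j they can be built in parallel.
prog: parser.o main.o
	cc -o prog parser.o main.o

# parser.c must exist before parser.o can be compiled: this edge
# in the dependency graph serializes yacc before cc.
parser.c: parser.y
	yacc -o parser.c parser.y

parser.o: parser.c
	cc -c parser.c

# main.o depends on nothing generated, so it can compile while
# yacc is still running.
main.o: main.c
	cc -c main.c
```

The Makefile is the data-flow graph; make just walks it.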
If your application has a lot of shared data that needs to be
accessed quickly (i.e. faster than disk access), then threading makes
sense - one process, many "threads" running in that process with a
shared address space. Just take care to preserve data integrity by
using semaphores to lock shared data structures before modifying
them. Also, depending on the application, you may find that the cache
coherency and semaphore overhead eats away at some of the potential
performance gain, if your application shares memory "too much".
Many of these issues are discussed in detail in the book, "In Search
of Clusters" (2nd Ed.) by Gregory F. Pfister. The NetBSD Project has
a mailing list for discussing clustering for NetBSD systems:
tech-cluster@netbsd.org
It's also important to remember Amdahl's Law:
http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?query=Amdahl%27s+Law
I hope this clarifies things somewhat.
Erik <fair@netbsd.org>