tech-kern archive


Re: What if the console device is only accessible from one CPU in a multiprocessor system?



On Sat, Mar 08, 2025 at 11:18:07AM -0800, Jason Thorpe wrote:
> > On Mar 6, 2025, at 1:57 AM, Christoph Badura <bad%bsd.de@localhost> wrote:
> > My first thought over morning coffee was to add a thin layer on the
> > cnputc/cngetc/cnpollc interface level.  Basically have a stack of
> > cn_tabs (only the necessary parts perhaps).  On the Laser/TurboLaser
> > machines have "virtual" cnputc/cngetc/cnpollc that, on the boot CPU,
> > calls the "real" cnputc/cngetc that frobs the hardware and, on the other
> > CPUs, xcalls to the boot CPU.  I presume you can easily identify the
> > boot CPU.  I don't see a way around the xcall.
> 
> Yes, it is trivial to identify the boot CPU.
> 
> That’s more or less what I did in the “mcclock” driver.  In this
> particular case, I wasn’t thinking of doing it at the cn*c layer,
> but more in the driver itself; even though it’s a Z8530 and could
> use the zsc / zstty code in theory, in practice I don’t think it’s
> worth it and would probably just do a stripped down “gbuscn”
> driver (DEC calls this CPU module local bus the “GBus”) that does
> the bare minimum.  That driver would have all of the necessary xcall
> goop.

What I had in mind was in rough pseudocode:

void
lazercnputc(dev_t dev, int c)
{
	if (!primary_cpu_p(curcpu())) {
		zndev = /* magic happens here */;
		zscnputc(zndev, c);
	} else {
		primcpucnputc(c);
	}
}

Of course, I'm leaving out quite a bit of setup.  And I'm not sure how to
fit that into device discovery.  Maybe some alpha/tc specific code in zs
could insert the indirecting function if it thinks it should be the console
device?
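
Purely as an illustration, here is a user-space model of the dispatch
described in the quoted proposal: call the real routine directly on the
boot CPU, forward from any other CPU.  Nothing in it is NetBSD code.
current_cpu, run_on_boot_cpu(), real_cnputc() and virt_cnputc() are all
made-up names, and the "xcall" is just a function call that pretends to
switch CPUs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in NetBSD. */
#define BOOT_CPU 0

static int    current_cpu = BOOT_CPU;	/* stands in for curcpu() */
static char   hw_out[64];		/* stands in for the serial chip */
static size_t hw_len;
static int    forwarded;		/* counts simulated cross-calls */

/* "Real" putc: only legal on the boot CPU, which can reach the chip. */
static void
real_cnputc(int c)
{
	assert(current_cpu == BOOT_CPU);
	if (hw_len < sizeof(hw_out))
		hw_out[hw_len++] = (char)c;
}

/* Stand-in for an xcall: run fn(c) as if on the boot CPU, then return. */
static void
run_on_boot_cpu(void (*fn)(int), int c)
{
	int saved = current_cpu;

	current_cpu = BOOT_CPU;
	fn(c);
	current_cpu = saved;
	forwarded++;
}

/* "Virtual" putc: direct on the boot CPU, forwarded from any other CPU. */
static void
virt_cnputc(int c)
{
	if (current_cpu == BOOT_CPU)
		real_cnputc(c);
	else
		run_on_boot_cpu(real_cnputc, c);
}
```

The forwarded counter makes the cost visible: every character written
from a non-boot CPU is one simulated cross-call.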

> What I really don’t like, though, is e.g. a kernel printf that
> happens to run on a non-primary CPU could incur possibly a LOT of
> xcalls.  Maybe I shouldn’t really care too much because how often do
> I really expect that to happen?  OTOH, it’s also the case that just
> regular user logging in on the console could easily trigger this as
> well.

Yeah.  I had the same thoughts.  Then I looked at kprintf.  It putchar()s
each character as it is produced.  We would have to introduce some code
that appends the string to a buffer and flushes that when it becomes full.
I don't want to think about what size the buffer would have to be.
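
To make the buffering idea concrete, a minimal sketch, again in user
space and with made-up names (struct cnbuf, cnbuf_putc(), cnbuf_flush()).
It flushes when the buffer is full; the explicit flush at the end of a
message is my assumption, since otherwise a short message would sit in
the buffer.  CNBUF_SIZE is arbitrary.  Each flush stands for one
cross-call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CNBUF_SIZE 16	/* arbitrary; the right size is the open question */

/* Hypothetical buffer; "sink" models what actually reached the hardware. */
struct cnbuf {
	char   data[CNBUF_SIZE];
	size_t len;
	int    flushes;		/* each flush models one cross-call */
	char   sink[256];
	size_t sink_len;
};

/* Push the buffered characters out in one go. */
static void
cnbuf_flush(struct cnbuf *b)
{
	if (b->len == 0)
		return;
	assert(b->sink_len + b->len <= sizeof(b->sink));
	memcpy(b->sink + b->sink_len, b->data, b->len);
	b->sink_len += b->len;
	b->len = 0;
	b->flushes++;
}

/* Append one character; flush only when the buffer becomes full. */
static void
cnbuf_putc(struct cnbuf *b, int c)
{
	b->data[b->len++] = (char)c;
	if (b->len == CNBUF_SIZE)
		cnbuf_flush(b);
}

/* Append a whole string, one character at a time, as kprintf would. */
static void
cnbuf_puts(struct cnbuf *b, const char *s)
{
	while (*s != '\0')
		cnbuf_putc(b, *s++);
}
```

With a 16-byte buffer a 19-character message costs two flushes (one when
the buffer fills, one explicit flush at the end) instead of 19
per-character calls.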

> > The "thin layer" may come in handy when we want to deal with multiple
> > devices acting as console like e.g. Linux has it.  There you can have
> > console on e.g. frame buffer and serial simultaneously.
> 
> I think wscons already has a way to do this, actually?

My understanding is that wscons deals only with framebuffer consoles and
with serial consoles not at all.  You can attach multiple keyboards and
mice to it as input devices.  But I don't think it handles output to
multiple devices.

I think Jared worked a bit on having console on framebuffer and serial for
some of his ARM boards.  And that might work for output but not for input
from both devices.  I think the main issue was that cnpollc can't poll
multiple devices in parallel because it waits for a character to be
"received".  As I remember he wasn't totally satisfied with the outcome.
But I don't remember more details.

Come to think of it, maybe it is possible to internally mark one of the
console devices as "input device" or "debugger device" and only poll that.
That way you could get kernel console output to both framebuffer and
serial.  But you would have to designate one of the devices as the one to
be used for cngetc/cnpollc.
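
A rough sketch of that idea, as a user-space model: output fans out to
every attached console, input comes only from the one device marked as
the input device.  struct cons, multi_attach(), multi_putc() and
multi_getc() are hypothetical names and have nothing to do with the
real struct consdev.

```c
#include <assert.h>
#include <stddef.h>

#define MAXCONS 4

/* Hypothetical console record; not the real struct consdev. */
struct cons {
	void   (*putc)(struct cons *, int);
	int    (*getc)(struct cons *);	/* NULL for output-only devices */
	char   out[32];			/* what this device has printed */
	size_t out_len;
	int    next_in;			/* canned input for the demo */
};

static struct cons *cons_tab[MAXCONS];
static size_t       ncons;
static struct cons *cons_input;	/* the one device used for getc/pollc */

/* Demo backends: record output, return canned input. */
static void
demo_putc(struct cons *cn, int c)
{
	if (cn->out_len < sizeof(cn->out))
		cn->out[cn->out_len++] = (char)c;
}

static int
demo_getc(struct cons *cn)
{
	return cn->next_in;
}

/* Register a console; at most one is designated as the input device. */
static void
multi_attach(struct cons *cn, int is_input)
{
	assert(ncons < MAXCONS);
	cons_tab[ncons++] = cn;
	if (is_input)
		cons_input = cn;
}

/* Output goes to every attached console. */
static void
multi_putc(int c)
{
	for (size_t i = 0; i < ncons; i++)
		cons_tab[i]->putc(cons_tab[i], c);
}

/* Input is read only from the designated device. */
static int
multi_getc(void)
{
	assert(cons_input != NULL && cons_input->getc != NULL);
	return cons_input->getc(cons_input);
}
```

Because only one device is ever polled, the "cnpollc waits on one
device" problem does not come up in this model.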

> > I'm surprised you expect weirdness with DDB.  I thought DDB runs on the
> > boot CPU and pauses all other CPUs.  Doing xcalls to the other CPUs for
> > stack traces etc.  I.e. it's a non-issue for DDB.
> 
> Well, on Alpha, currently, the way it works is the BUGCHK handler
> (which is invoked on whatever CPU calls Debugger()) calls
> alpha_debug(), which sends an IPI to all other CPUs telling them to
> pause (this is a tight loop in the IPI handler until that CPU sees its
> bit in the “paused CPUs” bitmask get cleared), switches to the
> debug stack, enters DDB, switches back to the previous stack, and then
> un-pauses all other CPUs.  There is not currently anything in DDB that
> enforces “must be running on the primary CPU” to my knowledge, and
> it’s definitely not implemented that way on Alpha, currently.

Interesting.  But it does pause all the other CPUs.  And, IIRC, you said
that on all the non-Laser/TurboLaser machines you can access the serial
console chip from all CPUs.

I didn't think of entering DDB via Debugger() when I read your previous
mail.  But it seems to me it doesn't make much of a difference.

So there are two scenarios:

- entering DDB from the console when hw.cnmagic is detected.
- entering DDB by calling Debugger() on a "random" CPU.

> It is perhaps worth changing this, but it could make entering DDB
> problematic if the primary CPU was stuck at a sufficiently high spl
> to block IPIs.  Of course, that wouldn’t make much difference on a
> Laser/TurboLaser, but it would unnecessarily cripple other systems,
> potentially.

So, on these machines:

Let's call the CPU that has access to the actual console serial chip the
"console CPU."

If DDB is entered by detecting hw.cnmagic, it is running on the CPU that
has access to the actual serial chip and no xcalls are required.

If DDB is entered from a "random" CPU via Debugger(), it "pauses" all
the other CPUs (including the "console CPU") and has them spinning on
their bit in the "paused CPUs" bitmask in the IPI handler.  I guess you
can't do another xcall in this situation.

But it could set another bit in some bitmask that tells the spinning
"console CPU" that some console interaction (input/output/poll) is
necessary.  Such a scheme seems workable to me.
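
Roughly what I have in mind, as a single-threaded user-space model using
C11 atomics.  All names (cons_req, REQ_PUTC, ddb_post_putc(),
console_cpu_spin_once()) are invented for the sketch, only the putc case
is shown, and a real version would need to think about memory ordering
and about getc/poll as well.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define CONSOLE_CPU 0
#define REQ_PUTC    0x1u	/* hypothetical request bit */

static atomic_uint paused_cpus;	/* bit n set => CPU n keeps spinning */
static atomic_uint cons_req;	/* request bits for the console CPU */
static atomic_int  cons_arg;	/* character to print */
static char        hw_out[32];	/* stands in for the serial chip */
static size_t      hw_len;

/* Requester polls this until the console CPU has consumed the request. */
static int
ddb_putc_done(void)
{
	return (atomic_load(&cons_req) & REQ_PUTC) == 0;
}

/* Called by the CPU running DDB: post one character plus the request bit. */
static void
ddb_post_putc(int c)
{
	assert(ddb_putc_done());	/* one outstanding request at a time */
	atomic_store(&cons_arg, c);
	atomic_fetch_or(&cons_req, REQ_PUTC);
}

/*
 * One iteration of the console CPU's pause loop.  Services a pending
 * putc request, then returns 1 while the CPU must keep spinning and 0
 * once its bit in paused_cpus has been cleared.
 */
static int
console_cpu_spin_once(void)
{
	if (atomic_load(&cons_req) & REQ_PUTC) {
		if (hw_len < sizeof(hw_out))
			hw_out[hw_len++] = (char)atomic_load(&cons_arg);
		atomic_fetch_and(&cons_req, ~REQ_PUTC);
	}
	return (atomic_load(&paused_cpus) & (1u << CONSOLE_CPU)) != 0;
}
```

The CPU in DDB would call ddb_post_putc() and then spin on
ddb_putc_done(); the paused console CPU runs console_cpu_spin_once() in
its pause loop instead of only checking its pause bit.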

Of course, if the "console CPU" was stuck at a sufficiently high spl to
block IPIs, you'd be hosed in this scheme for the case that some other CPU
calls Debugger().  Unless you can think of some other scheme to signal
the "console CPU".  Like causing an NMI on it or something.

Sorry if this is all a bit high-level and ignoring a lot of details.
I can stop throwing ideas up into the air if you want me to.
It was fun to think about this, though.
--chris

