On Wed, Feb 17, 2016 at 9:12 PM, Roy Marples <roy%marples.name@localhost> wrote:
On 17/02/2016 09:02, Ryota Ozaki wrote:
So what events would you choose to skip, if not the scheme that Roy
described?
(I think I confused you, sorry...)
I would rather not skip anything if at all possible
(except for repeated identical events (e.g., up/up/up), because
keeping them all would change the original behavior).
I intend to skip/eliminate events only if too many events
happen in a short period (i.e., when queuing is needed), to protect
the system from overloading. In that case (a very rare one,
I think), we just drop the earliest event first.
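A minimal user-space sketch of that scheme follows. It assumes a fixed ring buffer with a hypothetical limit of 10 slots (the "10 or so" mentioned in this thread), collapses repeated states, and drops the earliest event only when the buffer is full. All names (`link_queue`, `lq_push`, `lq_pop`, `LS_*`) are hypothetical, not NetBSD's API, and locking between interrupt and softint context is deliberately left out.

```c
#include <assert.h>

/* Hypothetical link states, loosely modelled on the kernel's LINK_STATE_*. */
enum link_state { LS_UNKNOWN, LS_DOWN, LS_UP };

/* Hypothetical limit, matching the "10 or so" discussed in this thread. */
#define LQ_MAX 10

struct link_queue {
	enum link_state ev[LQ_MAX];	/* ring buffer of pending events */
	int head;			/* index of the oldest pending event */
	int count;			/* number of pending events */
	int dropped;			/* events discarded because the queue was full */
	enum link_state last;		/* most recently accepted state */
};

/* Producer side: in the kernel this would be the interrupt path. */
static void
lq_push(struct link_queue *q, enum link_state s)
{
	/* Collapse repeats such as UP/UP/UP, keeping the original behavior. */
	if (s == q->last)
		return;
	q->last = s;

	/* Queue full: drop the earliest event to make room. */
	if (q->count == LQ_MAX) {
		q->head = (q->head + 1) % LQ_MAX;
		q->count--;
		q->dropped++;
	}
	q->ev[(q->head + q->count) % LQ_MAX] = s;
	q->count++;
	assert(q->count <= LQ_MAX);
}

/* Consumer side: in the kernel this would be the softint. Returns 0 when empty. */
static int
lq_pop(struct link_queue *q, enum link_state *s)
{
	if (q->count == 0)
		return 0;
	*s = q->ev[q->head];
	q->head = (q->head + 1) % LQ_MAX;
	q->count--;
	return 1;
}
```

Pushing UP, UP, UP, DOWN, UP into an empty queue leaves three events (UP, DOWN, UP) and drops nothing, while a burst of 15 alternating DOWN/UP changes keeps only the 10 most recent and counts 5 as dropped.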
How much is too many and what is a short period?
We can choose a number of events that applications are unlikely to handle
(10 or so). A short period means the interval between the first interrupt
for a link state event and the moment the softint for link state changes
starts running.
Once you start skipping/eliminating events, how is your solution any
better? How do you measure some lossage vs some lossage?
Mine doesn't drop events if there are only a few of them, while yours
drops one event even if there are just two.
I suppose that a few or several events are more likely to happen in
"a short period" than a dozen (or more). The latter implies
some hardware trouble (or a VMM defect?) and needs special care
to protect the system; for example, we could give up delivering all of
the events. For the former, we shouldn't skip/eliminate events.
Also, we can't just drop the earliest event first; we have to ensure
that each state remains in the queue.
Consider starting in UP:
DOWN/UNKNOWN/UP/UNKNOWN/UP/UNKNOWN/UP
We cannot just discard the fact that it went down, because important events
attached to DOWN won't trigger.
We can preserve DOWN specially if we need to.
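One possible way to honor both "each state remains in the queue" and "preserve DOWN specially" is to change only the eviction rule. The sketch below is an assumption, not a proposal from either side of the thread: when a hypothetical 4-slot queue is full, instead of dropping the earliest event outright, it drops the earliest event whose state still appears later in the queue (or matches the incoming state), so no state, DOWN included, is lost entirely. The names (`state_queue`, `sq_push`, `sq_occurs`, `LS_*`) are hypothetical.

```c
#include <assert.h>

/* Hypothetical link states, loosely modelled on the kernel's LINK_STATE_*. */
enum link_state { LS_UNKNOWN, LS_DOWN, LS_UP };

/* Hypothetical capacity: small so the example overflows, but larger than the
 * number of distinct states, so a full queue always repeats some state. */
#define SQ_MAX 4

struct state_queue {
	enum link_state ev[SQ_MAX];	/* pending events, oldest first */
	int count;
};

/* Return 1 if state s occurs at index "from" or later. */
static int
sq_occurs(const struct state_queue *q, int from, enum link_state s)
{
	for (int i = from; i < q->count; i++)
		if (q->ev[i] == s)
			return 1;
	return 0;
}

static void
sq_push(struct state_queue *q, enum link_state s)
{
	/* Collapse a repeat of the newest pending state (e.g. UP/UP). */
	if (q->count > 0 && q->ev[q->count - 1] == s)
		return;

	if (q->count == SQ_MAX) {
		/*
		 * Evict the oldest entry whose state survives the eviction:
		 * it either occurs again later or equals the incoming state.
		 * Choosing the oldest such entry also means the removal never
		 * leaves two identical states next to each other.
		 */
		int v = 0;
		while (v < q->count && q->ev[v] != s &&
		    !sq_occurs(q, v + 1, q->ev[v]))
			v++;
		assert(v < q->count);	/* holds because SQ_MAX > number of states */
		for (; v < q->count - 1; v++)
			q->ev[v] = q->ev[v + 1];
		q->count--;
	}
	q->ev[q->count++] = s;
}
```

Fed Roy's sequence DOWN/UNKNOWN/UP/UNKNOWN/UP/UNKNOWN/UP, this leaves DOWN, UP, UNKNOWN, UP in the queue, whereas plain drop-earliest with the same 4 slots would leave UNKNOWN, UP, UNKNOWN, UP and lose the DOWN. The price is a quadratic scan on overflow, which should be tolerable for a queue of around 10 entries.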
Lastly, have we considered that the system could be overloaded by so many
link state change events? A longer or more complicated queue would only
make this worse.
Of course I care about that, which is why I accept dropping events, but do
you really think just two events cause an overload?
From an earlier post of yours:
Even if an UP state is transient, it's an event that may give us a
hint about network conditions for diagnostics. We may be able to get it
from the console output, but it's not so convenient; we need to
track events via two different facilities.
If you're skipping/eliminating events as well, then you would also need a
second facility to record this. Other than scribbling on the console,
what did you have in mind? Could this be used elsewhere in the system
where equivalent network assertions are recorded?
I don't plan to provide another facility to report events (even if we
provided one, I don't think anybody would use it). Yes, it's a
limitation that we cannot always deliver every event; no objection to that.
But we can still signal that something bad is happening by sending a burst
of events at once.
ozaki-r