Subject: Re: more on disk stats
To: Charles Hannum <Charles-Hannum@deshaw.com>
From: Dennis Ferguson <dennis@Ipsilon.COM>
List: tech-kern
Date: 11/14/1995 17:45:17
> Change sysctl(2) to take (in essence) SNMP-like queries. It's pretty
> similar to this already, but needs some tweaking. Change things like
> the network statistics to work according to SNMP conventions. Write a
> trivial snmpd that just checks communities and uses said system call
> to do the brunt of the work. Modify programs like netstat(1) to use
> SNMP (or just grab the freely available tools and use them, where
> possible).
[...]
> One objection I've heard to this is that dynamic information isn't
> handled well by SNMP. I disagree with this, citing the network
I sympathize with the objective, but cringe a bit at the thought of
the implementation. Some of my biases come from trying to build routers
around this code, so it isn't the attempt to be SNMP-centric that bothers
me. But SNMP does have its limitations, which aren't important when
you are using it to turn a box on an NMS red when something breaks, but
which seem to me to be poison when used as a kernel interface. For
example:
(1) the lack of atomicity when reading a chunk of data which is larger
than the amount a single SNMP query can accommodate,
(2) the total lack of atomicity when reading a table,
(3) the need to support getnext operations for all tables (sometimes this
is easy, but other times it is unnecessarily hard),
(4) the fact that you are going to have difficulty doing things which the
corresponding SNMP MIB didn't consider you might want to do.
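Point (2) can be demonstrated with a toy C sketch. Everything in it is hypothetical: a sorted array of integer destination keys stands in for a kernel table, and getnext() returns the smallest key greater than the previous one, one call per row, in the spirit of SNMP's GetNext. After the walker has read its first row, the "kernel" deletes the route to destination 1 and adds one to destination 9.

```c
#include <assert.h>

#define MAXROUTES 16

/* Hypothetical toy table: destination keys kept sorted, as getnext needs. */
static int table[MAXROUTES];
static int ntable;

static void table_insert(int key)
{
    int i;

    assert(ntable < MAXROUTES);
    for (i = ntable++; i > 0 && table[i - 1] > key; i--)
        table[i] = table[i - 1];          /* shift larger keys up */
    table[i] = key;
}

static void table_delete(int key)
{
    for (int i = 0; i < ntable; i++) {
        if (table[i] != key)
            continue;
        for (int j = i; j < ntable - 1; j++)
            table[j] = table[j + 1];      /* close the gap */
        ntable--;
        return;
    }
}

/* One "system call" per row: smallest key greater than prev, or -1. */
static int getnext(int prev)
{
    for (int i = 0; i < ntable; i++)
        if (table[i] > prev)
            return table[i];
    return -1;
}

/* Walk {1, 3, 7} with getnext; after the first row the kernel deletes
 * destination 1 and adds destination 9. Returns the rows seen. */
static int walk_with_concurrent_change(int *seen)
{
    int n = 0, key = -1;

    ntable = 0;
    table_insert(1);
    table_insert(3);
    table_insert(7);
    while ((key = getnext(key)) != -1) {
        seen[n++] = key;
        if (n == 1) {                     /* table changes mid-walk */
            table_delete(1);
            table_insert(9);
        }
    }
    return n;
}
```

The walker reports 1, 3, 7 and 9: four routes from a table that never held more than three at any instant. A snapshot copied out in a single call does not have this problem, provided the kernel does not let the table change in the middle of the copy.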
I can think of some examples of all of these. For netstat(1), or some
other interested piece of software, to read the kernel routing table
now requires about 3 system calls: a sysctl(2) to find out the size of
the thing, a call to sbrk()/mmap() to acquire the (possibly very large
chunk of) memory, and another call to sysctl(2) to fetch an atomic snapshot
of the table. It's 3 system calls even if you've got 100,000 routes in
the table. With SNMP-style getnext's, on the other hand, it's going to
take 100,000 system calls to read a 100,000-route table, and it is going
to be very difficult to keep track of what changed while you were in the
process of doing this, should this be important. And the same applies
to the interface list, in particular because a good routing protocol
implementation is going to have to reread this periodically (I know
you can get incrementals from the routing socket, but since the routing
socket is unreliable this doesn't replace the need to poll to avoid
mistakes). Periodically doing 300 system calls to fetch the state of
300 interfaces sort of sucks; a single system call to fetch the whole table
at once is far better if this is a frequent operation.
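The two access patterns can be compared with a rough C sketch. fake_sysctl() is a hypothetical stand-in for sysctl(3) that follows the real convention that a NULL buffer asks only for the required size; fake_getnext() is a hypothetical SNMP-style per-row call. The row format and the dense integer keys are assumptions made for brevity. The sketch counts simulated kernel crossings for a 100,000-route table; the malloc() plays the part of the sbrk()/mmap() step and is not counted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NROUTES 100000

struct route { unsigned dst, gw; };          /* hypothetical row format */

static struct route kernel_table[NROUTES];   /* the "kernel" routing table */
static long syscalls;                        /* simulated kernel crossings */

static void init_table(void)
{
    for (unsigned i = 0; i < NROUTES; i++) {
        kernel_table[i].dst = i;             /* dense keys for brevity */
        kernel_table[i].gw = 0;
    }
}

/* Stand-in for sysctl(3): a NULL oldp asks only for the size. */
static int fake_sysctl(void *oldp, size_t *oldlenp)
{
    size_t need = sizeof kernel_table;

    syscalls++;
    if (oldp == NULL) {
        *oldlenp = need;
        return 0;
    }
    if (*oldlenp < need)
        return -1;                           /* real sysctl fails with ENOMEM */
    memcpy(oldp, kernel_table, need);        /* one consistent copy */
    *oldlenp = need;
    return 0;
}

/* Stand-in for an SNMP-style getnext: one crossing per row. */
static int fake_getnext(long prev, struct route *out)
{
    long next = prev + 1;                    /* dense keys: successor is prev+1 */

    syscalls++;
    if (next >= NROUTES)
        return -1;                           /* end of table */
    *out = kernel_table[next];
    return 0;
}

static long read_by_snapshot(void)
{
    size_t len = 0;
    struct route *buf;

    syscalls = 0;
    if (fake_sysctl(NULL, &len) != 0)        /* 1: how big is it? */
        return -1;
    buf = malloc(len);                       /* the sbrk()/mmap() step */
    if (buf == NULL)
        return -1;
    if (fake_sysctl(buf, &len) != 0) {       /* 2: fetch the snapshot */
        free(buf);
        return -1;
    }
    assert(len / sizeof *buf == NROUTES);
    free(buf);
    return syscalls;
}

static long read_by_getnext(void)
{
    struct route r;
    long prev = -1;

    syscalls = 0;
    while (fake_getnext(prev, &r) == 0)
        prev = (long)r.dst;
    return syscalls;
}
```

After init_table(), read_by_snapshot() makes 2 simulated calls and read_by_getnext() makes 100,001 (one per row plus the call that hits the end). Real callers of the size-then-fetch pattern typically pad the size or retry on ENOMEM, since the table can grow between the two calls.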
And, on the topic of the kernel routing table, I've been working on a
replacement of the data structure which, among other things, allows you
to keep more than one route to the same destination in the tree. This
has a number of useful functional benefits (for example, it allows you
to keep interface addresses and routes reliably in the table by letting
users override the interface routes, if they want different forwarding,
instead of replacing or removing them. Keeping direct routes reliably
in the forwarding table means you can replace most of the linear scans
of the interface list, in if.c and in the IP forwarding path, with
a lookup in the routing table, which means you can also support really
large numbers of interfaces without taking the severe performance hit),
but last I looked the SNMP forwarding table MIB had an in-built assumption
that there would only ever be one route to any destination in the forwarding
table. This means I have a kernel routing table implementation which SNMP
can't fully deal with. I don't mind much that my SNMP query tools are
constrained by this, but it is unacceptable that the kernel interface be
similarly constrained.
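Keeping several routes per destination might be sketched like this. The names (dest_node, dest_add, dest_remove, dest_best), the fixed-size per-destination array and the numeric preference are all hypothetical, and the tree that would hold the dest_node entries is left out. The sketch shows the behaviour described above: a user route overrides the interface route without displacing it, and deleting the override brings the interface route back with no extra bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: where a route came from. */
enum origin { RT_INTERFACE, RT_USER };

struct route {
    enum origin origin;
    int pref;          /* lower value is preferred */
    int ifindex;       /* outgoing interface */
};

#define MAXPERDEST 4

/* One node per destination; every route to it is kept, sorted by pref. */
struct dest_node {
    unsigned dst;
    int nroutes;
    struct route routes[MAXPERDEST];
};

/* Add a route without disturbing the others; equal prefs keep arrival order. */
static int dest_add(struct dest_node *d, struct route r)
{
    int i;

    assert(d->nroutes >= 0 && d->nroutes <= MAXPERDEST);
    if (d->nroutes == MAXPERDEST)
        return -1;
    for (i = d->nroutes; i > 0 && d->routes[i - 1].pref > r.pref; i--)
        d->routes[i] = d->routes[i - 1];
    d->routes[i] = r;
    d->nroutes++;
    return 0;
}

/* Remove the first route of the given origin; the rest stay in place. */
static int dest_remove(struct dest_node *d, enum origin o)
{
    for (int i = 0; i < d->nroutes; i++) {
        if (d->routes[i].origin != o)
            continue;
        for (int j = i; j < d->nroutes - 1; j++)
            d->routes[j] = d->routes[j + 1];
        d->nroutes--;
        return 0;
    }
    return -1;
}

/* The forwarding path only ever looks at the preferred route. */
static const struct route *dest_best(const struct dest_node *d)
{
    return d->nroutes > 0 ? &d->routes[0] : NULL;
}
```

A table indexed by destination alone, as the SNMP forwarding table MIB is described above, could expose only routes[0] of each node, which is exactly the constraint a kernel interface should not inherit.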
And I would question whether SNMP-style processing actually reduces the
amount of inappropriate code in the kernel. You do need to support
getnext operations to read tables. While this is trivial to do on
some tables, doing this efficiently in the routing table (sorry to keep
harping on this one, but I've been working on this and am familiar with
it) is going to require both a data structure and a rather complex piece
of code to support the operation, both of which are unnecessary to the
kernel otherwise. In fact I think this is an example of where a wrapper
is exactly the right thing to do. In the situation where the box is a
host, and the forwarding table has been populated by hand or by redirects,
it's bound to be small enough that an occasional read of the full table
is not a hardship. In the case where the table has been assembled by
routing protocols and may be large, however, the right place to fetch
forwarding table information is not the kernel at all, but rather the
routing protocol implementation since the latter both knows more about the
routes in the kernel than the kernel does (in fact it's the only place
which knows enough to respond to the full SNMP forwarding table MIB), and
will always need the data structure and code to support efficient getnext's
in any case. Sometimes wrappers make sense.
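What getnext demands of a table can be sketched in C under assumptions that are not from the original: entries are identified by a hypothetical (destination, mask) key compared lexicographically, and the table is kept as a sorted array so that getnext becomes an upper-bound binary search. The point made above is that a longest-match forwarding structure has no use for this "smallest key greater than X" operation, so supporting it means maintaining an extra ordering (an O(n) shift per insert for a sorted array, or a second balanced tree alongside the routing tree) purely to answer getnext.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row key: destination and mask, compared lexicographically. */
struct rkey { unsigned dst, mask; };

static int rkey_cmp(struct rkey a, struct rkey b)
{
    if (a.dst != b.dst)
        return a.dst < b.dst ? -1 : 1;
    if (a.mask != b.mask)
        return a.mask < b.mask ? -1 : 1;
    return 0;
}

/* getnext: index of the first key strictly greater than prev in the
 * sorted array keys[0..n), or -1. Upper-bound binary search, O(log n).
 * prev need not be present in the table. */
static long getnext(const struct rkey *keys, size_t n, struct rkey prev)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rkey_cmp(keys[mid], prev) <= 0)
            lo = mid + 1;                 /* keys[mid] <= prev: go right */
        else
            hi = mid;                     /* keys[mid] > prev: candidate */
    }
    assert(lo <= n);
    return lo < n ? (long)lo : -1;
}
```

Because getnext must resume from whatever key the caller last saw, even one that has since been deleted, the search has to work for keys not in the table, which is one reason the supporting code is harder than a plain in-order walk.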
In any case, while making operations more SNMP-like where it makes sense
would be fine, particularly where SNMP is a frequent consumer, I think
there are a lot of cases where SNMP just gets in the way. I'd rather
have the flexibility to do what is right, given a knowledge of how the
most frequent non-SNMP consumers of the data use it if they are important,
than be limited to SNMP's one-size-fits-all constraints, standard
or not.
Dennis Ferguson