Subject: Re: devfs, was Re: ptyfs fully working now...
To: Bill Studenmund <wrstuden@netbsd.org>
From: Christos Zoulas <christos@zoulas.com>
List: tech-kern
Date: 11/13/2004 03:30:43
On Nov 13, 12:10am, wrstuden@netbsd.org (Bill Studenmund) wrote:
-- Subject: Re: devfs, was Re: ptyfs fully working now...
| On Fri, Nov 12, 2004 at 09:08:24PM -0500, Christos Zoulas wrote:
| > On Nov 12, 4:13pm, wrstuden@netbsd.org (Bill Studenmund) wrote:
| > -- Subject: devfs, was Re: ptyfs fully working now...
| >
| > Here's what I've been thinking. At boot time, you pass the mount struct
| > of devfs a filename which contains a list of commands to be applied to
| > it before it gets mounted. These have plain unix syntax and they can be
| >
| > chown id:id name # change ownership of a configured node
| > chmod mode name # change permissions of a configured node
| > rm name # whiteout a configured node
| > ln -s name # make a symlink to a configured node if it exists
| > mkdir name # create a directory
| > mknod name [b|c M M] [p]# create a node, for sockets we just create them.
|
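To make the proposed command file concrete, here is a minimal userland sketch of how one line of it might be turned into a record. It is only an illustration under stated assumptions: `struct devfs_cmd` and `devfs_parse_line()` are hypothetical names, only the flag values come from the proposal in this thread, and just `chown`, `chmod`, `rm` and `mkdir` are handled.

```c
#include <stdio.h>
#include <string.h>

/* Flag values as given in the proposal quoted in this thread. */
#define DEVFS_OVERRIDE_MODE	0x01
#define DEVFS_OVERRIDE_UID	0x02
#define DEVFS_OVERRIDE_GID	0x04
#define DEVFS_WHITEOUT		0x10
#define DEVFS_MKDIR		0x20

/* Hypothetical, trimmed-down record; not the full devfs node. */
struct devfs_cmd {
	char name[16];
	unsigned uid, gid, mode;
	int flag;
};

/*
 * Parse one command line into *c.
 * Returns 0 on success, -1 if the line is not understood.
 * Anything after the node name (such as a trailing "# comment")
 * is ignored.
 */
int
devfs_parse_line(const char *line, struct devfs_cmd *c)
{
	char op[8];
	int n = 0;

	memset(c, 0, sizeof(*c));
	/* First word is the command; %n records where the arguments start. */
	if (sscanf(line, "%7s %n", op, &n) != 1)
		return -1;
	line += n;

	if (strcmp(op, "chown") == 0) {
		if (sscanf(line, "%u:%u %15s", &c->uid, &c->gid, c->name) != 3)
			return -1;
		c->flag = DEVFS_OVERRIDE_UID | DEVFS_OVERRIDE_GID;
	} else if (strcmp(op, "chmod") == 0) {
		/* Modes are written in octal, as with chmod(1). */
		if (sscanf(line, "%o %15s", &c->mode, c->name) != 2)
			return -1;
		c->flag = DEVFS_OVERRIDE_MODE;
	} else if (strcmp(op, "rm") == 0) {
		if (sscanf(line, "%15s", c->name) != 1)
			return -1;
		c->flag = DEVFS_WHITEOUT;
	} else if (strcmp(op, "mkdir") == 0) {
		if (sscanf(line, "%15s", c->name) != 1)
			return -1;
		c->flag = DEVFS_MKDIR;
	} else {
		return -1;
	}
	return 0;
}
```

A caller would feed each line of the file to `devfs_parse_line()` and apply the resulting flags to the matching node. An in-kernel version would not use `sscanf`; the point here is only the mapping from commands to flags.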
| I like the idea of a file that contains info about modes and owners, and I
| hadn't thought about whiteouts - good idea. However I think a better way
| to do this is a binary database. I think the keys should be locators;
| where the device is in the config hierarchy. For each entry, we keep most
| of the info you have below - name, uid, gid, mode (or ACL), mtime, atime,
| ctime (I don't think birthtime matters as it won't show up in stat), and
| type.
Fine, I agree that the device mapping should be using locators. I think
that birthtime should go in; it would have been nice for stat to be able
to access it, but that is not the case yet...
| I am not sure about the idea of making directories or symlinks. It may be
| a good one. Same with pipes in /dev.
|
| The thing I don't like is that you're using dev_t in what seems like a
| canonical manner. My understanding of the whole idea of devfs, though, is
I just wanted to stash dev_t somewhere so it can be returned to userland for
stat. It is not meant to be used for anything else.
| that dev_t really is just a number that gets thrown around; the kernel
| returns it in stat, and userland can use it for comparison. While major
| and minor numbers still make sense, the whole thing I wanted was for them
| to not at all matter from boot to boot.
|
| What I was thinking was that as we boot, devices register their nodes
| during configuration. Drivers add default info (like owner, mode, and most
| importantly default name & locator) while registering. Then, like you
| said, we read a file on boot. However my thought is that we merge the two
| databases, based on locator. That way devices that are here now and were
| here before have the exact settings as last time. Nodes that were here
| last boot but aren't now show up with a NULL device pointer. Nodes that
| are new show up with default settings.
It does not have to be at boot, but at mount time. Unless we want to mount
devfs after kernel autoconfiguration which I think is a bit radical. I prefer
to have it mounted by userland. Then people who don't like devfs don't need
to use it, and regular devices can still be used in the transition period.
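The locator-based merge described in the quoted text above could look roughly like the following. This is a sketch under assumptions, not a design: `struct devfs_ent`, the `locator` string format, the `present` field (standing in for a non-NULL device pointer) and `devfs_merge()` are all hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical entry; "locator" is the merge key. */
struct devfs_ent {
	char locator[32];	/* position in the config hierarchy */
	char name[16];
	uid_t uid;
	gid_t gid;
	mode_t mode;
	int present;		/* stands in for device != NULL */
};

/*
 * Merge the saved database into the table built by the drivers.
 * live[] must have room for nlive + nsaved entries.
 * Returns the number of entries in live[] after the merge.
 *
 *  - locator in both:    saved name/owner/mode replace the defaults
 *  - only in saved[]:    appended with present = 0
 *  - only in live[]:     left alone, keeps the driver defaults
 */
size_t
devfs_merge(struct devfs_ent *live, size_t nlive,
    const struct devfs_ent *saved, size_t nsaved)
{
	size_t i, j, total = nlive;

	for (i = 0; i < nsaved; i++) {
		for (j = 0; j < nlive; j++)
			if (strcmp(live[j].locator, saved[i].locator) == 0)
				break;
		if (j < nlive) {
			/* Here now and here before: same settings as last time. */
			memcpy(live[j].name, saved[i].name, sizeof(live[j].name));
			live[j].uid = saved[i].uid;
			live[j].gid = saved[i].gid;
			live[j].mode = saved[i].mode;
		} else {
			/* Here before but not now: keep it, with no device. */
			live[total] = saved[i];
			live[total].present = 0;
			total++;
		}
	}
	return total;
}
```

The linear search is only for brevity; a real table would presumably hash on the locator. Whether the merge runs at boot or at mount time does not change the logic.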
| > This file gets loaded at mount time by the kernel into an internal hash
| > table that contains:
| > LIST_ENTRY(devfsnode) hash; /* hash chain */
| > struct vnode *vnode; /* vnode associated with this entry */
| struct device *device; /* our device, NULL if not
| * configured */
| > devfstype type; /* type of devfs node */
| > u_long ptyfs_fileno; /* unique file id */
| > char name[16];
| > uid_t uid;
| > gid_t gid;
| > mode_t mode;
| > int flags; /* immutable etc */
| > dev_t dev; /* if device, device info */
| > /* the timestamps for the node */
| > struct timespec mtime;
| > struct timespec ctime;
| > struct timespec atime;
| > struct timespec birthtime;
| >
| > int flag; /* below */
| > #define DEVFS_OVERRIDE_MODE 0x01
| > #define DEVFS_OVERRIDE_UID 0x02
| > #define DEVFS_OVERRIDE_GID 0x04
| > #define DEVFS_OVERRIDE_FLAGS 0x08
| > #define DEVFS_WHITEOUT 0x10
| > #define DEVFS_MKDIR 0x20
| > #define DEVFS_SYMLINK 0x40 /* target to be looked up in a different
| > * table */
| > #define DEVFS_MKNOD 0x80
| >
| > #define DEVFS_ACCESSED 0x1000 /* Node was accessed */
| > #define DEVFS_MODIFIED 0x2000 /* Node was written */
| > #define DEVFS_CHANGED 0x4000 /* Node was changed perm/ownership */
| > #define DEVFS_DIRTY 0x8000 /* Changes not reflected to the file */
| >
| > This is the same struct used internally for book-keeping. When a mkdir,
| > chmod, chown, rm, or ln -s operation is done on devfs, the change is
| > reflected in the internal memory table, and the DIRTY flag is set.
| > Occasionally [once a minute] if the flag is DIRTY, the file we loaded gets
| > written with the updated permissions. Or if it is DIRTY it is written on
| > unmount. The file can live under the mount if we don't want it accessible.
| > We also provide a simple character device that, when we cat it, provides
| > a textual description of the current set of commands.
|
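The DIRTY-flag write-back described above can be sketched as follows. The assumptions are mine: `struct devfs_attr`, `devfs_chmod()` and `devfs_sync()` are hypothetical names, the output is written as text `chmod` commands purely for illustration, and only mode overrides are covered. The flag values are the ones from the proposal.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Flag values as given in the proposal quoted in this thread. */
#define DEVFS_OVERRIDE_MODE	0x01
#define DEVFS_CHANGED		0x4000
#define DEVFS_DIRTY		0x8000

/* Hypothetical subset of the per-node record. */
struct devfs_attr {
	char name[16];
	mode_t mode;
	int flag;
};

/* A chmod on the mounted devfs: update memory, mark the node dirty. */
void
devfs_chmod(struct devfs_attr *n, mode_t mode)
{
	n->mode = mode;
	n->flag |= DEVFS_OVERRIDE_MODE | DEVFS_CHANGED | DEVFS_DIRTY;
}

/*
 * Meant to be called periodically and at unmount.
 * If any node is DIRTY, write every mode override to 'out' and
 * clear DIRTY on all nodes.
 * Returns 1 if the file was written, 0 if nothing was dirty,
 * -1 on a write error.
 */
int
devfs_sync(struct devfs_attr *nodes, size_t n, FILE *out)
{
	size_t i;
	int dirty = 0;

	for (i = 0; i < n; i++)
		if (nodes[i].flag & DEVFS_DIRTY)
			dirty = 1;
	if (!dirty)
		return 0;

	for (i = 0; i < n; i++) {
		if ((nodes[i].flag & DEVFS_OVERRIDE_MODE) &&
		    fprintf(out, "chmod %o %s\n",
		    (unsigned)nodes[i].mode, nodes[i].name) < 0)
			return -1;
		nodes[i].flag &= ~DEVFS_DIRTY;
	}
	return 1;
}
```

The same shape would apply if the backing store were a binary database instead of a command file; only the body of the second loop would change.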
| I do like your ideas about db updating; chown, chmod, mv, and rm should
| update the db. And a tool to turn the db file into a text representation
| may be good. But as before, the whole idea is to make device probe order
| not matter; partition "HR files" always has the same permissions
| regardless of whether it's sd0 or sd19. If we use dev_t the way I think you
| described, we're still sensitive to probe order.
No, I meant the hashtable to be keyed by name... But I guess locators are
more stable.
| One issue with at least my idea of how devfs would work is that
| locators are really important, and may need maintaining. Like we may want
| to make device node locators be tied to device ID, like a SCSI disk's WWN.
| So the partition "HR files" on the disk with WWN FOO could be
| distinguished from a partition "HR files" on a zip drive someone hooked up
| to the computer. My ideas here are still rough, and would need work with
| how we handle wedges. But the main thought is to make it so that somehow
| hooking up a disk with a partition with a duplicate name of another
| partition won't cause the permissions of one to slip over to the other (I
| understand that Jason's thoughts on wedges would permit only one of the
| two identically-named partitions to be accessible at the same time; this
| idea is to make sure we can keep track of both of their permissions and
| permit only the right one to be active at once).
|
| Also, we would probably want a way to change the bind point for locators.
| For instance, when someone first updates to a devfs system, all their
| locators will be config-based. Like "sd0a" or "cd1d"; i.e. the devfs node
| really is tied to whatever shows up in that probe position. We will want a
| way to, say, tie a SCSI disk to a WWN. I'm sure there are other bindings
| that make sense, and we will want them where appropriate.
|
| The one issue I haven't thought through fully is what happens when you
| have device nodes with the same name that refer to distinctly different
| devices. Like you had a wedge "sd0a" and bound it to a given WWN. Now the
| disk with that WWN has attached as sd3, and a different disk is at sd0.
| I'm not sure how to handle the confusion in that case; maybe the thing to
| do is have the current "sd0" get turned into "sdX" and "sd3" get turned
| into "sd0". I'm not sure.
|
| While I talk a fair bit about wedges above, these thoughts apply to all
| device nodes. It's just that wedges and disks are the things that move
| around a lot, yet we really, really want permissions to not change. Things
| like serial ports don't move around much.
I agree that wedges and disks need special consideration. I just have not
sat down and analyzed the requirements for binding partitions to device nodes yet.
christos