Subject: Re: Questions about bufq_readprio.c,
To: Sumantra Kundu <sumantra@gmail.com>
From: Darrin B.Jewell <dbj@netbsd.org>
List: tech-kern
Date: 10/01/2007 20:39:11
"Sumantra Kundu" <sumantra@gmail.com> writes:
> Hi All,
> As I was working on the code of bufq_readprio.c, I
> encountered some questions/doubts:
>
> Any help/suggestions would be greatly appreciated!
>
> Thanks,
> --Sumantra
So bufq_readprio.c provides an instance of the bufq(9) kernel
interface, which can be used by any module that needs to maintain a
queue of disk buffer io requests. Most often it is used in the
implementation of the block/character device or pseudo-device drivers,
rather than filesystems or pagers, which are other parts of the kernel
that share the bufq structure.
There are conventions about the meaning and ownership of various
fields in struct buf, but bufq_readprio needs to be fairly
conservative about its interpretation of those fields and generally
stick to ones that are provided for communication between the filesystems
and the device, rather than ones that are private to the filesystems
or private to the device interface, which may be modified unexpectedly.
Specifically, b_cylinder, b_rawblkno and b_prio are provided for the
sorting algorithms of the disk buffer queues, but it's very reasonable
to consult other fields such as b_bcount or the B_READ flag when
considering the queuing algorithms.
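As a rough illustration of sticking to those fields, here is a minimal
userland sketch of an ordering function that looks only at the read flag
and the raw block number. The names toy_buf, TOY_B_READ and toy_buf_cmp
are hypothetical stand-ins so that it compiles outside the kernel, and
the policy (reads first, then ascending block number) is an assumption
for illustration, not the actual bufq_readprio algorithm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's B_READ flag. */
#define TOY_B_READ 0x1

/* Hypothetical stand-in for the few struct buf fields a queue may sort on. */
struct toy_buf {
	int     b_flags;    /* only TOY_B_READ is consulted here */
	int64_t b_rawblkno; /* absolute block number on the device */
	int     b_bcount;   /* requested transfer size, in bytes */
};

/*
 * Return <0 if a should be serviced before b, >0 if after, 0 if equal.
 * Assumed policy: reads before writes, then ascending b_rawblkno.
 */
static int
toy_buf_cmp(const struct toy_buf *a, const struct toy_buf *b)
{
	int a_read, b_read;

	assert(a != NULL && b != NULL);

	a_read = (a->b_flags & TOY_B_READ) != 0;
	b_read = (b->b_flags & TOY_B_READ) != 0;

	if (a_read != b_read)
		return a_read ? -1 : 1;
	if (a->b_rawblkno < b->b_rawblkno)
		return -1;
	if (a->b_rawblkno > b->b_rawblkno)
		return 1;
	return 0;
}
```

Note that the comparison never touches b_vp, b_resid or b_saveaddr, so it
does not depend on fields that another layer may leave unset or change.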
Also, when debugging, the actual values you look at in these
structures may be very specific to the context in which the bufq api
is being used. It's important to know how, when and by which device
driver or interface the bufq routines are being called in order to
interpret the meaning of these structures.
>
> 1) Inside the bufq_readprio.c, function:bufq_prio_put(struct
> bufq_state *bufq, struct buf *bp)
> if ((bp->b_flags & B_READ) == B_READ) {
> struct vnode *vptr = bp->b_vp;
> }
> How come vptr is always NULL? Consequently, the uvm_object is also NULL??
> Aren't the requests mapped in to uvm pages?
Well, the b_vp field is for the use of the filesystems, and generally
isn't consulted by the device driver. A likely explanation for it being
NULL is that the device driver, or a pseudo-device between it and
the filesystem, has allocated a "nested" io buffer for its own purposes
that only has a pointer to the original buffer. In that case it wouldn't
bother filling in fields that it does not use.
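To make that concrete, here is a small userland sketch of the situation.
Everything in it is hypothetical (nest_buf, nest_vnode, the b_parent
field, and both helper functions); it is not the kernel's nested-buffer
code, only a model of a lower layer that fills in the fields it needs and
leaves b_vp unset, plus a queue-side accessor that treats a NULL b_vp as
"unknown" instead of dereferencing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vnode. */
struct nest_vnode {
	int v_id;
};

/* Hypothetical stand-in for the struct buf fields relevant here. */
struct nest_buf {
	struct nest_vnode *b_vp;     /* filesystem-owned; may be NULL */
	struct nest_buf   *b_parent; /* hypothetical link to the original buffer */
	int                b_bcount; /* requested transfer size, in bytes */
};

/*
 * Model of a lower layer setting up a nested buffer: it records the
 * original buffer and the size, and does not bother with b_vp.
 */
static void
nest_buf_setup(struct nest_buf *child, struct nest_buf *parent, int bcount)
{
	assert(child != NULL && parent != NULL);

	child->b_vp = NULL;
	child->b_parent = parent;
	child->b_bcount = bcount;
}

/*
 * Queue-side accessor: report the vnode id if one is attached, or the
 * caller's fallback value when b_vp is NULL. It deliberately does not
 * chase b_parent, since that pointer belongs to another layer.
 */
static int
nest_buf_vnode_id(const struct nest_buf *bp, int fallback)
{
	assert(bp != NULL);

	if (bp->b_vp == NULL)
		return fallback;
	return bp->b_vp->v_id;
}
```

With this model, a buffer handed down by the filesystem reports its vnode
id, while the nested buffer seen by the queue reports the fallback.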
>
> 2) For if ((bp->b_flags & B_READ) == B_READ)
>
> I printed the following fields in struct buf *bp;
> printf ("\n bufsize [%d] bcount [%d] bres_id [%d] Address = [%llu]",
> bp->b_bufsize,
> bp->b_bcount,
> bp->b_resid,
> PTRTOUINT64(bp->b_saveaddr))
>
> bufsize [8192] bcount [8192] bres_id [0] Address = [134541312]
> bufsize [8192] bcount [8192] bres_id [0] Address = [134541312]
> bufsize [8192] bcount [8192] bres_id [27226] Address = [134541312]
> bufsize [8192] bcount [8192] bres_id [27226] Address = [134541312]
> bufsize [8192] bcount [8192] bres_id [47544] Address = [134541312]
> bufsize [8192] bcount [8192] bres_id [47544] Address = [134541312]
>
> How come the "bres" values are consecutively the same {0,27226,47544}.
> Isn't bufq_prio_put() supposed to be called with different values of
> buffer addresses?
> Also, what does bres really imply? In the code it says " /* Remaining I/O. */"
The b_resid field is the number of bytes of uncompleted io in the buffer.
However, it may be modified by the device driver, including from
interrupts, outside of the control of your bufq interface. Therefore
this is probably not a field you should be consulting in your buffer
queue. You might be able to get away with consulting it if you're
careful about when you read it, and maybe pay attention to the
interlock and flags.
Someone else on this list may be able to offer an opinion on this.
One more note about b_saveaddr. This is used by the layer above the
device driver to save user space addresses from b_data when mapping
into kernel space. It is unlikely to have useful meaning at the
device driver level.
Finally, you may wish to look at the vfs_buf_print or the "show buf"
ddb command as an aid to examining struct buf fields when debugging.
I hope these notes help. Good luck.
Darrin