tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: VOP_GETATTR: locking protocol change proposal



hi,

> On Nov 22, 2011, at 2:17 AM, YAMAMOTO Takashi wrote:
> 
>> hi,
>> 
>>> On Fri, Nov 18, 2011 at 06:31:21AM +0000, YAMAMOTO Takashi wrote:
>>>>>> postgresql assumes instant lseek(SEEK_END) to get the size of
>>>>>> their heap files.
>>>>>> 
>>>>>>  http://rhaas.blogspot.com/2011/11/linux-lseek-scalability.html
>>>>>> 
>>>>>> as fsync etc keeps the vnode lock during i/o, it might cause severe
>>>>>> performance regression.
>>>>> 
>>>>> I wonder if it is worth having a separate VOP for that, which would
>>>>> retrieve a subset of vattr without lock held.  There are potentially
>>>>> more uses in the tree.
>>>> 
>>>> while it's good to have VOP_GETATTR take a mask to specify a set of
>>>> requested attributes, i don't think it's worth to have a serparete VOP
>>>> for this.  (PR/30250 is related)
>>>> 
>>>> IMO we should make unlocked VOP calls safe (against revoke and umount -f)
>>>> and put VOP_GETATTR locking back into filesystem internal.
>>> 
>>> ad looked into that and concluded it was prohibitively expensive; too
>>> much contention on every entry/exit.
>> 
>> what's "it"?
>> 
>> i agree naive reference counting on each VOP calls can be too expensive.
> 
> If "it" looked like the attached sketch, are two atomic adds per call too
> expensive?  Could even make it cheaper if VOP_LOCK() only takes the pre-op
> block, VOP_UNLOCK() only takes the post-op block and VOPs with a locked vnode
> take none of them making VOP_LOCK, VOP_XXX ... VOP_UNLOCK sequences atomic
> with regard to revoke.

it doesn't seem to be much cheaper than a normal rwlock, which would involve
the same number of atomic ops in uncontended cases.

YAMAMOTO Takashi

> 
> --
> Juergen Hannken-Illjes - hannken%eis.cs.tu-bs.de@localhost - TU Braunschweig 
> (Germany)
> 
> #define VWANTREVOKE     0x40000000
> 
> revoke()
>         ...
>         oc = atomic_add_32_nv(&vp->v_opcnt, VWANTREVOKE);
>         while (oc != VWANTREVOKE) {
>                 cv_wait(&vp->v_cv, vp->v_interlock);
>                 oc = vp->v_opcnt;
>         }
>         ...
>         atomic_add_32(&vp->v_opcnt, -VWANTREVOKE);
>         cv_broadcast(&vp->v_cv);
> 
> 
> vop_xxx()
>         ...
>         /* pre-op */
>       for (;;) {
>                 oc = atomic_add_32_nv(&vp->v_opcnt, 1);
>                 if (predict_true((oc & VWANTREVOKE) == 0)
>                         break;
>                 oc = atomic_add_32_nv(&vp->v_opcnt, -1);
>                 mutex_enter(&vp->v_interlock);
>                 while (oc & VWANTREVOKE) {
>                         cv_wait(&vp->v_cv, vp->v_interlock);
>                         oc = vp->v_opcnt;
>                 }
>                 mutex_exit(&vp->v_interlock);
>         }
>         ...
>         VCALL(...)
>         ...
>         /* post-op */
>       oc = atomic_add_32_nv(&vp->v_opcnt, -1);
>         if (predict_false(oc == VWANTREVOKE)) {
>                 mutex_enter(&vp->v_interlock);
>                 cv_broadcast(&vp->v_cv);
>                 mutex_exit(&vp->v_interlock);
>         }


Home | Main Index | Thread Index | Old Index