tech-kern archive
Re: RFC: gif(4) MP-ify
Hi,
Thank you for the comments. I have updated the patch.
On 2015/12/25 17:49, Kengo NAKAHARA wrote:
> I have MP-ified the gif(4) interface and ip_encap, which is required by gif(4).
>
> Here is the patch
> http://www.netbsd.org/~knakahara/gif-mp-ify/gif-mp-ify.patch
Here is the updated patch.
http://www.netbsd.org/~knakahara/gif-mp-ify/gif-mp-ify-2.patch
> christos@n.o
I now use rw_init() for gif_softc_list_lock instead of rw_obj_alloc().
However, I still use rw_obj_alloc() for struct gif_softc's gif_lock.
If I used rw_init() for gif_lock as well, the code would look like
this:
====================
struct gif_softc {
	struct ifnet	gif_if;	/* common area - must be at the top */
	/* snip */
	void		*gif_si;	/* softintr handle */
	krwlock_t	gif_lock __aligned(COHERENCY_UNIT); /* lock for softc */
	int		dummy __aligned(COHERENCY_UNIT); /* padding for cache line */
};
====================
I feel that looks worse...
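For comparison, with rw_obj_alloc() the softc only carries a pointer, so
the alignment and padding tricks above disappear. Here is a minimal
sketch of that approach, assuming an rw_obj_alloc()/rw_obj_free() pair
analogous to mutex_obj_alloc(9)/mutex_obj_free(9); the attach/detach
function names are only illustrative, not the actual gif(4) entry
points:
====================
#include <sys/param.h>
#include <sys/rwlock.h>
#include <net/if.h>

struct gif_softc {
	struct ifnet	gif_if;	/* common area - must be at the top */
	/* snip */
	void		*gif_si;	/* softintr handle */
	krwlock_t	*gif_lock;	/* allocated with rw_obj_alloc() */
};

static void
gif_attach_sketch(struct gif_softc *sc)	/* illustrative name */
{

	/*
	 * The lock object lives in its own allocation, so the softc
	 * needs no __aligned(COHERENCY_UNIT) members and no dummy
	 * padding field.
	 */
	sc->gif_lock = rw_obj_alloc();
}

static void
gif_detach_sketch(struct gif_softc *sc)	/* illustrative name */
{

	rw_obj_free(sc->gif_lock);
	sc->gif_lock = NULL;
}
====================
The trade-off is one extra pointer dereference per lock operation and a
separate allocation per interface.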
> riastradh@n.o
I agree that pserialize(9) would give better MP scalability than an
rwlock. However, I think it may be a premature optimization. The
reason I think so is the following lockstat result from my kernel
(which has both NET_MPSAFE and LOCKDEBUG enabled).
====================
(doing "ping -f" over two gif(4) interfaces)
# lockstat sleep 100
Elapsed time: 100.03 seconds.
-- Adaptive mutex spin
Total% Count Time/ms Lock Caller
------ ------- --------- ---------------------- ------------------------------
55.53 162 1.65 xc_low_pri <all>
46.32 133 1.38 xc_low_pri cv_wait+11e
8.26 24 0.25 xc_low_pri xc_wait+5f
0.95 5 0.03 xc_low_pri xc_thread+e4
39.65 195 1.18 fffffe827fba2f40 <all>
13.78 53 0.41 fffffe827fba2f40 frag6_fasttimo+10
6.52 24 0.19 fffffe827fba2f40 tcp_slowtimo+10
6.41 20 0.19 fffffe827fba2f40 ip_slowtimo+10
4.93 14 0.15 fffffe827fba2f40 nd6_timer+37
4.74 15 0.14 fffffe827fba2f40 frag6_slowtimo+11
1.81 49 0.05 fffffe827fba2f40 gifintr+172
1.47 20 0.04 fffffe827fba2f40 ipintr+2a
4.04 4 0.12 fffffe810f0667d0 sme_events_worker+3b
0.41 3 0.01 fffffe810ee46d40 <all>
0.39 1 0.01 fffffe810ee46d40 soo_kqfilter+1f
0.01 1 0.00 fffffe810ee46d40 sosend+2f7
0.01 1 0.00 fffffe810ee46d40 filt_solisten+2c
0.27 5 0.01 fffffe810f18d080 <all>
0.25 4 0.01 fffffe810f18d080 workqueue_enqueue+5c
0.01 1 0.00 fffffe810f18d080 cv_wait+11e
0.10 2 0.00 fffffe810f18d400 workqueue_enqueue+5c
-- Adaptive mutex sleep
Total% Count Time/ms Lock Caller
------ ------- --------- ---------------------- ------------------------------
100.00 6 0.37 xc_low_pri cv_wait+11e
-- Spin mutex spin
Total% Count Time/ms Lock Caller
------ ------- --------- ---------------------- ------------------------------
67.79 447 1.48 fffffe810ee46f80 <all>
61.94 415 1.36 fffffe810ee46f80 wm_nq_start+42
5.85 32 0.13 fffffe810ee46f80 wm_txintr_msix+54
15.21 733 0.33 fffffe810ee46e00 <all>
7.93 393 0.17 fffffe810ee46e00 ifq_enqueue+39
7.28 340 0.16 fffffe810ee46e00 wm_nq_start+113
10.14 8 0.22 fffffe827f744400 turnstile_lookup+22
3.83 36 0.08 fffffe827f726dc0 <all>
3.43 32 0.08 fffffe827f726dc0 cv_wait+b0
0.40 4 0.01 fffffe827f726dc0 cv_wakeup_all+6f
1.99 121 0.04 fffffe827f7173b0 <all>
1.93 119 0.04 fffffe827f7173b0 pool_cache_get_slow+bd
0.06 2 0.00 fffffe827f7173b0 pool_cache_put_slow+10b
0.27 10 0.01 fffffe827f7163f0 <all>
0.17 7 0.00 fffffe827f7163f0 pool_cache_get_slow+bd
0.10 3 0.00 fffffe827f7163f0 pool_cache_put_slow+10b
0.20 1 0.00 fffffe827f7235c0 sleepq_remove+103
0.13 2 0.00 uvm_fpageqlock <all>
0.12 1 0.00 uvm_fpageqlock uvm_pageidlezero+1fb
0.01 1 0.00 uvm_fpageqlock uvm_pagealloc_strat+c3
0.12 2 0.00 fffffe827f726880 cv_wakeup_one+6d
0.10 1 0.00 fffffe827f7237c0 sleepq_remove+103
0.10 1 0.00 fffffe827f723ac0 sleepq_remove+103
0.07 2 0.00 fffffe810fea4d78 knote_activate+23
0.05 1 0.00 fffffe827f726a40 cv_wakeup_all+6f
====================
wm_nq_start() has a much greater impact than the gif* functions. I
think MP-ifying wm_nq_start(), that is, TX multiqueue support, should
be done next.
# I think that should be my next task :)
So, I'd like to commit the rwlock version of the patch above and
measure its scalability under various workloads. Switching to
pserialize(9) can then begin once the overhead of the reader lock
becomes a real problem.
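For reference, the pserialize(9) change would mainly affect the read
side of the softc list. Below is a rough sketch of what a lockless
reader could look like; the gif_softc_list/gif_list names mirror the
existing code, the function itself is only illustrative, and a real
reader would also need something like psref(9) to keep an entry alive
outside the read section:
====================
#include <sys/param.h>
#include <sys/pserialize.h>
#include <sys/queue.h>
#include <net/if.h>
#include <net/if_gif.h>

/*
 * Illustrative reader only.  The writer side still takes a lock and
 * calls pserialize_perform() between unlinking an entry and freeing it.
 */
static bool
gif_any_running_sketch(void)
{
	struct gif_softc *sc;
	bool running = false;
	int s;

	s = pserialize_read_enter();	/* no lock, no write to shared memory */
	LIST_FOREACH(sc, &gif_softc_list, gif_list) {
		if ((sc->gif_if.if_flags & IFF_RUNNING) != 0) {
			running = true;
			break;
		}
	}
	pserialize_read_exit(s);

	return running;
}
====================
Compared with rw_enter(&gif_softc_list_lock, RW_READER), such a reader
does not touch any shared lock word at all, which is where the
scalability win would come from once gif(4) itself becomes the
bottleneck.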
Could you comment on the patch above?
If there are no objections, I'd like to commit it in a few days or
weeks.
Thanks,
--
//////////////////////////////////////////////////////////////////////
Internet Initiative Japan Inc.
Device Engineering Section,
Core Product Development Department,
Product Division,
Technology Unit
Kengo NAKAHARA <k-nakahara%iij.ad.jp@localhost>