tech-net archive
in_cksum (Re: CVS commit: [yamt-lazymbuf] src/sys/arch/amd64/include)
> On Mon, Feb 04, 2008 at 07:39:23PM +0900, YAMAMOTO Takashi wrote:
> > > On Thu, Jan 24, 2008 at 12:11:42PM +0900, YAMAMOTO Takashi wrote:
> > > > are you suggesting to sprinkle manual map calls instead of
> > > > "automatic" one in mtod (and its equivalent in asm code)?
> > >
> > > Yes, I would strongly prefer to do this in the high-level C code instead
> > > of touching the assembly. If mtod does that, that's ok, but I would
> > > prefer if the MD in_cksum backend does not have to deal with it.
> > >
> > > Joerg
> >
> > if we go that route, isn't it better to remove mbuf knowledge from
> > the MD backend completely? ie. like linux.
>
> The MD backend only needs a very limited understanding of mbufs:
> - m_len
> - m_data
> - m_next
yes.
> Removing that would mean one function call per mbuf, possible register
> savings and complications for handling misaligned / odd length mbufs.
> That in turn would likely have a measurable performance effect.
>
> Joerg
does it really matter?
i think the fetch-and-add part is dominant for performance.
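For illustration, the split discussed above (a backend that knows nothing about mbufs, called once per buffer by generic code) might look roughly like the sketch below. This is not code from the NetBSD or Linux trees. `struct buf`, `buf_sum`, `fold` and `chain_cksum` are hypothetical names, and the struct only mirrors the three fields named in the thread (m_len, m_data, m_next). The caller handles odd-length buffers by byte-swapping the partial sum of any buffer that starts at an odd offset in the stream.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf: only the three fields named above. */
struct buf {
	struct buf	*m_next;
	const uint8_t	*m_data;
	size_t		 m_len;
};

/* Fold a wide one's-complement accumulator down to 16 bits. */
static uint32_t
fold(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint32_t)sum;
}

/*
 * Buffer-only routine with no chain knowledge: the fetch-and-add loop.
 * Sums big-endian 16-bit words; a trailing odd byte is padded with zero.
 */
static uint32_t
buf_sum(const uint8_t *p, size_t len)
{
	uint64_t sum = 0;

	while (len >= 2) {
		sum += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)p[0] << 8;
	return fold(sum);
}

/* Generic chain walker: one call to buf_sum() per buffer. */
static uint16_t
chain_cksum(const struct buf *m)
{
	uint64_t total = 0;
	int odd = 0;	/* 1 if this buffer starts at an odd stream offset */

	for (; m != NULL; m = m->m_next) {
		uint32_t part = buf_sum(m->m_data, m->m_len);

		/* A buffer starting at an odd offset has its bytes swapped. */
		if (odd)
			part = ((part & 0xff) << 8) | (part >> 8);
		total += part;
		odd ^= (int)(m->m_len & 1);
	}
	return (uint16_t)~fold(total);
}
```

With the RFC 1071 sample bytes 00 01 f2 03 f4 f5 f6 f7, the result is the same whether the data sits in one buffer or is split 3 + 5 across two. The per-buffer call and the swap are the extra costs Joerg mentions. Whether they are measurable next to the summing loop is the open question here.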
btw, according to regress/sys/net/in_cksum, the i386 asm version
(cpu_in_cksum.S 1.2) seems slower than the portable version
on my cpu for everything except very short buffers.
YAMAMOTO Takashi
nfskuro% dmesg|grep cpu0
cpu0 at mainbus0 apid 0: (boot processor)
cpu0: Intel (686-class), 2793.06 MHz, id 0xf41
cpu0: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: features2 641d<SSE3,DTES64,MONITOR,DS-CPL,CID,CX16,xTPR>
cpu0: features3 20100000<XD,EM64T>
cpu0: "Intel(R) Xeon(TM) CPU 2.80GHz"
cpu0: I-cache 12K uOp cache 8-way
cpu0: L2 cache 1 MB 64B/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: using thermal monitor 1
cpu0: Initial APIC ID 0
cpu0: Cluster/Package ID 0
cpu0: SMT ID 0
cpu0: calibrating local timer
cpu0: apic clock running at 199 MHz
cpu0: 32 page colors
nfskuro% ./in_cksum 1 1 10000 1 1 1
portable version: 0.000484
test version: 0.000326
relative time: 67%
nfskuro% ./in_cksum 1 1 10000 40
portable version: 0.000147
test version: 0.000270
relative time: 182%
nfskuro% ./in_cksum 1 1 10000 50
portable version: 0.000000
test version: 0.001749
relative time: 174900%
nfskuro% ./in_cksum 1 1 10000 60
portable version: 0.000162
test version: 0.000360
relative time: 221%
nfskuro% ./in_cksum 1 1 10000 1600
portable version: 0.004768
test version: 0.014617
relative time: 307%
nfskuro% ./in_cksum 1 1 10000 9000
portable version: 0.020984
test version: 0.082724
relative time: 394%
nfskuro% ./in_cksum 1 1 10000 1500 1500 40
portable version: 0.003678
test version: 0.032719
relative time: 889%
nfskuro%