tech-kern archive
Re: notes from running will-it-scale
Very interesting, particularly the outrageous assembly for
pmap_{zero,copy}_page().
Is there some way to tell the compiler that the address is already
4096-aligned and avoid the conditionals? Failing that, we could just
adopt the FreeBSD assembly for this.
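One possibility, as a rough sketch only: GCC and Clang provide
__builtin_assume_aligned(), which lets the code promise an alignment to
the optimizer. The helper name and the SKETCH_PAGE_SIZE constant below
are made up for illustration and this is not the NetBSD pmap code.
Whether the compiler then drops the alignment checks depends on the
compiler version and on how memset is built in the kernel, so the
generated code would need to be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for this sketch */

/* Hypothetical helper; not the actual NetBSD pmap_zero_page(). */
static void
zero_page_aligned(void *va)
{
	/* Debug check that the alignment promise below really holds. */
	assert(((uintptr_t)va & (SKETCH_PAGE_SIZE - 1)) == 0);

	/* Tell the compiler the pointer is page-aligned (GCC/Clang builtin). */
	void *p = __builtin_assume_aligned(va, SKETCH_PAGE_SIZE);

	memset(p, 0, SKETCH_PAGE_SIZE);
}
```

The builtin is only a promise: passing an address that is not actually
aligned is undefined behaviour, hence the assertion in front of it.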
Does anyone see a problem with introducing a vfs.timestamp_precision
knob to avoid the rdtscp?
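To make the shape of such a knob concrete, here is a userland sketch,
not kernel code. All names in it (sketch_timestamp_precision,
sketch_vfs_timestamp, the SKETCH_TSP_* levels) are hypothetical and only
loosely modelled on the FreeBSD knob mentioned in the quoted mail. In
the kernel, the cheap branch would presumably read a cached,
tick-updated time instead of doing the expensive timestamp-counter read.

```c
#include <time.h>

/* Hypothetical precision levels; the real knob may define more. */
enum sketch_ts_precision {
	SKETCH_TSP_SEC,		/* whole seconds only, cheapest */
	SKETCH_TSP_NSEC		/* full-resolution clock read */
};

/* Stand-in for the proposed vfs.timestamp_precision value. */
static int sketch_timestamp_precision = SKETCH_TSP_NSEC;

static void
sketch_vfs_timestamp(struct timespec *ts)
{
	switch (sketch_timestamp_precision) {
	case SKETCH_TSP_SEC:
		/* Low precision: skip the fine-grained clock entirely. */
		ts->tv_sec = time(NULL);
		ts->tv_nsec = 0;
		break;
	case SKETCH_TSP_NSEC:
	default:
		/* High precision: C11 timespec_get(); fall back on failure. */
		if (timespec_get(ts, TIME_UTC) == 0) {
			ts->tv_sec = time(NULL);
			ts->tv_nsec = 0;
		}
		break;
	}
}
```

A default could then be picked at boot, for example a lower precision
when running under a hypervisor, as suggested in the quoted mail.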
Jaromir
On Sun, 19 Jul 2020 at 13:21, Mateusz Guzik <mjguzik%gmail.com@localhost> wrote:
>
> Hello,
>
> I recently took an opportunity to run cross-systems microbenchmarks
> with will-it-scale and included NetBSD (amd64).
>
> https://people.freebsd.org/~mjg/freebsd-dragonflybsd-netbsd-v2.txt
> [no linux in this doc, I will probably create a new one soon(tm)]
>
> The system has a lot of problems in the vfs layer, vm is a mixed bag
> with multithreaded cases lagging behind and some singlethreaded being
> pretty good (and at least one winning against the other systems).
>
> Notes:
> - rdtscp is very expensive in VMs, yet the kernel unconditionally
> executes it by calling vfs_timestamp. Both FreeBSD and DragonflyBSD have
> a knob to change the resolution (and consequently avoid the
> instruction). I think you should introduce it and default to less
> accuracy on VMs. Sample results:
> stock pipe1: 2413901
> patched pipe1: 3147312
> stock vfsmix: 13889
> patched vfsmix: 73477
> - sched_yield is apparently a nop when the binary is not linked with
> pthread. This does not match other systems and is probably a bug.
> - pmap_zero_page/pmap_copy_page compile to atrocious code which keeps
> checking for alignment. The compiler does not know what values can be
> assigned to pmap_direct_base and improvises.
>
> 0xffffffff805200c3 <+0>: add 0xf93b46(%rip),%rdi # 0xffffffff814b3c10 <pmap_direct_base>
> 0xffffffff805200ca <+7>: mov $0x1000,%edx
> 0xffffffff805200cf <+12>: xor %eax,%eax
> 0xffffffff805200d1 <+14>: test $0x1,%dil
> 0xffffffff805200d5 <+18>: jne 0xffffffff805200ff <pmap_zero_page+60>
> 0xffffffff805200d7 <+20>: test $0x2,%dil
> 0xffffffff805200db <+24>: jne 0xffffffff8052010b <pmap_zero_page+72>
> 0xffffffff805200dd <+26>: test $0x4,%dil
> 0xffffffff805200e1 <+30>: jne 0xffffffff80520116 <pmap_zero_page+83>
> 0xffffffff805200e3 <+32>: mov %edx,%ecx
> 0xffffffff805200e5 <+34>: shr $0x3,%ecx
> 0xffffffff805200e8 <+37>: rep stos %rax,%es:(%rdi)
> 0xffffffff805200eb <+40>: test $0x4,%dl
> 0xffffffff805200ee <+43>: je 0xffffffff805200f1 <pmap_zero_page+46>
> 0xffffffff805200f0 <+45>: stos %eax,%es:(%rdi)
> 0xffffffff805200f1 <+46>: test $0x2,%dl
> 0xffffffff805200f4 <+49>: je 0xffffffff805200f8 <pmap_zero_page+53>
> 0xffffffff805200f6 <+51>: stos %ax,%es:(%rdi)
> 0xffffffff805200f8 <+53>: and $0x1,%edx
> 0xffffffff805200fb <+56>: je 0xffffffff805200fe <pmap_zero_page+59>
> 0xffffffff805200fd <+58>: stos %al,%es:(%rdi)
> 0xffffffff805200fe <+59>: retq
> 0xffffffff805200ff <+60>: stos %al,%es:(%rdi)
> 0xffffffff80520100 <+61>: mov $0xfff,%edx
> 0xffffffff80520105 <+66>: test $0x2,%dil
> 0xffffffff80520109 <+70>: je 0xffffffff805200dd <pmap_zero_page+26>
> 0xffffffff8052010b <+72>: stos %ax,%es:(%rdi)
> 0xffffffff8052010d <+74>: sub $0x2,%edx
> 0xffffffff80520110 <+77>: test $0x4,%dil
> 0xffffffff80520114 <+81>: je 0xffffffff805200e3 <pmap_zero_page+32>
> 0xffffffff80520116 <+83>: stos %eax,%es:(%rdi)
> 0xffffffff80520117 <+84>: sub $0x4,%edx
> 0xffffffff8052011a <+87>: jmp 0xffffffff805200e3 <pmap_zero_page+32>
>
> The thing to do in my opinion is to just provide dedicated asm funcs.
> This is the equivalent on FreeBSD (ifunc'ed):
>
> ENTRY(pagezero_std)
> PUSH_FRAME_POINTER
> movl $PAGE_SIZE/8,%ecx
> xorl %eax,%eax
> rep
> stosq
> POP_FRAME_POINTER
> ret
> END(pagezero_std)
>
> ENTRY(pagezero_erms)
> PUSH_FRAME_POINTER
> movl $PAGE_SIZE,%ecx
> xorl %eax,%eax
> rep
> stosb
> POP_FRAME_POINTER
> ret
> END(pagezero_erms)
>
> --
> Mateusz Guzik <mjguzik gmail.com>
>