tech-toolchain archive
Re: pkg/50939: Bug in GCC optimization causing i386 net-snmpd to crash
Some time ago, David Holland wrote:
> This sounds like it is overwriting its stack, probably in the mem_mib
> call. Then when it returns from the mem_mib call it manages to go to
> the wrong place. Can you check in the debugger if this is the case?
> What gets trashed if you overwrite the stack can depend heavily on
> compiler optimizations, so it's not necessarily a gcc bug.
> I don't see anything obviously wrong with the code, but that isn't
> conclusive.
> Also, is this happening on real i386, or in a 32-bit chroot on an
> amd64? Might also be a problem with the compat32 sysctl().
I have reproduced this on NetBSD 7.1 on a real i386 machine.
The problem appears to be a compiler bug. Consider the following code,
from the middle of netsnmp_cpu_arch_load:
    for (i = 0; i < cpu_num; i++) {
        netsnmp_cpu_info *ncpu = netsnmp_cpu_get_byIdx( i, 1 );
        size_t j = i * CPUSTATES;
        ncpu->user_ticks = (unsigned long long)ncpu_stats[j + CP_USER];
        ncpu->nice_ticks = (unsigned long long)ncpu_stats[j + CP_NICE];
        ncpu->sys2_ticks = (unsigned long long)ncpu_stats[j + CP_SYS]+cpu_stats[j + CP_INTR];
        ncpu->kern_ticks = (unsigned long long)ncpu_stats[j + CP_SYS];
        ncpu->idle_ticks = (unsigned long long)ncpu_stats[j + CP_IDLE];
        ncpu->intrpt_ticks = (unsigned long long)ncpu_stats[j + CP_INTR];
    }
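To look at this loop in isolation, it can be lifted into a standalone
file. The following is only a minimal sketch: struct demo_cpu_info,
demo_get_by_idx(), demo_load() and DEMO_MAX_CPUS are hypothetical
stand-ins, not net-snmp names, and the CP_* values are inferred from the
8-byte offsets in the disassembly rather than taken from a header. It
also assumes both arrays are large enough for every index the loop uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index values inferred from the 8-byte offsets in the disassembly:
 * 0x00 user, 0x08 nice, 0x10 sys, 0x18 intr, 0x20 idle, next CPU at 0x28. */
enum { CP_USER, CP_NICE, CP_SYS, CP_INTR, CP_IDLE, CPUSTATES };

#define DEMO_MAX_CPUS 4 /* hypothetical upper bound for this sketch */

/* Hypothetical stand-in for netsnmp_cpu_info: only the fields the loop writes. */
struct demo_cpu_info {
    unsigned long long user_ticks;
    unsigned long long nice_ticks;
    unsigned long long sys2_ticks;
    unsigned long long kern_ticks;
    unsigned long long idle_ticks;
    unsigned long long intrpt_ticks;
};

static struct demo_cpu_info demo_cpus[DEMO_MAX_CPUS];

/* Hypothetical stand-in for netsnmp_cpu_get_byIdx(). */
static struct demo_cpu_info *demo_get_by_idx(int idx)
{
    assert(idx >= 0 && idx < DEMO_MAX_CPUS);
    return &demo_cpus[idx];
}

/* Same statements as the quoted loop.
 * Assumption: both arrays hold at least cpu_num * CPUSTATES entries. */
void demo_load(int cpu_num, const uint64_t *ncpu_stats,
               const uint64_t *cpu_stats)
{
    int i;

    for (i = 0; i < cpu_num; i++) {
        struct demo_cpu_info *ncpu = demo_get_by_idx(i);
        size_t j = (size_t)i * CPUSTATES;

        ncpu->user_ticks = (unsigned long long)ncpu_stats[j + CP_USER];
        ncpu->nice_ticks = (unsigned long long)ncpu_stats[j + CP_NICE];
        ncpu->sys2_ticks = (unsigned long long)ncpu_stats[j + CP_SYS]
                           + cpu_stats[j + CP_INTR];
        ncpu->kern_ticks = (unsigned long long)ncpu_stats[j + CP_SYS];
        ncpu->idle_ticks = (unsigned long long)ncpu_stats[j + CP_IDLE];
        ncpu->intrpt_ticks = (unsigned long long)ncpu_stats[j + CP_INTR];
    }
}
```

Calling demo_load() with two or more CPUs from a small main(), built
with the same GCC at -O0 and at -O2, and comparing the fields written
for the second CPU (or the disassembly) may show whether the behaviour
depends on the surrounding code. The declaration of cpu_stats is not
shown here; if the real array held only CPUSTATES entries, indexing it
with j + CP_INTR would be out of bounds for i >= 1, which would be worth
ruling out.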
This is translated into the following block of code (disassembled by
gdb). The block is entered via a conditional branch from elsewhere, if
cpu_num > 0.
0xbba64c88 <+1039>: movl $0x1,0x4(%esp)
0xbba64c90 <+1047>: movl $0x0,(%esp)
0xbba64c97 <+1054>: call 0xbba09460 <netsnmp_cpu_get_byIdx@plt>
0xbba64c9c <+1059>: mov (%edi),%edx
0xbba64c9e <+1061>: mov 0x4(%edi),%ecx
0xbba64ca1 <+1064>: mov %edx,0x2008(%eax)
0xbba64ca7 <+1070>: mov %ecx,0x200c(%eax)
0xbba64cad <+1076>: mov 0x8(%edi),%edx
0xbba64cb0 <+1079>: mov 0xc(%edi),%ecx
0xbba64cb3 <+1082>: mov %edx,0x2010(%eax)
0xbba64cb9 <+1088>: mov %ecx,0x2014(%eax)
0xbba64cbf <+1094>: mov 0x10(%edi),%edx
0xbba64cc2 <+1097>: mov 0x14(%edi),%ecx
0xbba64cc5 <+1100>: add 0x54(%esp),%edx
0xbba64cc9 <+1104>: adc 0x58(%esp),%ecx
0xbba64ccd <+1108>: mov %edx,0x2068(%eax)
0xbba64cd3 <+1114>: mov %ecx,0x206c(%eax)
0xbba64cd9 <+1120>: mov 0x10(%edi),%edx
0xbba64cdc <+1123>: mov 0x14(%edi),%ecx
0xbba64cdf <+1126>: mov %edx,0x2030(%eax)
0xbba64ce5 <+1132>: mov %ecx,0x2034(%eax)
0xbba64ceb <+1138>: mov 0x20(%edi),%edx
0xbba64cee <+1141>: mov 0x24(%edi),%ecx
0xbba64cf1 <+1144>: mov %edx,0x2020(%eax)
0xbba64cf7 <+1150>: mov %ecx,0x2024(%eax)
0xbba64cfd <+1156>: mov 0x18(%edi),%edx
0xbba64d00 <+1159>: mov 0x1c(%edi),%ecx
0xbba64d03 <+1162>: mov %edx,0x2038(%eax)
0xbba64d09 <+1168>: mov %ecx,0x203c(%eax)
0xbba64d0f <+1174>: mov -0x258(%ebx),%eax
0xbba64d15 <+1180>: mov (%eax),%eax
0xbba64d17 <+1182>: cmp $0x1,%eax
0xbba64d1a <+1185>: jle 0xbba64ace <netsnmp_cpu_arch_load+597>
0xbba64d20 <+1191>: movl $0x1,0x4(%esp)
0xbba64d28 <+1199>: movl $0x1,(%esp)
0xbba64d2f <+1206>: call 0xbba09460 <netsnmp_cpu_get_byIdx@plt>
0xbba64d34 <+1211>: mov 0x28(%edi),%edx
0xbba64d37 <+1214>: mov 0x2c(%edi),%ecx
0xbba64d3a <+1217>: mov %edx,0x2008(%eax)
0xbba64d40 <+1223>: mov %ecx,0x200c(%eax)
0xbba64d46 <+1229>: mov 0x30(%edi),%esi
0xbba64d49 <+1232>: mov 0x34(%edi),%edi
0xbba64d4c <+1235>: mov %esi,0x2010(%eax)
0xbba64d52 <+1241>: mov %edi,0x2014(%eax)
The branch to 0xbba64ace is a branch back to continue the normal
execution of the code, where free(...) is called and life carries on.
Note that the compiler appears to have partially unrolled the loop, but
the listing above is the end of that block of code: the second iteration
stops after its second store (nice_ticks). The next block of code happens
to be the error path for sysctl(mem_mib, ...) failing, which logs "sysctl
vm.vm_meter failed". That placement appears to be purely coincidental;
the real failure here is that execution just falls off the end of this
half-finished loop unrolling.
0xbba64d58 <+1247>: call 0xbba0abf0 <__errno@plt>
0xbba64d5d <+1252>: mov (%eax),%eax
0xbba64d5f <+1254>: mov %eax,0x8(%esp)
0xbba64d63 <+1258>: lea -0x41e78(%ebx),%eax
0xbba64d69 <+1264>: mov %eax,0x4(%esp)
0xbba64d6d <+1268>: movl $0x3,(%esp)
0xbba64d74 <+1275>: call 0xbba0af70 <snmp_log@plt>
0xbba64d79 <+1280>: jmp 0xbba649cd <netsnmp_cpu_arch_load+340>
It does look like a machine with only one CPU would be spared this fate
as it would exit the loop after the first iteration and not try to
execute the second, incomplete, iteration. This problem should be
reproducible on any NetBSD/i386 machine with at least 2 CPUs.
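To check whether a given machine would get past the "cmp $0x1,%eax; jle"
test above, something like the following could be used. It is a rough
sketch that assumes sysconf(_SC_NPROCESSORS_ONLN) is available on the
system in question; the demo_* names are hypothetical, and the real code
may well obtain cpu_num some other way.

```c
#include <unistd.h>

/* Number of CPUs currently online, or 1 if the query fails.
 * Assumption: _SC_NPROCESSORS_ONLN is supported by this libc. */
long demo_online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    return n < 1 ? 1 : n;
}

/* Mirrors the check after the first iteration: the code leaves the
 * loop when cpu_num <= 1 and starts the second iteration otherwise. */
int demo_second_iteration_reached(long cpu_num)
{
    return cpu_num > 1;
}
```

demo_second_iteration_reached(demo_online_cpus()) returning non-zero
would suggest the machine has enough CPUs to reach the incomplete
second iteration.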
Obviously in the short term, the package will need to work around this
by disabling optimisation, but this is clearly something the compiler is
getting wrong.
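For illustration only, disabling optimisation for a single function
could be sketched at source level with GCC's "optimize" function
attribute. This is an assumption about one possible approach, not what
the package does; build flags are the more likely route, and GCC's
documentation describes this attribute as intended for debugging. The
DEMO_* and demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CPUSTATES 5 /* matches the 0x28-byte per-CPU stride seen above */

/* Apply the attribute only where GCC itself is the compiler. */
#if defined(__GNUC__) && !defined(__clang__)
#define DEMO_NO_OPT __attribute__((optimize("O0")))
#else
#define DEMO_NO_OPT
#endif

/* Sums the per-state tick counters for each CPU. The loop is compiled
 * without optimisation under GCC, so it is neither unrolled nor peeled.
 * Assumption: stats holds cpu_num * DEMO_CPUSTATES entries and totals
 * holds cpu_num entries. */
DEMO_NO_OPT
void demo_sum_per_cpu(uint64_t *totals, const uint64_t *stats,
                      size_t cpu_num)
{
    size_t i, s;

    assert(totals != NULL && stats != NULL);
    for (i = 0; i < cpu_num; i++) {
        size_t j = i * DEMO_CPUSTATES;

        totals[i] = 0;
        for (s = 0; s < DEMO_CPUSTATES; s++)
            totals[i] += stats[j + s];
    }
}
```

This keeps the rest of the file optimised, at the cost of tying the
source to one compiler.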