Hi!

As there is an amazing amount of work being done on and for VAX these
days, I'd like to collect known issues / TODOs and debugging details.
Maybe we could add a page to the NetBSD wiki[1]? (How do people get
edit access?)

Right now, I see these major topics:

  * Document swap / ulimit requirements for a successful local build,
    for some common real VAX systems as well as for a SIMH VAX
    equipped with 512 MB.
  * Get GCC 12 up'n'running. (Untested, maybe Kalvis already has some
    patches. Whatever we find should be upstreamed! That's also true
    for the Binutils bits. I think VAX's native 64-bit support hasn't
    arrived yet?)
  * Get pkgsrc's current Python up'n'running. (VAX FP support got
    removed; it needs to be added back, and maybe a maintainer needs
    to step up?)
  * Fix timekeeping issues.

Esp. for the timekeeping issues, I've been testing a lot with a 4000/90
(I falsely claimed this system to be a /96; my actual /96 has a dead
Dallas clock chip and is waiting for a repair) and a 4000/60. My
findings so far are that both systems, booted with a GENERIC kernel,
behave quite the same:

  * Both ntpd and ntpdate are disabled.
  * No networking.
  * Booting off local emulated SCSI disks (PiSCSI), installed locally
    from PiSCSI-emulated install ISOs.
  * Both systems lose about 2 to 4 seconds per day (roughly 23 to
    46 ppm).
  * This loss does not change, whether
      * the system is idle; or
      * the system is CPU-loaded (running GCC in a loop); or
      * the system has I/O load (`cat`ting all regular files to
        /dev/null in a loop, with the FS being on the PiSCSI-emulated
        disk).
  * So both the 4000/60 and the /90 keep reasonably stable time.
  * Booted with a slightly older image (Jun 6th), I see no unusual
    timekeeping-related messages; booting with a more recent image
    (g:33d45195d8dbc05843af2d76d66a83970b802c30, Fri Dec 22 17:55:49
    2023 +0000), I seem to always get _one_ of these (on both the /60
    and the /90) during boot:

[   1.048131] WARNING: lwp 30 (system rt_timer) flags 0x20000000: timecounter went backwards from (1 + 0x3462e4d1a64b88fb/2^64) sec to (1 + 0x0cd08919ef941f4f/2^64) sec in netbsd:mi_switch+0x4d

But I never saw a similar message again: not while the system was
idle, not while it was CPU-loaded, and not with lots of I/O.

There are about 2850 commits between Jun 6th and "today". I don't have
a clue whether bisecting it down would be helpful at all, or whether
it's just a red herring... Thinking about it, I did a `git blame` and
found:

36a17127078db (ad        2007-10-08 20:06:17 +0000 505) void
949e16d902d16 (yamt      2007-12-22 01:14:53 +0000 506) updatertime(lwp_t *l, const struct bintime *now)
f03010953f572 (yamt      2007-05-17 14:51:11 +0000 507) {
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 508)     static bool backwards = false;
f03010953f572 (yamt      2007-05-17 14:51:11 +0000 509)
f70325ee02948 (rmind     2009-03-28 21:43:16 +0000 510)     if (__predict_false(l->l_flag & LW_IDLE))
f03010953f572 (yamt      2007-05-17 14:51:11 +0000 511)         return;
f03010953f572 (yamt      2007-05-17 14:51:11 +0000 512)
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 513)     if (__predict_false(bintimecmp(now, &l->l_stime, <)) && !backwards) {
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 514)         char caller[128];
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 515)
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 516) #ifdef DDB
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 517)         db_symstr(caller, sizeof(caller),
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 518)             (db_expr_t)(intptr_t)__builtin_return_address(0),
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 519)             DB_STGY_PROC);
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 520) #else
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 521)         snprintf(caller, sizeof(caller), "%p",
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 522)             __builtin_return_address(0));
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 523) #endif
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 524)         backwards = true;
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 525)         printf("WARNING: lwp %ld (%s%s%s) flags 0x%x:"
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 526)             " timecounter went backwards"
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 527)             " from (%jd + 0x%016"PRIx64"/2^64) sec"
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 528)             " to (%jd + 0x%016"PRIx64"/2^64) sec"
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 529)             " in %s\n",
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 530)             (long)l->l_lid,
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 531)             l->l_proc->p_comm,
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 532)             l->l_name ? " " : "",
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 533)             l->l_name ? l->l_name : "",
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 534)             l->l_pflag,
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 535)             (intmax_t)l->l_stime.sec, l->l_stime.frac,
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 536)             (intmax_t)now->sec, now->frac,
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 537)             caller);
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 538)     }
589aa4ee5837b (riastradh 2023-07-13 13:33:55 +0000 539)
949e16d902d16 (yamt      2007-12-22 01:14:53 +0000 540)     /* rtime += now - stime */
949e16d902d16 (yamt      2007-12-22 01:14:53 +0000 541)     bintime_add(&l->l_rtime, now);
949e16d902d16 (yamt      2007-12-22 01:14:53 +0000 542)     bintime_sub(&l->l_rtime, &l->l_stime);
f03010953f572 (yamt      2007-05-17 14:51:11 +0000 543) }

Argh... So it's probably just that we now _see_ that something went
backwards; we simply didn't get informed about it previously. Note
also that the static `backwards` flag makes this a one-shot warning:
it is printed at most once per boot, so not seeing it again does not
prove that the timecounter never went backwards again.

It seems I'm unable to reproduce the timekeeping issues, at least not
with a non-networked system. I'll bring one of the two systems
downstairs, put it on the wired network and start ntpdate / ntpd.

I'm highly interested in other people's reports about their setups!
Along with other people's impressions, I really think we should
collect these individual facts publicly so that others don't need to
test the very same setup.

Best regards, JBG

[1] https://wiki.netbsd.org/