Subject: Re: VM hangs with latest bits?
To: Bill Sommerfeld <sommerfeld@orchard.medford.ma.us>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: current-users
Date: 02/24/1997 14:15:28
On Mon, 24 Feb 1997 16:59:41 GMT
Bill Sommerfeld <sommerfeld@orchard.medford.ma.us> wrote:
> Anyone else seeing severe VM system hangs with today's bits?
"Yes." It was killing my SS2.
> symptom is: lockup which starts when starting memory hogs (larger X
> clients, emacs, ..). A bunch of processes appear to be hung in
... ld on a debugging kernel.... :-)
> .. which appears to be "wait for pageout daemon to clean pages" logic.
Yes... So, the problem was a deadlock between the pagedaemon and the new,
more aggressive vm_object_collapse() code. Essentially, a pageout
could trigger a collapse, but the collapse code would see that paging
was in progress and sleep waiting for it to finish, thus causing
deadlock. I've committed the following patch from Charles Hannum,
which has "fixed" the problem.
(XXX - the whole situation that causes this deadlock really needs to
be avoided in the first place, but this is a "good enough" stopgap.)
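
The idea behind the stopgap can be sketched outside the kernel. This is a toy model, not the committed code: `toy_object`, `toy_object_paging()` and `toy_try_collapse()` are hypothetical names invented for illustration, and the only thing carried over from the patch is the shape of the fix, namely checking for paging in progress and bailing out instead of sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VM object; only the field this sketch needs. */
struct toy_object {
    int paging_in_progress;     /* number of paging operations in flight */
};

enum collapse_result { COLLAPSE_DONE, COLLAPSE_SKIPPED };

/* Plays the role of the vm_object_paging() test used in the patch. */
bool toy_object_paging(const struct toy_object *obj)
{
    assert(obj != NULL);
    return obj->paging_in_progress != 0;
}

/*
 * Non-blocking collapse attempt.
 *
 * A version that sleeps here until paging_in_progress drops to zero
 * deadlocks when the caller is the very thread that must complete the
 * paging.  Returning early leaves the object uncollapsed, which is
 * safe: collapsing is an optimization and can be retried later.
 */
enum collapse_result toy_try_collapse(struct toy_object *backing)
{
    if (toy_object_paging(backing))
        return COLLAPSE_SKIPPED;    /* corresponds to "goto fail" */

    /* ... the real collapse work would go here ... */
    return COLLAPSE_DONE;
}
```

The trade-off is that a collapse may be skipped more often than strictly necessary, which is why this is only a stopgap.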
Jason R. Thorpe thorpej@nas.nasa.gov
NASA Ames Research Center Home: 408.866.1912
NAS: M/S 258-6 Work: 415.604.0935
Moffett Field, CA 94035 Pager: 415.428.6939
diff -rc2 t/vm_object.c ./vm_object.c
*** t/vm_object.c Sat Feb 22 17:39:46 1997
--- ./vm_object.c Sun Feb 23 15:22:25 1997
***************
*** 1189,1197 ****
--- 1189,1206 ----
* we're deleting. We'll never notice this case, because the
* backing object we're deleting won't have the page.
+ *
+ * XXXXX FIXME
+ * Because pagedaemon can call vm_object_collapse(), we must *not*
+ * sleep waiting for pages.
*/
vm_object_unlock(object);
RetryRename:
+ #if 0 /* XXXXX FIXME */
vm_object_paging_wait(backing_object);
+ #else
+ if (vm_object_paging(backing_object))
+ goto fail;
+ #endif
/*
* While we were asleep, the parent object might have been deleted. If
***************
*** 1313,1320 ****
--- 1322,1333 ----
paged_offset);
if (backing_page == NULL) {
+ #if 0 /* XXXXX FIXME */
vm_object_unlock(backing_object);
VM_WAIT;
vm_object_lock(backing_object);
goto RetryRename;
+ #else
+ goto fail;
+ #endif
}
***************
*** 1341,1344 ****
--- 1354,1363 ----
}
+ #ifdef DIAGNOSTIC
+ if (rv != VM_PAGER_OK)
+ panic("vm_object_overlay: pager returned %d",
+ rv);
+ #endif
+
/*
* The pager might have moved the page while we
***************
*** 1434,1439 ****
*/
if (vm_object_paging(backing_object) ||
! backing_object->pager != NULL)
goto fail;
/*
--- 1453,1460 ----
*/
if (vm_object_paging(backing_object) ||
! backing_object->pager != NULL) {
! vm_object_unlock(object);
goto fail;
+ }
/*
***************
*** 1465,1468 ****
--- 1486,1490 ----
* Page still needed. Can't go any further.
*/
+ vm_object_unlock(object);
goto fail;
}
***************
*** 1733,1740 ****
return;
! iprintf(pr, "Object 0x%lx: size=0x%lx, res=%d, ref=%d, ",
(long) object, (long) object->size,
! object->resident_page_count, object->ref_count);
! (*pr)("pager=0x%lx+0x%lx, shadow=(0x%lx)+0x%lx\n",
(long) object->pager, (long) object->paging_offset,
(long) object->shadow, (long) object->shadow_offset);
--- 1755,1764 ----
return;
! iprintf(pr, "Object 0x%lx: size=0x%lx, res=%d, ref=%d, flags=0x%x, ",
(long) object, (long) object->size,
! object->resident_page_count, object->ref_count,
! object->flags);
! (*pr)("pip=%d, pager=0x%lx+0x%lx, shadow=(0x%lx)+0x%lx\n",
! object->paging_in_progress,
(long) object->pager, (long) object->paging_offset,
(long) object->shadow, (long) object->shadow_offset);