Subject: Re: kern/3249: More vm woes
To: None <augustss@cs.chalmers.se>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: netbsd-bugs
Date: 02/24/1997 15:53:23
On Mon, 24 Feb 1997 20:24:38 +0100 (MET) 
 Lennart Augustsson <augustss@cs.chalmers.se> wrote:

 > >Description:
 > 	The vm system is still shaky, sometimes the system hangs, sometimes
 > 	processes hang.

 > >How-To-Repeat:
 > 	Just try the command
 > 		tar zxvf file.tar.gz
 > 	Nothing happens.  If you interrupt it and do a ps you see:
 >   371   392     1   0 -18  0   296  104 thrd_s D    p4    0:00.01 tar zxvf file.tar.gz
 >  
 > >Fix:
 > 	I have no idea.  If I had CVS access I would revert to the state
 > 	before the first object collapse changes were made and then wait
 > 	until things start working again.

This was caused by a deadlock between the object collapse code and the
pageout daemon: the pagedaemon can call vm_object_collapse(), which could
then sleep waiting for pages.  I committed the appended patch from
Charles Hannum a short while ago.  It fixed the hangs of this kind that
I was seeing.
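
As a rough illustration of the idea behind the fix (bail out instead of
sleeping), here is a small userland sketch.  It is not the kernel code:
struct toy_object, toy_object_paging() and toy_collapse_nowait() are
made-up names for this example, and the "collapse" is reduced to
unlinking one object from the shadow chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VM object; NOT the kernel's struct vm_object. */
struct toy_object {
	int paging_in_progress;		/* outstanding pager operations */
	struct toy_object *shadow;	/* backing object, or NULL */
};

/* True while any pager operation on the object is still outstanding. */
static bool
toy_object_paging(const struct toy_object *obj)
{
	return obj->paging_in_progress > 0;
}

/*
 * Try to collapse obj's backing object into obj without ever sleeping.
 * Returns true if the backing object was unlinked, false if there was
 * nothing to do or the attempt was abandoned and may be retried later.
 */
static bool
toy_collapse_nowait(struct toy_object *obj)
{
	struct toy_object *backing;

	assert(obj != NULL);
	backing = obj->shadow;
	if (backing == NULL)
		return false;		/* nothing to collapse */

	/*
	 * A blocking version would wait here until paging finished.
	 * If the caller is the thread that must make that progress,
	 * waiting would never end, so give up instead.
	 */
	if (toy_object_paging(backing))
		return false;

	/* Safe to proceed: bypass the backing object. */
	obj->shadow = backing->shadow;
	backing->shadow = NULL;
	return true;
}
```

A caller would simply treat a false return as "not now" and try the
collapse again on a later pass, which matches the "goto fail" paths
in the patch.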

BTW, I'm also running the SUP scanner right now so that the fix
will be available via SUP very shortly.

Jason R. Thorpe                                       thorpej@nas.nasa.gov
NASA Ames Research Center                               Home: 408.866.1912
NAS: M/S 258-6                                          Work: 415.604.0935
Moffett Field, CA 94035                                Pager: 415.428.6939

diff -rc2 t/vm_object.c ./vm_object.c
*** t/vm_object.c	Sat Feb 22 17:39:46 1997
--- ./vm_object.c	Sun Feb 23 15:22:25 1997
***************
*** 1189,1197 ****
--- 1189,1206 ----
  	 *    we're deleting.  We'll never notice this case, because the
  	 *    backing object we're deleting won't have the page.
+ 	 *
+ 	 * XXXXX FIXME
+ 	 * Because pagedaemon can call vm_object_collapse(), we must *not*
+ 	 * sleep waiting for pages.
  	 */
  
  	vm_object_unlock(object);
  RetryRename:
+ #if 0 /* XXXXX FIXME */
  	vm_object_paging_wait(backing_object);
+ #else
+ 	if (vm_object_paging(backing_object))
+ 		goto fail;
+ #endif
  	/*
  	 * While we were asleep, the parent object might have been deleted.  If
***************
*** 1313,1320 ****
--- 1322,1333 ----
  			    paged_offset);
  			if (backing_page == NULL) {
+ #if 0 /* XXXXX FIXME */
  				vm_object_unlock(backing_object);
  				VM_WAIT;
  				vm_object_lock(backing_object);
  				goto RetryRename;
+ #else
+ 				goto fail;
+ #endif
  			}
  
***************
*** 1341,1344 ****
--- 1354,1363 ----
  			}
  
+ #ifdef DIAGNOSTIC
+ 			if (rv != VM_PAGER_OK)
+ 				panic("vm_object_overlay: pager returned %d",
+ 				    rv);
+ #endif
+ 
  			/*
  			 * The pager might have moved the page while we
***************
*** 1434,1439 ****
  	 */
  	if (vm_object_paging(backing_object) ||
! 	    backing_object->pager != NULL)
  		goto fail;
  
  	/*
--- 1453,1460 ----
  	 */
  	if (vm_object_paging(backing_object) ||
! 	    backing_object->pager != NULL) {
! 		vm_object_unlock(object);
  		goto fail;
+ 	}
  
  	/*
***************
*** 1465,1468 ****
--- 1486,1490 ----
  			 * Page still needed.  Can't go any further.
  			 */
+ 			vm_object_unlock(object);
  			goto fail;
  		}
***************
*** 1733,1740 ****
  		return;
  
! 	iprintf(pr, "Object 0x%lx: size=0x%lx, res=%d, ref=%d, ",
  		(long) object, (long) object->size,
! 		object->resident_page_count, object->ref_count);
! 	(*pr)("pager=0x%lx+0x%lx, shadow=(0x%lx)+0x%lx\n",
  	       (long) object->pager, (long) object->paging_offset,
  	       (long) object->shadow, (long) object->shadow_offset);
--- 1755,1764 ----
  		return;
  
! 	iprintf(pr, "Object 0x%lx: size=0x%lx, res=%d, ref=%d, flags=0x%x, ",
  		(long) object, (long) object->size,
! 		object->resident_page_count, object->ref_count,
! 		object->flags);
! 	(*pr)("pip=%d, pager=0x%lx+0x%lx, shadow=(0x%lx)+0x%lx\n",
! 	       object->paging_in_progress,
  	       (long) object->pager, (long) object->paging_offset,
  	       (long) object->shadow, (long) object->shadow_offset);