netbsd-bugs: kern/31003: umass(4) panic provoked by Plextor portable hard disk drive

Subject: kern/31003: umass(4) panic provoked by Plextor portable hard disk drive
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <igor@string1.ciencias.uniovi.es>
List: netbsd-bugs
Date: 08/16/2005 23:24:00
>Number:         31003
>Category:       kern
>Synopsis:       umass(4) panic provoked by Plextor portable hard disk drive
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Aug 16 23:24:00 +0000 2005
>Originator:     Igor Sobrado
>Release:        NetBSD 2.0.2
>Organization:
	University of Oviedo
>Environment:
System: NetBSD altair.v6.local 2.0.2 NetBSD 2.0.2 (GENERIC_LAPTOP) #0: Wed Mar 23 08:59:09 UTC 2005 jmc@faith.netbsd.org:/home/builds/ab/netbsd-2-0-2-RELEASE/i386/200503220140Z-obj/home/builds/ab/netbsd-2-0-2-RELEASE/src/sys/arch/i386/compile/GENERIC_LAPTOP i386
Architecture: i386
Machine: i386
>Description:
	The Plextor PX-PH08U portable hard disk drive is a member of a new
	family of USB mass storage devices.  The PX-PH08U is an 80 GB,
	2.5 inch hard disk drive in an external USB enclosure.  It does
	not require an external power supply unit; as a consequence, the
	disk is powered off as soon as the transition to suspend mode is
	honored.  I suspect that this may be related to the problem
	outlined in this PR.

	Setup used:

	The PX-PH08U portable hard disk drive is a USB 2.0 device connected
	to the USB 1.1 port of a Dell Latitude CPi R400GT laptop (BIOS rev.
	A14).  This laptop is running NetBSD 2.0.2 and has an internal
	20 GB Hitachi HDD (IC25N020ATDA04).  The Plextor portable hard disk
	drive is identified as a USB mass storage device:

	    Aug 11 12:52:26 altair /netbsd: umass0 at uhub0 port 1 configuration 1 interface 0
	    Aug 11 12:52:26 altair /netbsd: umass0: Plextor S.A./N.V. PLEXTOR PX-PH, rev 2.00/3.02, addr 2
	    Aug 11 12:52:26 altair /netbsd: umass0: using SCSI over Bulk-Only
	    Aug 11 12:52:26 altair /netbsd: scsibus0 at umass0: 2 targets, 1 lun per target

	The PX-PH08U portable hard disk drive contains a UFS-2 filesystem:

	    altair# disklabel sd0
	    # /dev/rsd0d:
	    type: SCSI
	    disk: PX-PH08U/T3
	    label: 
	    flags:
	    bytes/sector: 512
	    sectors/track: 63
	    tracks/cylinder: 16
	    sectors/cylinder: 1008
	    cylinders: 155127
	    total sectors: 156368016
	    rpm: 5400
	    interleave: 1
	    trackskew: 0
	    cylinderskew: 0
	    headswitch: 0           # microseconds
	    track-to-track seek: 0  # microseconds
	    drivedata: 0 

	    4 partitions:
	    #        size    offset     fstype [fsize bsize cpg/sgs]
	     a: 156368016         0     4.2BSD   1024  8192 46936  # (Cyl.      0 - 155126)
	     c: 156368016         0     unused      0     0        # (Cyl.      0 - 155126)
	     d: 156368016         0     unused      0     0        # (Cyl.      0 - 155126)

	I usually mount this filesystem on /mnt:

	    altair# mount /dev/sd0a /mnt
	    altair# df -k
	    Filesystem  1K-blocks     Used     Avail Capacity  Mounted on
	    /dev/wd0a       45903    19049     24558    43%    /
	    /dev/wd0f       31207    14472     15174    48%    /var
	    /dev/wd0e      370295   163482    188298    46%    /usr
	    /dev/wd0g    11476539   209415  10693297     1%    /home
	    /dev/wd0h      247007   113069    121587    48%    /usr/X11R6
	    /dev/wd0i       31207     4038     25608    13%    /usr/contrib
	    /dev/wd0j      986743        1    937404     0%    /usr/obj
	    /dev/wd0k     1973735   774224   1100824    41%    /usr/pkg
	    /dev/wd0l      349711   159174    173051    47%    /usr/pkgsrc
	    /dev/wd0m     1480391   693818    712553    49%    /usr/src
	    /dev/wd0n      986743   445625    491780    47%    /usr/xsrc
	    mfs:433         63959       27     60734     0%    /tmp
	    kernfs              1        1         0   100%    /kern
	    fdesc               1        1         0   100%    /dev
	    /dev/sd0a    73559093        1  69881137     0%    /mnt

	Description of the problem:

	When the computer goes into suspend mode (Fn+Suspend), the following
	messages are logged in /var/log/messages:

	    Aug 16 23:27:26 altair /netbsd: umass0: BBB reset failed, STALLED
	    Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-in clear stall failed, STALLED
	    Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-out clear stall failed, STALLED
	    Aug 16 23:27:26 altair /netbsd: umass0: BBB reset failed, STALLED
	    Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-in clear stall failed, STALLED
	    Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-out clear stall failed, STALLED
	    Aug 16 23:27:26 altair /netbsd: umass0: BBB reset failed, STALLED
	    Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-in clear stall failed, STALLED
	    Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-out clear stall failed, STALLED
	    Aug 16 23:27:26 altair /netbsd: umass0: BBB reset failed, STALLED
	    Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-in clear stall failed, STALLED
	    Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-out clear stall failed, STALLED
	    Aug 16 23:27:26 altair /netbsd: umass0: BBB reset failed, STALLED
	    Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-in clear stall failed, STALLED
	    Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-out clear stall failed, STALLED

	followed by this error:

	    Aug 16 23:22:34 altair /netbsd: umass0: at uhub0 port 1 (addr 2) disconnected
	    Aug 16 23:22:34 altair /netbsd: sd0(umass0:0:0:0): generic HBA error
	    Aug 16 23:22:34 altair /netbsd: uvm_fault(0xc0601680, 0, 0, 1) -> 0xe

	Once rebooted, both the internal HDD filesystems and the portable
	hard disk drive filesystem must be checked for consistency.

	I have classified this PR as critical and high priority because the
	problem can damage filesystems (on the portable hard disk drive and
	on other system disks, as the filesystems cannot be cleanly
	unmounted) and because it drops the machine into the in-kernel
	debugger, halting the computer.
>How-To-Repeat:
	An easy way to reproduce the problem is to mount a filesystem from
	the portable hard disk drive on a mount point (e.g., /mnt) and
	then request suspend mode.
>Fix:
	As a temporary workaround, the portable hard disk drive can be
	unmounted whenever a client requests suspend mode.  This action
	can be configured for the corresponding apmd(8) transition in the
	files in /etc/apm.  To be clear, this workaround cannot be
	considered a fix for production systems.
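
	A minimal sketch of such a workaround follows, assuming that apmd(8)
	runs an executable /etc/apm/suspend script before the transition
	and that the disk is mounted on /mnt as in the example above.  The
	names MOUNT_POINT, is_mounted and unmount_for_suspend are
	hypothetical and only illustrate the idea; this is not a tested
	configuration.

```shell
#!/bin/sh
# Hypothetical /etc/apm/suspend sketch: unmount the USB disk before suspend.
# MOUNT_POINT is an assumption taken from the /mnt example in this PR.
MOUNT_POINT=${MOUNT_POINT:-/mnt}

# Succeeds if the mount table read from stdin has an entry for mount point $1.
# mount(8) prints lines of the form "<device> on <dir> type <fstype> (...)".
is_mounted() {
	awk -v mp="$1" '$2 == "on" && $3 == mp { found = 1 } END { exit !found }'
}

# Flush dirty buffers and unmount $1 if it is currently mounted.
unmount_for_suspend() {
	if mount | is_mounted "$1"; then
		sync
		umount "$1" || logger -t apm "could not unmount $1 before suspend"
	fi
}

# Act only when installed under a name that apmd(8) is assumed to invoke.
case "${0##*/}" in
suspend|standby) unmount_for_suspend "$MOUNT_POINT" ;;
esac
```

	Note that umount(8) fails while files on the disk are still open,
	in which case the suspend would proceed with the filesystem
	mounted and the panic could still occur.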