netbsd-bugs: bin/987: locate database corruption when file names contain 8-bit chars

Subject: bin/987: locate database corruption when file names contain 8-bit chars
To: None <gnats-admin@NetBSD.ORG>
From: Andreas Gustafsson <gson@araneus.pp.fi>
List: netbsd-bugs
Date: 04/24/1995 11:05:09

>Number:         987
>Category:       bin
>Synopsis:       locate database corruption when file names contain 8-bit chars
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people (Utility Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Apr 24 11:05:06 1995
>Originator:     Andreas Gustafsson <gson@araneus.pp.fi>
>Organization:
Araneus Information Systems Oy
>Release:        NetBSD-current snapshot of around March 17, 1995
>Environment:

System: NetBSD araneus.pp.fi 1.0A NetBSD 1.0A (GUANO) #3: Wed Apr 12 17:21:06 EET DST 1995 gson@araneus.pp.fi:/usr/src/sys/arch/i386/compile/GUANO i386

>Description:

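When locate.updatedb is used to index a file system where some
file names contain characters whose character code is in the range
0x80 - 0x9F, it builds a corrupted /var/db/locate.database.
The squelching loop in locate.code.c strips the high bit from such
a character but, because of an "else", never checks whether the
result is itself a character that would botch the decoding.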
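
To illustrate, here is a rough standalone sketch of the unpatched
per-character logic.  The values of PARITY and SWITCH are not quoted
in this report; 0200 and 30 are assumed here and should be checked
against locate.h.  The helper name squelch_orig is made up for the
example and does not exist in the source tree.

```c
/* Assumed values; this report does not quote them. */
#define PARITY	0200	/* codes at or above this must not reach the database */
#define SWITCH	30	/* codes at or below this must not reach the database */

/*
 * Hypothetical helper mirroring the unpatched loop body for one byte.
 * A byte >= PARITY has its high bit stripped, and the "else" then
 * skips the low-code check on the stripped result.
 */
static unsigned char
squelch_orig(unsigned char c)
{
	if (c >= PARITY)
		c &= PARITY - 1;
	else if (c <= SWITCH)
		c = '?';
	return c;
}
```

Under these assumptions squelch_orig(132) returns 4, which is at or
below SWITCH, so the byte that ends up in the database is still one
of the characters the loop was meant to squelch.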

>How-To-Repeat:

1. Create a file whose name contains such a character, e.g. using

    awk 'BEGIN { s = sprintf("foo%cbar", 132); print > s }' </dev/null

2. Run locate.updatedb

3. Try to find some other files using locate.  It fails to find
   some of them and/or prints garbage pathnames; which files are
   affected depends on the relative positions of the files in the
   database and is rather unpredictable.

>Fix:

Not one, but two.  First, a minimal fix which I think reflects the
intent of the original programmer:

*** locate.code.c.orig	Thu Dec 22 14:35:10 1994
--- locate.code.c	Sun Apr 23 16:35:59 1995
***************
*** 144,150 ****
  		for (cp = path; *cp != NULL; cp++) {
  			if ((u_char)*cp >= PARITY)
  				*cp &= PARITY-1;
! 			else if (*cp <= SWITCH)
  				*cp = '?';
  		}
  
--- 144,150 ----
  		for (cp = path; *cp != NULL; cp++) {
  			if ((u_char)*cp >= PARITY)
  				*cp &= PARITY-1;
! 			if (*cp <= SWITCH)
  				*cp = '?';
  		}
  

Second, an alternative fix which I personally prefer because it treats
both kinds of illegal characters simply and consistently:


*** locate.code.c.orig	Thu Dec 22 14:35:10 1994
--- locate.code.c	Sun Apr 23 16:49:08 1995
***************
*** 142,150 ****
  
  		/* Squelch characters that would botch the decoding. */
  		for (cp = path; *cp != NULL; cp++) {
! 			if ((u_char)*cp >= PARITY)
! 				*cp &= PARITY-1;
! 			else if (*cp <= SWITCH)
  				*cp = '?';
  		}
  
--- 142,148 ----
  
  		/* Squelch characters that would botch the decoding. */
  		for (cp = path; *cp != NULL; cp++) {
! 			if ((u_char)*cp >= PARITY || (u_char)*cp <= SWITCH)
  				*cp = '?';
  		}
  
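For comparison, the two patched loop bodies can be sketched as
per-byte functions and checked over every possible nonzero byte.
As before, PARITY = 0200 and SWITCH = 30 are assumed values, and
the names squelch_minimal, squelch_uniform and count_unsafe are
invented for this sketch only.

```c
/* Assumed values; this report does not quote them. */
#define PARITY	0200
#define SWITCH	30

/* First fix: strip the high bit, then always check for low codes. */
static unsigned char
squelch_minimal(unsigned char c)
{
	if (c >= PARITY)
		c &= PARITY - 1;
	if (c <= SWITCH)
		c = '?';
	return c;
}

/* Second fix: replace both kinds of illegal character with '?'. */
static unsigned char
squelch_uniform(unsigned char c)
{
	if (c >= PARITY || c <= SWITCH)
		c = '?';
	return c;
}

/*
 * Count the bytes 1..255 whose squelched form still lies outside
 * the open range (SWITCH, PARITY).  Zero means nothing illegal
 * can leak through.
 */
static int
count_unsafe(unsigned char (*squelch)(unsigned char))
{
	int c, bad = 0;

	for (c = 1; c < 256; c++) {
		unsigned char out = squelch((unsigned char)c);

		if (out <= SWITCH || out >= PARITY)
			bad++;
	}
	return bad;
}
```

Both functions give a count of zero.  They differ only in what an
8-bit character turns into: the first fix keeps its low seven bits
when those form a legal character (0xE4 becomes 'd'), while the
second always substitutes '?'.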
>Audit-Trail:
>Unformatted: