tech-userlevel archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Minor updates to sort ?



Inspired by Paul Goyette's question (on netbsd-users) I took a look at sort,
and I'd like to commit the following updates if no-one objects.

The only changes that should affect anything are the addition of the posix
C option, which is identical to c, but doesn't write messages to stderr
if the input file is not sorted, and fixing bugs in the processing of -R
such that if -R is used (and without setting it to \n - the processing of
which is also fixed in case it is set that wat using -R 10) \n does not
become a field separator regardless of what might be set later with -t (if
-t preceded -R it would have worked correctly, but not the other way.)

Aside from that the changes are more or less cosmetic - they enforce using
only one of -c -C and -m (which make no sense used together), make the
usage() reflect reality (including formatting it to stop assuming it
is outputting to an 80 column display..., and reflect the man page changes
mentioned next), and fix a minor bug in a comment, removed the unused 'x'
option (what was that?) from SORT_OPTS (no effect, generates usage() either
way) and sorted the option processing (R comes before S...)

In the man page, -C is documented, the synopsis is split to show the
(only one file allowed) different usage for -C or -c, and perhaps most
importantly, the names "field1" and "filed2" are changed to "kstart" and
"kend" to make it (a little more) clear that the -k argument does not specify
or use a field as such, but designates the start and end of the sort keys
(with the designators using fields as an addressing object - which is all
fields are used for in sort, unlike awk, cut, etc.) and -R is fully documented.

There are no changes (at all) to anything actually related to sorting...

Anyone object to these changes?   (patch appended)

kre

Index: msort.c
===================================================================
RCS file: /cvsroot/src/usr.bin/sort/msort.c,v
retrieving revision 1.30
diff -u -r1.30 msort.c
--- msort.c	5 Feb 2010 21:58:42 -0000	1.30
+++ msort.c	29 May 2016 05:00:34 -0000
@@ -365,7 +365,7 @@
  * check order on one file
  */
 void
-order(struct filelist *filelist, struct field *ftbl)
+order(struct filelist *filelist, struct field *ftbl, int quiet)
 {
 	get_func_t get = SINGL_FLD ? makeline : makekey;
 	RECHEADER *crec, *prec, *trec;
@@ -387,10 +387,14 @@
 		exit(0);
 	while (get(fp, crec, crec_end, ftbl) == 0) {
 		if (0 < (c = cmp(prec, crec))) {
+			if (quiet)
+				exit(1);
 			crec->data[crec->length-1] = 0;
 			errx(1, "found disorder: %s", crec->data+crec->offset);
 		}
 		if (UNIQUE && !c) {
+			if (quiet)
+				exit(1);
 			crec->data[crec->length-1] = 0;
 			errx(1, "found non-uniqueness: %s",
 			    crec->data+crec->offset);
Index: sort.1
===================================================================
RCS file: /cvsroot/src/usr.bin/sort/sort.1,v
retrieving revision 1.34
diff -u -r1.34 sort.1
--- sort.1	29 May 2013 15:00:35 -0000	1.34
+++ sort.1	29 May 2016 05:00:34 -0000
@@ -67,16 +67,26 @@
 .Nd sort or merge text files
 .Sh SYNOPSIS
 .Nm
-.Op Fl bcdfHilmnrSsu
+.Op Fl bdfHilmnrSsu
 .Oo
 .Fl k
-.Ar field1 Ns Op Li \&, Ns Ar field2
+.Ar kstart Ns Op Li \&, Ns Ar kend
 .Oc
 .Op Fl o Ar output
 .Op Fl R Ar char
 .Op Fl T Ar dir
 .Op Fl t Ar char
 .Op Ar
+.Nm
+.Fl c|C
+.Op Fl bdfilnru
+.Oo
+.Fl k
+.Ar kstart Ns Op Li \&, Ns Ar kend
+.Op Fl t Ar char
+.Oc
+.Op Fl R Ar char
+.Op Ar file
 .Sh DESCRIPTION
 The
 .Nm
@@ -101,6 +111,10 @@
 produces no output.
 See also
 .Fl u .
+.It Fl C
+Identical to
+.Fl c
+without the error messages in the case of unsorted input.
 .It Fl H
 Ignored for compatibility with earlier versions of
 .Nm .
@@ -137,9 +151,13 @@
 option, check that there are no lines with duplicate keys.
 .El
 .Pp
-The following options override the default ordering rules.
-When ordering options appear independent of key field
-specifications, the requested field ordering rules are
+The following options,
+which should be given before any
+.Fl k
+options, override the default ordering rules.
+When ordering options appear independent of,
+and before, key field specifications,
+the requested field ordering rules are
 applied globally to all sort keys.
 When attached to a specific key (see
 .Fl k ) ,
@@ -224,12 +242,21 @@
 This should be used with discretion;
 .Fl R Aq Ar alphanumeric
 usually produces undesirable results.
+If char is not a single character, then it
+specifies the value of the desired record
+separator as an integer specified in any
+of the normal NNN, 0ooo, or 0xXXX ways,
+or as an octal value preceded by \e.
+Caution: do not attempt to specify Ctl-A
+as
+.Dq -R 1
+which will not do what was intended at all!
 The default record separator is newline.
-.It Fl k Ar field1 Ns Op Li \&, Ns Ar field2
+.It Fl k Ar kstart Ns Op Li \&, Ns Ar kend
 Designates the starting position,
-.Ar field1 ,
+.Ar kstart ,
 and optional ending position,
-.Ar field2 ,
+.Ar kend ,
 of a key field.
 The
 .Fl k
@@ -265,16 +292,16 @@
 Fields are specified
 by the
 .Fl k
-.Ar field1 Ns Op \&, Ns Ar field2
+.Ar kstart Ns Op \&, Ns Ar kend
 argument.
 A missing
-.Ar field2
+.Ar kend
 argument defaults to the end of a line.
 .Pp
 The arguments
-.Ar field1
+.Ar kstart
 and
-.Ar field2
+.Ar kend
 have the form
 .Ar m Ns Li \&. Ns Ar n
 and can be followed by one or more of the letters
@@ -284,7 +311,7 @@
 .Cm r ,
 which correspond to the options discussed above.
 A
-.Ar field1
+.Ar kstart
 position specified by
 .Ar m Ns Li \&. Ns Ar n
 .Pq Ar m , n No \*[Gt] 0
@@ -296,7 +323,7 @@
 A missing
 .Li \&. Ns Ar n
 in
-.Ar field1
+.Ar kstart
 means
 .Ql \&.1 ,
 indicating the first character of the
@@ -314,7 +341,7 @@
 field.
 .Pp
 A
-.Ar field2
+.Ar kend
 position specified by
 .Ar m Ns Li \&. Ns Ar n
 is interpreted as
@@ -451,7 +478,7 @@
 Thus performance depends highly on efficient choice of sort keys, and the
 .Fl b
 option and the
-.Ar field2
+.Ar kend
 argument of the
 .Fl k
 option should be used whenever possible.
Index: sort.c
===================================================================
RCS file: /cvsroot/src/usr.bin/sort/sort.c,v
retrieving revision 1.61
diff -u -r1.61 sort.c
--- sort.c	16 Sep 2011 15:39:29 -0000	1.61
+++ sort.c	29 May 2016 05:00:34 -0000
@@ -117,7 +117,7 @@
 main(int argc, char *argv[])
 {
 	int ch, i, stdinflag = 0;
-	char cflag = 0, mflag = 0;
+	char mode = 0;
 	char *outfile, *outpath = 0;
 	struct field *fldtab;
 	size_t fldtab_sz, fld_cnt;
@@ -145,9 +145,9 @@
 	fldtab = emalloc(fldtab_sz * sizeof(*fldtab));
 	memset(fldtab, 0, fldtab_sz * sizeof(*fldtab));
 
-#define SORT_OPTS "bcdD:fHik:lmno:rR:sSt:T:ux"
+#define SORT_OPTS "bcCdD:fHik:lmno:rR:sSt:T:u"
 
-	/* Convert "+field" args to -f format */
+	/* Convert "+field" args to -k format */
 	fixit(&argc, argv, SORT_OPTS);
 
 	if (!(tmpdir = getenv("TMPDIR")))
@@ -158,8 +158,10 @@
 		case 'b':
 			fldtab[0].flags |= BI | BT;
 			break;
-		case 'c':
-			cflag = 1;
+		case 'c': case 'C': case 'm':
+			if (mode)
+				usage("Incompatible operation modes");
+			mode = ch;
 			break;
 		case 'D': /* Debug flags */
 			for (i = 0; optarg[i]; i++)
@@ -179,15 +181,33 @@
 
 			setfield(optarg, &fldtab[++fld_cnt], fldtab[0].flags);
 			break;
-		case 'm':
-			mflag = 1;
-			break;
 		case 'o':
 			outpath = optarg;
 			break;
 		case 'r':
 			REVERSE = 1;
 			break;
+		case 'R':
+			if (REC_D != '\n')
+				usage("multiple record delimiters");
+			REC_D = *optarg;
+			if (optarg[1] != '\0') {
+				char *ep;
+				int t = 0;
+
+				if (optarg[0] == '\\')
+					optarg++, t = 8;
+				REC_D = (int)strtol(optarg, &ep, t);
+				if (*ep != '\0' || REC_D < 0 ||
+				    REC_D >= (int)__arraycount(d_mask))
+					errx(2, "invalid record delimiter %s",
+					    optarg);
+			}
+			if (REC_D == '\n')
+				break;
+			d_mask['\n'] = d_mask[' '];
+			d_mask[REC_D] = REC_D_F;
+			break;
 		case 's':
 			/*
 			 * Nominally 'stable sort', keep lines with equal keys
@@ -213,30 +233,11 @@
 			SEP_FLAG = 1;
 			d_mask[' '] &= ~FLD_D;
 			d_mask['\t'] &= ~FLD_D;
+			d_mask['\n'] &= ~FLD_D;
 			d_mask[(u_char)*optarg] |= FLD_D;
 			if (d_mask[(u_char)*optarg] & REC_D_F)
 				errx(2, "record/field delimiter clash");
 			break;
-		case 'R':
-			if (REC_D != '\n')
-				usage("multiple record delimiters");
-			REC_D = *optarg;
-			if (REC_D == '\n')
-				break;
-			if (optarg[1] != '\0') {
-				char *ep;
-				int t = 0;
-				if (optarg[0] == '\\')
-					optarg++, t = 8;
-				REC_D = (int)strtol(optarg, &ep, t);
-				if (*ep != '\0' || REC_D < 0 ||
-				    REC_D >= (int)__arraycount(d_mask))
-					errx(2, "invalid record delimiter %s",
-					    optarg);
-			}
-			d_mask['\n'] = d_mask[' '];
-			d_mask[REC_D] = REC_D_F;
-			break;
 		case 'T':
 			/* -T tmpdir */
 			tmpdir = optarg;
@@ -254,13 +255,13 @@
 		/* Don't sort on raw record if keys match */
 		posix_sort = 0;
 
-	if (cflag && argc > optind+1)
+	if ((mode == 'c' || mode == 'C') && argc > optind+1)
 		errx(2, "too many input files for -c option");
 	if (argc - 2 > optind && !strcmp(argv[argc-2], "-o")) {
 		outpath = argv[argc-1];
 		argc -= 2;
 	}
-	if (mflag && argc - optind > (MAXFCT - (16+1))*16)
+	if (mode == 'm' && argc - optind > (MAXFCT - (16+1))*16)
 		errx(2, "too many input files for -m option");
 
 	for (i = optind; i < argc; i++) {
@@ -309,8 +310,8 @@
 		num_input_files = argc - optind;
 	}
 
-	if (cflag) {
-		order(&filelist, fldtab);
+	if (mode == 'c' || mode == 'C') {
+		order(&filelist, fldtab, mode == 'C');
 		/* NOT REACHED */
 	}
 
@@ -348,7 +349,7 @@
 			err(2, "output file %s", outfile);
 	}
 
-	if (mflag)
+	if (mode == 'm')
 		fmerge(&filelist, num_input_files, outfp, fldtab);
 	else
 		fsort(&filelist, num_input_files, outfp, fldtab);
@@ -393,13 +394,20 @@
 static void
 usage(const char *msg)
 {
+	const char *pn = getprogname();
+
 	if (msg != NULL)
 		(void)fprintf(stderr, "%s: %s\n", getprogname(), msg);
 	(void)fprintf(stderr,
-	    "usage: %s [-bcdfHilmnrSsu] [-k field1[,field2]] [-o output]"
-	    " [-R char] [-T dir]", getprogname());
+	    "usage: %s [-bdfHilmnrSsu] [-k kstart[,kend]] [-o output]"
+	    " [-R char] [-T dir]\n", pn);
 	(void)fprintf(stderr,
 	    "             [-t char] [file ...]\n");
+	(void)fprintf(stderr,
+	    "   or: %s -[cC] [-bdfilnru] [-k kstart[,kend]] [-o output]"
+	    " [-R char]\n", pn);
+	(void)fprintf(stderr,
+	    "             [-t char] [file]\n");
 	exit(2);
 }
 
Index: sort.h
===================================================================
RCS file: /cvsroot/src/usr.bin/sort/sort.h,v
retrieving revision 1.35
diff -u -r1.35 sort.h
--- sort.h	5 Aug 2015 07:10:03 -0000	1.35
+++ sort.h	29 May 2016 05:00:34 -0000
@@ -191,7 +191,7 @@
 int	 makeline(FILE *, RECHEADER *, u_char *, struct field *);
 void	 makeline_copydown(RECHEADER *);
 int	 optval(int, int);
-__dead void	 order(struct filelist *, struct field *);
+__dead void	 order(struct filelist *, struct field *, int);
 void	 putline(const RECHEADER *, FILE *);
 void	 putrec(const RECHEADER *, FILE *);
 void	 putkeydump(const RECHEADER *, FILE *);




Home | Main Index | Thread Index | Old Index