NetBSD-Bugs archive


Re: bin/58619: nawk 2024-08-17 broken and incompatible for non-UTF-8 and non-C locales



The following reply was made to PR bin/58619; it has been noted by GNATS.

From: RVP <rvp%SDF.ORG@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: 
Subject: Re: bin/58619: nawk 2024-08-17 broken and incompatible for non-UTF-8
 and non-C locales
Date: Tue, 20 Aug 2024 10:36:34 +0000 (UTC)

 On Tue, 20 Aug 2024, rokuyama.rk%gmail.com@localhost wrote:
 
 > (BTW, their documentation is *REALLY* poor.)
 >
 
 Yeah, the BSD extensions aren't documented in the man page on the `bsd-features' branch.
 
 > Try euc.txt, which I converted to EUC-JP from
 > http://www.jp.netbsd.org/ja/JP/index.html
 >
 > ---
 > $ ftp https://www.netbsd.org/~rin/euc.txt
 > ...
 > $ env LC_CTYPE=ja_JP.eucJP \
 > awk 'BEGIN{sum = 0} {sum += length($0)} END{print sum}'
 > ---
 >
 > Older versions and 2024-08-17 give 10978 and 10418, respectively.
 >> Fix:
 > Just for example above:
 >
 > https://gist.github.com/rokuyama/c7e6d12b6a7bcad0704f706c4f7e9569
 >
 
 Well, I guess it's a pain to prepend `LC_ALL=C' to every awk invocation in a non-UTF-8 locale, so how about falling back to the "C" locale whenever the codeset isn't UTF-8:
 
 ```
 diff -urN nawk.orig/dist/main.c nawk/dist/main.c
 --- nawk.orig/dist/main.c	2024-08-18 03:11:06.691688756 +0000
 +++ nawk/dist/main.c	2024-08-20 10:24:10.089804741 +0000
 @@ -32,6 +32,7 @@
   #include <stdio.h>
   #include <ctype.h>
   #include <locale.h>
 +#include <langinfo.h>
   #include <stdlib.h>
   #include <string.h>
   #include <signal.h>
 @@ -143,6 +144,8 @@
 
   	setlocale(LC_CTYPE, "");
   	setlocale(LC_NUMERIC, "C"); /* for parsing cmdline & prog */
 +	if (strcmp(nl_langinfo(CODESET), "UTF-8"))
 +		setlocale(LC_ALL, "C");	/* not UTF-8, force "C" */
   	awk_mb_cur_max = MB_CUR_MAX;
   	cmdname = argv[0];
   	if (argc == 1) {
 ```
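
 For anyone who wants to try the check outside of awk, here is a minimal standalone sketch of the same idea. The helper names `codeset_is_utf8' and `init_ctype_locale' and the `CODESET_DEMO_MAIN' macro are made up for illustration and are not part of nawk; only setlocale(3), nl_langinfo(3) and strcmp(3) are real. It assumes, as the patch does, that the platform spells the codeset exactly "UTF-8".

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: nonzero only if the codeset name is exactly
 * "UTF-8", i.e. the same strcmp() comparison the patch makes.
 */
int
codeset_is_utf8(const char *codeset)
{
	return codeset != NULL && strcmp(codeset, "UTF-8") == 0;
}

/*
 * Hypothetical helper: adopt LC_CTYPE from the environment, then force
 * every category to "C" unless the resulting codeset is UTF-8.
 * Returns the codeset name in effect afterwards.
 */
const char *
init_ctype_locale(void)
{
	setlocale(LC_CTYPE, "");
	if (!codeset_is_utf8(nl_langinfo(CODESET)))
		setlocale(LC_ALL, "C");	/* not UTF-8, force "C" */
	return nl_langinfo(CODESET);
}

#ifdef CODESET_DEMO_MAIN
/* Optional demo driver, only built with -DCODESET_DEMO_MAIN. */
int
main(void)
{
	assert(codeset_is_utf8("UTF-8"));
	printf("codeset in effect: %s\n", init_ctype_locale());
	return 0;
}
#endif
```

 Built with -DCODESET_DEMO_MAIN and run under LC_CTYPE=ja_JP.eucJP, this should report whatever name the platform gives the "C" codeset rather than eucJP. Since the comparison is an exact string match, a platform that reports the codeset under a different spelling would also be pushed to "C".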
 
 -RVP
 

