On 12/26/19 1:16 AM, Bruno Haible wrote:
> Setting LC_ALL=C in the test setup greatly reduces the test coverage. As far as
> I can see, so far, all the grep tests that work on plain ASCII inputs and
> patterns were tested in the locale the user happens to be in. But with your
> patch, these tests will only be tested in the C locale.

You're right that it reduces test coverage. However, the patch to grep that you
proposed wouldn't suffice, because the test scripts have several other uses of
printf with octal escapes outside the ASCII range, and they'd all have to be
changed. And some of these others contain code like this:

e_acute=$(printf '\303\251')
printf "$e_acute\n" > exp || framework_failure_

which would seem to require that the shell itself, not merely the printf
command, be in a locale that is compatible with the byte sequence in question.
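To make that concrete, here is a rough sketch (not taken from the grep test suite; the variable names nbytes and bytes are made up for illustration) that puts the whole shell in the C locale before building and reusing the byte sequence, then checks what actually came out:

```shell
# Illustrative only: put the shell itself, not just one printf call,
# in the C locale before handling non-ASCII bytes.
LC_ALL=C
export LC_ALL

# Same construction as in the test scripts: the UTF-8 encoding of e-acute.
e_acute=$(printf '\303\251')

# Count the bytes that the second printf emits (two bytes plus a newline).
nbytes=$(printf "$e_acute\n" | wc -c | tr -d ' ')

# Dump the bytes in octal to confirm they passed through unchanged.
bytes=$(printf "$e_acute\n" | od -An -t o1 | tr -d ' \n')

echo "nbytes=$nbytes bytes=$bytes"
```

With LC_ALL exported up front, the command substitution, the variable expansion and both printf calls all see the same locale, which is the property the test scripts appear to depend on.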

Also, I worry that for platforms where printf is a builtin, "LC_ALL=C printf
'\202'" won't work as POSIX requires because historically setting environment
variables has been buggy for shell builtins.
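A quick way to probe this on a given shell might look like the following. It is only a sketch under my own assumptions (the names prefix_out and export_out are invented here), comparing the prefix-assignment form against an explicit export inside a subshell:

```shell
# Form 1: prefix assignment on printf, which may be a shell builtin.
# POSIX says this should work; whether it does is what is being probed.
prefix_out=$(LC_ALL=C printf '\202' | od -An -t o1 | tr -d ' \n')

# Form 2: set and export LC_ALL in a subshell first, so the result does
# not depend on how the shell treats prefix assignments for builtins.
export_out=$( (LC_ALL=C; export LC_ALL; printf '\202') | od -An -t o1 | tr -d ' \n')

echo "prefix form: $prefix_out"
echo "export form: $export_out"
```

If the two results differ, or the first is not the single byte 202, the shell is presumably mishandling the prefix form, and exporting LC_ALL once for the whole script sidesteps the question.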

I had forgotten that init.sh was copied from Gnulib into grep, so I think we
should install the attached patch into Gnulib. The basic idea is that running
these test scripts in random locales is likely more trouble than it's worth.
From 34a9da750f13e6a9f658e891e3ecd69463ecbdb0 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Thu, 26 Dec 2019 01:31:33 -0800
Subject: [PATCH] tests: default to the C locale in tests
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This avoids problems with tests that use printf with octal
escapes, and makes tests more reproducible.  The downside is less
test coverage for non-C locales, but randomish testing of those
other locales can be more trouble than it’s worth anyway.
* tests/init.sh (setup_): Set LC_ALL=C.
---
 ChangeLog     | 7 +++++++
 tests/init.sh | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index 70b46625f..64855c04d 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,12 @@
 2019-12-26  Paul Eggert  <egg...@cs.ucla.edu>
 
+	tests: default to the C locale in tests
+	This avoids problems with tests that use printf with octal
+	escapes, and makes tests more reproducible.  The downside is less
+	test coverage for non-C locales, but randomish testing of those
+	other locales can be more trouble than it’s worth anyway.
+	* tests/init.sh (setup_): Set LC_ALL=C.
+
 	mbrtowc: port better to narrow-wchar_t platforms
 	* lib/mbrtowc.c (mbrtowc): On platforms like AIX 7.2, where
 	wchar_t is too narrow to represent all the Unicode characters,
diff --git a/tests/init.sh b/tests/init.sh
index 8ca5c9055..a0ee8a9be 100644
--- a/tests/init.sh
+++ b/tests/init.sh
@@ -378,6 +378,8 @@ testdir_prefix_ () { printf gt; }
 # Set up the environment for the test to run in.
 setup_ ()
 {
+  export LC_ALL=C
+
   if test "$VERBOSE" = yes; then
     # Test whether set -x may cause the selected shell to corrupt an
     # application's stderr.  Many do, including zsh-4.3.10 and the /bin/sh
-- 
2.17.1
