i18n backlog

Bruno Haible Sat, 09 Sep 2023 07:59:11 -0700

Paul Eggert wrote in
<https://lists.gnu.org/archive/html/bug-gnulib/2023-09/msg00055.html>:
> It's been a month and I couldn't think of anything better


I could hardly work on the gnulib i18n backlog because I've been
jumping in on other tasks over the last month (readutmp and boot-time
improvements, coreutils testing, countering attempts to use -Wextra, ...).
And I won't be working on it in the next week, since I want to prepare
a gettext bug-fix release now.

My current gnulib i18n backlog is as follows; maybe someone can jump in
on tasks that I have not yet started.

* Commit 5f27affb42337dc605a9a59f1c6a99516cd9747a has replaced a use of
  mbiterf.h with mbuiterf.h. I'm not convinced this provides a speed
  improvement, since the comments in mbuiterf.h say:
    The mbuif_* macros are therefore suitable when there is a high probability
    that only the first few multibyte characters need to be inspected.
    Whereas the mbif_* macros are better if usually the iteration runs
    through the entire string.
  To validate or invalidate my hypothesis, someone would need to create
  a benchmark test-bench-trim.c, using the same bench.h infrastructure
  that we already have.

* Integrate Paul's improved handling of error byte sequences (MEE).
  Based on the patch in
  https://lists.gnu.org/archive/html/bug-gnulib/2023-08/msg00047.html
  but as an 'extern' function, not inline, since it's about an
  exceptional case. Keeping exceptional case handling out-of-line
  will help the compiler's code generation.
  Also update all modules' test suites accordingly.

* Commit b93de66735cd6f935ee0970f8cb26908d113e09d introduced mcel.h, but
  it has tabs. Can we untabify
    mcel.h
    mountlist.c
    verify.h
  (as we do with all source files that are not shared with glibc)?

* Commit b93de66735cd6f935ee0970f8cb26908d113e09d introduced mcel.h.
  Summarize, in comments, the discussion we had regarding SEE and MEE.
  Basically, MEE is good in all circumstances, whereas SEE is only
  good if the surrounding application does only specific things with
  the strings.
  The comments also need to mention for which encodings it makes a difference,
  cf. https://lists.gnu.org/archive/html/bug-gnulib/2023-07/msg00131.html

* In https://lists.gnu.org/archive/html/bug-gnulib/2023-09/msg00055.html
  Paul argues that "These patches shouldn't affect behavior"
  but I had already explained why mbscasecmp with SEE likely has different
  behaviour than with MEE,
  in https://lists.gnu.org/archive/html/bug-gnulib/2023-07/msg00131.html .
  Currently the mbscasecmp tests cover only valid input. Someone should
  extend the unit test to cover strings with invalid input bytes. Then
  we could see exactly what difference it makes.

* Dependencies with "or" instead of "and", requested by Paul:
  https://lists.gnu.org/archive/html/bug-gnulib/2023-09/msg00055.html
  We have a few examples of it so far, but no straightforwardly
  applicable technique. This needs more thought.

* Enhance the unit tests of the 'regex' module. (Already started by me.)

* Migrate the 'regex' module to use mbrtoc32-regular instead of mbrtowc.
  (I have a patch, but it needs to wait until the unit tests are extended.)
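
For the first item, here is a minimal sketch of what such a timing
comparison might look like. It is an illustration only: the two functions
are hypothetical single-byte stand-ins (one computes the length up front,
the other checks for the terminating NUL at every step), not the mbif_* or
mbuif_* macros, and clock() stands in for the bench.h infrastructure, whose
interface is not reproduced here. All function names are invented for this
sketch.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for a length-limited iteration: compute the
   length once, then walk the string up to a known end pointer.
   Returns the length of S without its trailing spaces.  */
static size_t
trimmed_length_limited (const char *s)
{
  const char *end = s + strlen (s);
  size_t keep = 0;
  for (const char *p = s; p < end; p++)
    if (*p != ' ')
      keep = (size_t) (p - s) + 1;
  return keep;
}

/* Hypothetical stand-in for an unlimited iteration: walk the string
   and test for the terminating NUL at every step.  */
static size_t
trimmed_length_unlimited (const char *s)
{
  size_t keep = 0;
  for (const char *p = s; *p != '\0'; p++)
    if (*p != ' ')
      keep = (size_t) (p - s) + 1;
  return keep;
}

/* Call FUNC on S for REPEAT rounds.  Store the sum of the results in
   *RESULT, so that the calls have an observable effect, and return
   the elapsed CPU time in seconds.  */
static double
time_rounds (size_t (*func) (const char *), const char *s, int repeat,
             size_t *result)
{
  clock_t start = clock ();
  size_t sum = 0;
  for (int i = 0; i < repeat; i++)
    sum += func (s);
  *result = sum;
  return (double) (clock () - start) / CLOCKS_PER_SEC;
}

/* Time both variants on a 4095-byte string that ends in 16 spaces and
   print the timings.  Returns 1 if both variants agree, 0 otherwise.  */
static int
bench_trim_report (int repeat)
{
  static char input[4096];
  size_t r1, r2;

  memset (input, 'a', sizeof input - 1);
  memset (input + sizeof input - 17, ' ', 16);
  input[sizeof input - 1] = '\0';

  double t1 = time_rounds (trimmed_length_limited, input, repeat, &r1);
  double t2 = time_rounds (trimmed_length_unlimited, input, repeat, &r2);
  printf ("limited:   %.3f s\n", t1);
  printf ("unlimited: %.3f s\n", t2);
  return r1 == r2;
}
```

A main function would call bench_trim_report with a repeat count taken from
the command line. A real test-bench-trim.c would call the actual trim
function on multibyte input in several locales instead.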

If you want to work on any of this, please start a new thread with
an appropriate Subject line. Thanks!

Bruno
