On Wed, Feb 05, 2025 at 05:05:33PM -0500, Daniel Macks wrote:
> For the first one, test-suite.log has:
> 
> FAIL: test_scripts/encoded_non_ascii_command_line.sh
> ====================================================
> 
> D: encoded/diffs/non_ascii_command_line.diff (printed below)
> Only in ./encoded/res_parser/non_ascii_command_line: int%c3%a9rnal.txt
> Only in encoded/out_parser/non_ascii_command_line: inte%cc%81rnal.txt
> Only in ./encoded/res_parser/non_ascii_command_line: os%c3%a9-texinfo.texi
> Only in ./encoded/res_parser/non_ascii_command_line: os%c3%a9_utf8.1
> Only in ./encoded/res_parser/non_ascii_command_line: os%c3%a9_utf8.2
> Only in ./encoded/res_parser/non_ascii_command_line: os%c3%a9_utf8_abt.html
> Only in encoded/out_parser/non_ascii_command_line: ose%cc%81-texinfo.texi
> Only in encoded/out_parser/non_ascii_command_line: ose%cc%81_utf8.1
> Only in encoded/out_parser/non_ascii_command_line: ose%cc%81_utf8.2
> Only in encoded/out_parser/non_ascii_command_line: ose%cc%81_utf8_abt.html
> D: encoded/diffs/non_ascii_command_line.diff (printed above)
> testdir: encoded
> driving_file: ./encoded/list-of-tests
> made result dir: ./encoded/res_parser/
> 
> doing test non_ascii_command_line, src_file 
> built_input/non_ascii/osé_utf8.texi
> format_option: 
> texi2any.pl non_ascii_command_line -> 
> encoded/out_parser/non_ascii_command_line
>  /usr/bin/perl -w ./..//texi2any.pl  --force --conf-dir ./../t/init/ 
> --conf-dir ./../init --conf-dir ./../ext -I ./encoded -I encoded/ -I ./ -I . 
> -I built_input -I built_input/non_ascii --error-limit=1000 -c TEST=1  
> --output encoded/out_parser/non_ascii_command_line/ --html --no-split -c 
> DO_ABOUT=1 -c COMMAND_LINE_ENCODING=UTF-8 -c MESSAGE_ENCODING=UTF-8 -c 
> OUTPUT_FILE_NAME_ENCODING=UTF-8 --split=Mekanïk 
> --document-language=Destruktïw -c 'Kommandöh vâl' -D TÛT -D 'vùr ké' -U ôndef 
> -c 'FORMAT_MENU mînù' 
> --macro-expand=encoded/out_parser/non_ascii_command_line/osé-texinfo.texi 
> --internal-links=encoded/out_parser/non_ascii_command_line/intérnal.txt 
> --css-include çss.css --css-include cêss.css --css-ref=rëf --css-ref=öref -D 
> 'neednonasciifilenames Need non-ASCII file names' 
> built_input/non_ascii/osé_utf8.texi > 
> encoded/out_parser/non_ascii_command_line/osé_utf8.1 
> 2>encoded/out_parser/non_ascii_command_line/osé_utf8.2
> 
> all done, exiting with status 1
> 
> and nearly identical information about the others. In all cases, the 
> filenames differ by %a9 vs %81.
> 

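It is not just that one byte.  The reference results have %c3%a9, while what
was produced is e%cc%81.  These are two different UTF-8 byte sequences for the
é character (e with acute accent): the first is the precomposed character
U+00E9, the second is a plain "e" followed by the combining acute accent
U+0301.  You can check this on the command line with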

  LC_ALL=C printf 'e\xCC\x81'

and

  LC_ALL=C printf '\xC3\xA9'
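
If the terminal renders both the same way, a hex dump makes the difference
visible.  The following is only a sketch, assuming a POSIX shell with od and
tr available; the variable names are mine and not part of the test suite.
Octal escapes are used because \x is not supported by every printf.

```shell
# "e" followed by U+0301 COMBINING ACUTE ACCENT: three bytes
decomposed=$(printf 'e\314\201' | od -An -tx1 | tr -d ' \n')
# U+00E9 LATIN SMALL LETTER E WITH ACUTE: two bytes
precomposed=$(printf '\303\251' | od -An -tx1 | tr -d ' \n')

echo "decomposed:  $decomposed"    # prints "decomposed:  65cc81"
echo "precomposed: $precomposed"   # prints "precomposed: c3a9"
```

The bytes 65 cc 81 correspond to the e%cc%81 in the produced file names, and
c3 a9 to the %c3%a9 in the reference file names.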

Perhaps some Unicode normalisation step is missing and/or faulty.
