On Wed, Feb 05, 2025 at 05:05:33PM -0500, Daniel Macks wrote:
> For the first one, test-suite.log has:
>
> FAIL: test_scripts/encoded_non_ascii_command_line.sh
> ====================================================
>
> D: encoded/diffs/non_ascii_command_line.diff (printed below)
> Only in ./encoded/res_parser/non_ascii_command_line: int%c3%a9rnal.txt
> Only in encoded/out_parser/non_ascii_command_line: inte%cc%81rnal.txt
> Only in ./encoded/res_parser/non_ascii_command_line: os%c3%a9-texinfo.texi
> Only in ./encoded/res_parser/non_ascii_command_line: os%c3%a9_utf8.1
> Only in ./encoded/res_parser/non_ascii_command_line: os%c3%a9_utf8.2
> Only in ./encoded/res_parser/non_ascii_command_line: os%c3%a9_utf8_abt.html
> Only in encoded/out_parser/non_ascii_command_line: ose%cc%81-texinfo.texi
> Only in encoded/out_parser/non_ascii_command_line: ose%cc%81_utf8.1
> Only in encoded/out_parser/non_ascii_command_line: ose%cc%81_utf8.2
> Only in encoded/out_parser/non_ascii_command_line: ose%cc%81_utf8_abt.html
> D: encoded/diffs/non_ascii_command_line.diff (printed above)
> testdir: encoded
> driving_file: ./encoded/list-of-tests
> made result dir: ./encoded/res_parser/
>
> doing test non_ascii_command_line, src_file
> built_input/non_ascii/osé_utf8.texi
> format_option:
> texi2any.pl non_ascii_command_line ->
> encoded/out_parser/non_ascii_command_line
> /usr/bin/perl -w ./..//texi2any.pl --force --conf-dir ./../t/init/
> --conf-dir ./../init --conf-dir ./../ext -I ./encoded -I encoded/ -I ./ -I .
> -I built_input -I built_input/non_ascii --error-limit=1000 -c TEST=1
> --output encoded/out_parser/non_ascii_command_line/ --html --no-split -c
> DO_ABOUT=1 -c COMMAND_LINE_ENCODING=UTF-8 -c MESSAGE_ENCODING=UTF-8 -c
> OUTPUT_FILE_NAME_ENCODING=UTF-8 --split=Mekanïk
> --document-language=Destruktïw -c 'Kommandöh vâl' -D TÛT -D 'vùr ké' -U ôndef
> -c 'FORMAT_MENU mînù'
> --macro-expand=encoded/out_parser/non_ascii_command_line/osé-texinfo.texi
> --internal-links=encoded/out_parser/non_ascii_command_line/intérnal.txt
> --css-include çss.css --css-include cêss.css --css-ref=rëf --css-ref=öref -D
> 'neednonasciifilenames Need non-ASCII file names'
> built_input/non_ascii/osé_utf8.texi >
> encoded/out_parser/non_ascii_command_line/osé_utf8.1
> 2>encoded/out_parser/non_ascii_command_line/osé_utf8.2
>
> all done, exiting with status 1
>
> and nearly identical information about the others. In all cases, the
> filenames differ by %a9 vs %81.
>
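It is not just that one byte. The reference results have %c3%a9 and what
was produced is e%cc%81. These are two different UTF-8 encodings of the é
character (e with acute accent): c3 a9 is the single precomposed code point
U+00E9 (the NFC form), while e cc 81 is a plain 'e' followed by U+0301
COMBINING ACUTE ACCENT (the NFD form). You can check on the command line
that both display the same with

  LC_ALL=C printf 'e\xCC\x81'

and

  LC_ALL=C printf '\xC3\xA9'

Perhaps some Unicode normalisation step is missing and/or faulty.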
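A minimal POSIX sh sketch, assuming only printf and od are available, that makes the byte difference concrete: it builds both spellings of é and compares them byte for byte, which is in effect why diff reported each file as "Only in" one of the two directories. The variable names nfc and nfd are illustrative only. It uses octal escapes because POSIX printf guarantees \ooo, whereas \xHH is a bash/GNU extension that, for example, dash's printf does not support.

```shell
#!/bin/sh
# Two UTF-8 spellings of "é" (illustrative variable names).
nfc=$(printf '\303\251')    # U+00E9, precomposed (NFC): bytes c3 a9
nfd=$(printf 'e\314\201')   # 'e' + U+0301 combining acute (NFD): bytes 65 cc 81

# Dump the raw bytes of each form in hex.
printf '%s' "$nfc" | od -An -tx1
printf '%s' "$nfd" | od -An -tx1

# Plain string comparison is byte-wise, like a comparison of file names.
if [ "$nfc" = "$nfd" ]; then
    echo "same bytes"
else
    echo "different bytes"
fi
```

The comparison prints "different bytes" even though both strings render as é. Making them compare equal would require normalising both to the same form first, for example with Perl's core Unicode::Normalize module (which provides NFC() and NFD()); whether and where texi2any or the test harness should do that is not something this sketch settles.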