Re: Non-ASCII characters in @include search path

2022-02-20 Thread Patrice Dumas
On Sun, Feb 20, 2022 at 05:27:51PM +, Gavin Smith wrote: > If the error message became something like > > "nœud « �sseul� » non référencé" > > then encoding this to UTF-8 would break the parts which already were in > UTF-8. I just committed input decoding (command line, environment, translate
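
The hazard described in the quoted text, where a message mixes properly decoded text with text that is still raw UTF-8 bytes, can be modelled in a few lines. This is a minimal Python sketch, not the Perl code from texi2any; the helper `bytes_seen_as_characters` and the sample strings are hypothetical stand-ins chosen for illustration.

```python
# Illustration only: the strings below are hypothetical stand-ins, not
# messages or node names taken from texi2any.


def bytes_seen_as_characters(raw: bytes) -> str:
    """Model a byte string that was never decoded.

    Each byte is treated as one character, which is what decoding as
    Latin-1 does.
    """
    return raw.decode("latin-1")


# A translated word that is still UTF-8 bytes (never decoded).
translated_word = bytes_seen_as_characters("nœud".encode("utf-8"))

# A name that was properly decoded into characters.
node_name = "été"

# Both end up in the same message, which is then encoded once for output.
message = f"{translated_word} « {node_name} »"
encoded_message = message.encode("utf-8")

correct_word = "nœud".encode("utf-8")
correct_name = node_name.encode("utf-8")

# The decoded part survives the final encoding step.
print(correct_name in encoded_message)  # True
# The part that was already UTF-8 gets encoded a second time and is broken.
print(correct_word in encoded_message)  # False
```

The sketch suggests why a single final encoding step is only safe when every input was decoded first.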

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Patrice Dumas
On Sun, Feb 20, 2022 at 06:39:47PM +, Gavin Smith wrote: > On Sun, Feb 20, 2022 at 07:00:23PM +0100, Patrice Dumas wrote: > > I don't get it. If everything is decoded to internal Perl encoding and > > then the file name is encoded to the locale encoding, here Latin-1, everything > > will be ok, as long

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Gavin Smith
On Sun, Feb 20, 2022 at 07:00:23PM +0100, Patrice Dumas wrote: > I don't get it. If everything is decoded to internal Perl encoding and > then the file name is encoded to the locale encoding, here Latin-1, everything > will be ok, as long as the characters in manuals can be output in > Latin-1. The point i
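
The "decode everything on input, encode on output" approach in the quoted text, and its caveat about characters that Latin-1 cannot represent, can be sketched as follows. This is a Python model under stated assumptions, not the Perl implementation; `decode_input`, `encode_output` and the sample file name are hypothetical.

```python
# Illustration only: function names and sample strings are hypothetical.


def decode_input(raw: bytes, input_encoding: str) -> str:
    """Turn bytes from the outside world into characters."""
    return raw.decode(input_encoding)


def encode_output(text: str, locale_encoding: str) -> bytes:
    """Turn characters back into bytes; raises if not representable."""
    return text.encode(locale_encoding)


# A file name as it would arrive in a Latin-1 locale.
raw_file_name = "testé.texi".encode("latin-1")

# Decoding and re-encoding with the same encoding restores the bytes.
round_tripped = encode_output(decode_input(raw_file_name, "latin-1"), "latin-1")

# Text from a manual may contain characters that Latin-1 lacks.
try:
    encode_output("nœud", "latin-1")
    representable = True
except UnicodeEncodeError:
    representable = False

print(round_tripped == raw_file_name)
print(representable)
```

The round trip is lossless only when the same encoding is used in both directions, and the last step shows the "as long as the characters can be output in Latin-1" condition failing.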

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Patrice Dumas
On Sun, Feb 20, 2022 at 05:27:51PM +, Gavin Smith wrote: > > For example, in the following the file name is output correctly as it is > > not decoded, but the string from the Texinfo file is decoded but not > > encoded and hence ends up incorrect in the message. Decoding everything > > and the

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Patrice Dumas
On Sun, Feb 20, 2022 at 03:32:30PM +, Gavin Smith wrote: > On Sun, Feb 20, 2022 at 2:55 PM Patrice Dumas wrote: > > > The byte sequences are just concatenated and used as the path to the file, > > > even if it's not validly encoded. This shouldn't cause a problem. > > > > It will cause a prob

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Eli Zaretskii
> From: Gavin Smith > Date: Sun, 20 Feb 2022 17:27:51 + > > > $ ./texi2any.pl testé.texi > > testé.texi:8: warning: node `�sseul�' unreferenced > > Suppose the translation for the word "node" was non-ASCII. I'd expect > the translation for that word to be encoded correctly in the output, ev

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Gavin Smith
On Sun, Feb 20, 2022 at 04:13:43PM +0100, Patrice Dumas wrote: > Another reason for decoding and encoding everything is error messages. > I am actually a bit surprised that nobody ever complained that error > messages are not encoded. I am not sure what the best approach is. Say there is code lik

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Eli Zaretskii
> From: Gavin Smith > Date: Sun, 20 Feb 2022 15:38:17 + > Cc: pertu...@free.fr, trash.parad...@protonmail.com, bug-texinfo@gnu.org > > On Sun, Feb 20, 2022 at 05:07:32PM +0200, Eli Zaretskii wrote: > > > The locale codeset could very easily be incorrect. Suppose somebody sets > > > a Latin-1

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Gavin Smith
On Sun, Feb 20, 2022 at 04:13:43PM +0100, Patrice Dumas wrote: > On Sun, Feb 20, 2022 at 02:53:46PM +, Gavin Smith wrote: > > > > Decoding command-line arguments might be a good idea for everything > > EXCEPT file names. Why go to the bother of decoding file names when > > they have to be enc

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Gavin Smith
On Sun, Feb 20, 2022 at 2:55 PM Patrice Dumas wrote: > > The byte sequences are just concatenated and used as the path to the file, > > even if it's not validly encoded. This shouldn't cause a problem. > > It will cause a problem if the include file name itself is not ASCII. > To avoid any proble

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Gavin Smith
On Sun, Feb 20, 2022 at 05:07:32PM +0200, Eli Zaretskii wrote: > > The locale codeset could very easily be incorrect. Suppose somebody sets > > a Latin-1 locale, should they then be unable to build Texinfo manuals > > with non-ASCII UTF-8 filenames? > > They will see garbled file names. > > You

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Patrice Dumas
On Sun, Feb 20, 2022 at 02:53:46PM +, Gavin Smith wrote: > > Decoding command-line arguments might be a good idea for everything > EXCEPT file names. Why go to the bother of decoding file names when > they have to be encoded again to use them? Another reason for decoding and encoding everyth

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Eli Zaretskii
> From: Gavin Smith > Date: Sun, 20 Feb 2022 14:53:46 + > > On Sun, Feb 20, 2022 at 02:42:04PM +0100, Patrice Dumas wrote: > > On Sun, Feb 20, 2022 at 12:45:59PM +, Gavin Smith wrote: > > > > > > I made a similar change in commit e11835b62d. > > > > I propose another plan: > > * decode

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Eli Zaretskii
> From: Gavin Smith > Date: Sun, 20 Feb 2022 14:48:39 + > Cc: Patrice Dumas , trash.parad...@protonmail.com, > bug-texinfo@gnu.org > > > what if the argument of -I is in some non-UTF-8 encoding, and the > > source uses @include with a non-ASCII file name encoded according > > to @docum

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Eli Zaretskii
> From: Gavin Smith > Date: Sun, 20 Feb 2022 14:44:13 + > Cc: pertu...@free.fr, trash.parad...@protonmail.com, bug-texinfo@gnu.org > > On Sun, Feb 20, 2022 at 03:06:57PM +0200, Eli Zaretskii wrote: > > > This means that any non-ASCII characters in a filename in a Texinfo source > > > file are

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Patrice Dumas
On Sun, Feb 20, 2022 at 02:53:46PM +, Gavin Smith wrote: > On Sun, Feb 20, 2022 at 02:42:04PM +0100, Patrice Dumas wrote: > > On Sun, Feb 20, 2022 at 12:45:59PM +, Gavin Smith wrote: > > > > > > I made a similar change in commit e11835b62d. > > > > I propose another plan: > > * decode arg

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Patrice Dumas
On Sun, Feb 20, 2022 at 02:48:39PM +, Gavin Smith wrote: > On Sun, Feb 20, 2022 at 03:35:57PM +0200, Eli Zaretskii wrote: > > > > If you want the Texinfo sources to be in UTF-8 internally, it might be > > impossible not to decode the command-line arguments into UTF-8. Only > > if the command-

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Gavin Smith
On Sun, Feb 20, 2022 at 02:42:04PM +0100, Patrice Dumas wrote: > On Sun, Feb 20, 2022 at 12:45:59PM +, Gavin Smith wrote: > > > > I made a similar change in commit e11835b62d. > > I propose another plan: > * decode the command-line arguments based on the > locale > * if on wind

Re: Non-ASCII characters in @include search path

2022-02-20 Thread pertusus
On Sun, Feb 20, 2022 at 02:44:13PM +, Gavin Smith wrote: > > > The only thorough solution, IMO, is to assume the file names are > > encoded in the filesystem as specified by the locale's codeset. That, > > too, can be false, but at least in the absolute majority of use cases > > it will be tr

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Gavin Smith
On Sun, Feb 20, 2022 at 03:35:57PM +0200, Eli Zaretskii wrote: > > Date: Sun, 20 Feb 2022 14:28:23 +0100 > > From: Patrice Dumas > > > > On Sun, Feb 20, 2022 at 01:09:06PM +, Gavin Smith wrote: > > > > > > My thought was that the argument to -I could have been any sequence of > > > bytes, >

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Gavin Smith
On Sun, Feb 20, 2022 at 03:06:57PM +0200, Eli Zaretskii wrote: > > This means that any non-ASCII characters in a filename in a Texinfo source > > file are sought in the filesystem as the corresponding UTF-8 sequences. > > This will not work on Windows. I can see that there could be an issue if fi

Re: Non-ASCII characters in @include search path

2022-02-20 Thread pertusus
On Sun, Feb 20, 2022 at 03:45:37PM +0200, Eli Zaretskii wrote: > > > > I do not think that it is a good solution either. On Linux, unless I > > missed something, the file name encoding should be utf-8 irrespective of > > the locale, or the Texinfo document encoding. > > No, that's incorrect. Li

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Eli Zaretskii
> Date: Sun, 20 Feb 2022 14:32:01 +0100 > From: pertu...@free.fr > Cc: Gavin Smith , trash.parad...@protonmail.com, > bug-texinfo@gnu.org > > On Sun, Feb 20, 2022 at 03:06:57PM +0200, Eli Zaretskii wrote: > > > > The only thorough solution, IMO, is to assume the file names are > > encoded i

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Patrice Dumas
On Sun, Feb 20, 2022 at 03:35:57PM +0200, Eli Zaretskii wrote: > > Date: Sun, 20 Feb 2022 14:28:23 +0100 > > From: Patrice Dumas > > > > On Sun, Feb 20, 2022 at 01:09:06PM +, Gavin Smith wrote: > > > > > > My thought was that the argument to -I could have been any sequence of > > > bytes, >

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Patrice Dumas
On Sun, Feb 20, 2022 at 12:45:59PM +, Gavin Smith wrote: > > I made a similar change in commit e11835b62d. I propose another plan: * decode the command-line arguments based on the locale * if on Windows use the locale to determine the encoding for the file names, otherwise us

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Eli Zaretskii
> Date: Sun, 20 Feb 2022 14:28:23 +0100 > From: Patrice Dumas > > On Sun, Feb 20, 2022 at 01:09:06PM +, Gavin Smith wrote: > > > > My thought was that the argument to -I could have been any sequence of > > bytes, > > not necessarily correct UTF-8. It would be wrong then to attempt any > >

Re: Non-ASCII characters in @include search path

2022-02-20 Thread pertusus
On Sun, Feb 20, 2022 at 03:06:57PM +0200, Eli Zaretskii wrote: > > The only thorough solution, IMO, is to assume the file names are > encoded in the filesystem as specified by the locale's codeset. That, > too, can be false, but at least in the absolute majority of use cases > it will be true. T

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Patrice Dumas
On Sun, Feb 20, 2022 at 01:09:06PM +, Gavin Smith wrote: > On Sun, Feb 20, 2022 at 01:45:19PM +0100, Patrice Dumas wrote: > > On Sun, Feb 20, 2022 at 01:10:16PM +0100, Patrice Dumas wrote: > > > > > > I think that the correct way to do that is to use > > > Encode::encode($text, 'utf-8'); > > >

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Gavin Smith
On Sun, Feb 20, 2022 at 01:45:19PM +0100, Patrice Dumas wrote: > On Sun, Feb 20, 2022 at 01:10:16PM +0100, Patrice Dumas wrote: > > > > I think that the correct way to do that is to use > > Encode::encode($text, 'utf-8'); > > Also I think that it should be done as late as possible, so it would be

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Eli Zaretskii
> Date: Sun, 20 Feb 2022 13:02:23 +0100 > From: Patrice Dumas > > I just read some code in the XS parser, and it seems that the XS parser > works with utf8 encoded byte strings. Therefore the stat within the > XS parser will work if the file names are actually encoded using utf8 in > the file sy

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Eli Zaretskii
> From: Gavin Smith > Date: Sun, 20 Feb 2022 11:54:08 + > > Strings coming from the Texinfo source file have to be assumed to represent > characters, not bytes, as the Texinfo source is read with a certain encoding. > File names, however, are a sequence of bytes (on GNU/Linux at least; on > M

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Patrice Dumas
On Sun, Feb 20, 2022 at 01:10:16PM +0100, Patrice Dumas wrote: > > I think that the correct way to do that is to use > Encode::encode($text, 'utf-8'); > Also I think that it should be done as late as possible, so it would be > better on $possible_file. It is Encode::encode('utf-8', $text), but in
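
The idea of encoding "as late as possible", on the candidate path rather than earlier, can be illustrated with a short sketch. It is written in Python rather than Perl and is not the code from `Texinfo/Common.pm`; the function `candidate_path`, its parameters and the sample directory are hypothetical. It assumes POSIX-style paths, where the include directory is an arbitrary byte sequence that is never decoded.

```python
import posixpath

# Illustration only: names and sample values are hypothetical.


def candidate_path(include_dir: bytes, file_name: str,
                   file_name_encoding: str = "utf-8") -> bytes:
    """Build the byte path to try for an included file.

    include_dir is kept as raw bytes, exactly as given on the command line.
    file_name comes from the document as characters and is encoded only
    here, at the last moment before the path is used.
    """
    encoded_name = file_name.encode(file_name_encoding)
    return posixpath.join(include_dir, encoded_name)


# A directory name that is a single Latin-1 byte and is not valid UTF-8.
include_dir = b"./\xe8"

print(candidate_path(include_dir, "inc.texi"))
print(candidate_path(b"dir", "testé.texi"))
print(candidate_path(b"dir", "testé.texi", "latin-1"))
```

Keeping the directory as bytes means an invalidly encoded search path is passed through untouched, while the encoding of the included file name remains an explicit choice.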

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Gaël Bonithon
Looks like I always use the pure Perl module, but for different reasons depending on the Texinfo version: 6.8 (https://github.com/archlinux/svntogit-packages/blob/packages/texinfo/trunk/PKGBUILD): $ makeinfo -I ./è simplest.texi checking /usr/lib/texinfo/MiscXS.la checking /usr/share/texinfo/lib

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Gavin Smith
On Sun, Feb 20, 2022 at 11:54:08AM +, Gavin Smith wrote: > I propose the following fix, which doesn't touch Perl's internal string > representation directly: > > diff --git a/tp/Texinfo/Common.pm b/tp/Texinfo/Common.pm > index 29dbf3c8c3..7babba016c 100644 > --- a/tp/Texinfo/Common.pm > +++ b/

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Patrice Dumas
On Sat, Feb 19, 2022 at 10:38:23PM +, Gaël Bonithon wrote: > It doesn't seem to depend on the version of Texinfo for me. > I had tried downgrading to versions 6.6 and 6.7 during the Octave problem > cited in my first post, and I tried earlier by building from the latest > commit 091f22068c fo

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Patrice Dumas
On Sun, Feb 20, 2022 at 11:54:08AM +, Gavin Smith wrote: > I found it was the last argument to File::Spec->catdir that led to the > utf8 flag being on: $filename. This came from the argument to > locate_include_file, which came from the Texinfo source file. The following > also fixes it: I d

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Patrice Dumas
On Sun, Feb 20, 2022 at 10:11:09AM +, Gavin Smith wrote: > On Sun, Feb 20, 2022 at 09:11:54AM +, Gavin Smith wrote: > > On Sat, Feb 19, 2022 at 11:00:33PM +0100, Patrice Dumas wrote: > > > I think that there is some wrong encoding/decoding somewhere, > > > but I don't know where. It is par

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Gavin Smith
On Sun, Feb 20, 2022 at 10:11:09AM +, Gavin Smith wrote: > On Sun, Feb 20, 2022 at 09:11:54AM +, Gavin Smith wrote: > > On Sat, Feb 19, 2022 at 11:00:33PM +0100, Patrice Dumas wrote: > > > I think that there is some wrong encoding/decoding somewhere, > > > but I don't know where. It is par

Re: more consistent ignoring before node and sections and Top node

2022-02-20 Thread Gavin Smith
On Sat, Feb 19, 2022 at 11:28:43PM +0100, Patrice Dumas wrote: > Hello, > > Right now there is a diversity of handling of text at the beginning of > Texinfo manuals, before the first @node and sectioning, but also for the > information in @titlepage that is not truly title page (@insertcopying >

Re: more consistent ignoring before node and sections and Top node

2022-02-20 Thread Patrice Dumas
On Sun, Feb 20, 2022 at 08:25:59AM +0200, Eli Zaretskii wrote: > > Date: Sat, 19 Feb 2022 23:28:43 +0100 > > From: Patrice Dumas > > > > Info: always ignore text before the first @node or sectioning command. > > There are directives there that cannot be ignored, so I'm not sure I > understand wh

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Gavin Smith
On Sun, Feb 20, 2022 at 09:11:54AM +, Gavin Smith wrote: > On Sat, Feb 19, 2022 at 11:00:33PM +0100, Patrice Dumas wrote: > > I think that there is some wrong encoding/decoding somewhere, > > but I don't know where. It is particularly strange that I cannot > > reproduce with 6.8 but Gaël can.

Re: Non-ASCII characters in @include search path

2022-02-20 Thread Gavin Smith
On Sat, Feb 19, 2022 at 11:00:33PM +0100, Patrice Dumas wrote: > I think that there is some wrong encoding/decoding somewhere, > but I don't know where. It is particularly strange that I cannot > reproduce with 6.8 but Gaël can. I reproduced with 6.8 but only with TEXINFO_XS=omit. I am going to