On Sun, Feb 20, 2022 at 05:27:51PM +0000, Gavin Smith wrote:
> If the error message became something like
>
> "nœud « �sseul� » non référencé"
>
> then encoding this to UTF-8 would break the parts which already were in
> UTF-8.
I just committed input decoding (command line, environment, translate
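A minimal sketch of the "decode everything, encode once on output" approach under discussion. The byte strings and variable names are hypothetical, not taken from the texi2any code. It assumes a node name arriving as Latin-1 bytes and a translated message template arriving as UTF-8 bytes:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical inputs: a node name as Latin-1 bytes from a manual, and a
# translated message template as UTF-8 bytes.
my $node_bytes     = "\xe9sseul\xe9";
my $template_bytes =
    "n\xc5\x93ud \xc2\xab %s \xc2\xbb non r\xc3\xa9f\xc3\xa9renc\xc3\xa9";

# Decode each input with its own encoding to get character strings.
my $node     = Encode::decode('iso-8859-1', $node_bytes);
my $template = Encode::decode('utf-8', $template_bytes);

# Build the message from characters only, then encode once for output.
my $message = sprintf($template, $node);
my $out     = Encode::encode('utf-8', $message);
print "$out\n";
```

Because the message is assembled from character strings, the parts that started out as UTF-8 are not encoded a second time; only the final string is turned into bytes.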
On Sun, Feb 20, 2022 at 06:39:47PM +0000, Gavin Smith wrote:
> On Sun, Feb 20, 2022 at 07:00:23PM +0100, Patrice Dumas wrote:
> > I don't get it. If everything is decoded to internal perl encoding and
> > then the file name is encoded to the local, here Latin-1, everything
> > will be ok, as long
On Sun, Feb 20, 2022 at 07:00:23PM +0100, Patrice Dumas wrote:
> I don't get it. If everything is decoded to internal perl encoding and
> then the file name is encoded to the local, here Latin-1, everything
> will be ok, as long as the characters in manuals can be output in
> Latin-1.
The point i
On Sun, Feb 20, 2022 at 05:27:51PM +0000, Gavin Smith wrote:
> > For example, in the following the file name is output correctly as it is
> > not decoded, but the string from the Texinfo file is decoded but not
> > encoded and hence ends up incorrect in the message. Decoding everything
> > and the
On Sun, Feb 20, 2022 at 03:32:30PM +0000, Gavin Smith wrote:
> On Sun, Feb 20, 2022 at 2:55 PM Patrice Dumas wrote:
> > > The byte sequences are just concatenated and used as the path to the file,
> > > even if it's not validly encoded. This shouldn't cause a problem.
> >
> > It will cause a prob
> From: Gavin Smith
> Date: Sun, 20 Feb 2022 17:27:51 +0000
>
> > $ ./texi2any.pl testé.texi
> > testé.texi:8: warning: node `�sseul�' unreferenced
>
> Suppose the translation for the word "node" was non-ASCII. I'd expect
> the translation for that word to be encoded correctly in the output, ev
On Sun, Feb 20, 2022 at 04:13:43PM +0100, Patrice Dumas wrote:
> Another reason for decoding and encoding everything is error messages.
> I am actually a bit surprised that nobody ever complained that error
> messages are not encoded.
I am not sure what the best approach is. Say there is code lik
> From: Gavin Smith
> Date: Sun, 20 Feb 2022 15:38:17 +0000
> Cc: pertu...@free.fr, trash.parad...@protonmail.com, bug-texinfo@gnu.org
>
> On Sun, Feb 20, 2022 at 05:07:32PM +0200, Eli Zaretskii wrote:
> > > The locale codeset could very easily be incorrect. Suppose somebody sets
> > > a Latin-1
On Sun, Feb 20, 2022 at 04:13:43PM +0100, Patrice Dumas wrote:
> On Sun, Feb 20, 2022 at 02:53:46PM +0000, Gavin Smith wrote:
> >
> > Decoding command-line arguments might be a good idea for everything
> > EXCEPT file names. Why go to the bother of decoding file names when
> > they have to be enc
On Sun, Feb 20, 2022 at 2:55 PM Patrice Dumas wrote:
> > The byte sequences are just concatenated and used as the path to the file,
> > even if it's not validly encoded. This shouldn't cause a problem.
>
> It will cause a problem if the include file name itself is not ASCII.
> To avoid any proble
On Sun, Feb 20, 2022 at 05:07:32PM +0200, Eli Zaretskii wrote:
> > The locale codeset could very easily be incorrect. Suppose somebody sets
> > a Latin-1 locale, should they then be unable to build Texinfo manuals
> > with non-ASCII UTF-8 filenames?
>
> They will see garbled file names.
>
> You
On Sun, Feb 20, 2022 at 02:53:46PM +0000, Gavin Smith wrote:
>
> Decoding command-line arguments might be a good idea for everything
> EXCEPT file names. Why go to the bother of decoding file names when
> they have to be encoded again to use them?
Another reason for decoding and encoding everyth
> From: Gavin Smith
> Date: Sun, 20 Feb 2022 14:53:46 +0000
>
> On Sun, Feb 20, 2022 at 02:42:04PM +0100, Patrice Dumas wrote:
> > On Sun, Feb 20, 2022 at 12:45:59PM +0000, Gavin Smith wrote:
> > >
> > > I made a similar change in commit e11835b62d.
> >
> > I propose another plan:
> > * decode
> From: Gavin Smith
> Date: Sun, 20 Feb 2022 14:48:39 +0000
> Cc: Patrice Dumas, trash.parad...@protonmail.com,
> bug-texinfo@gnu.org
>
> > what if the argument of -I is in some non-UTF-8 encoding, and the
> > source uses @include with a non-ASCII file name encoded according
> > to @docum
> From: Gavin Smith
> Date: Sun, 20 Feb 2022 14:44:13 +0000
> Cc: pertu...@free.fr, trash.parad...@protonmail.com, bug-texinfo@gnu.org
>
> On Sun, Feb 20, 2022 at 03:06:57PM +0200, Eli Zaretskii wrote:
> > > This means that any non-ASCII characters in a filename in a Texinfo source
> > > file are
On Sun, Feb 20, 2022 at 02:53:46PM +0000, Gavin Smith wrote:
> On Sun, Feb 20, 2022 at 02:42:04PM +0100, Patrice Dumas wrote:
> > On Sun, Feb 20, 2022 at 12:45:59PM +0000, Gavin Smith wrote:
> > >
> > > I made a similar change in commit e11835b62d.
> >
> > I propose another plan:
> > * decode arg
On Sun, Feb 20, 2022 at 02:48:39PM +0000, Gavin Smith wrote:
> On Sun, Feb 20, 2022 at 03:35:57PM +0200, Eli Zaretskii wrote:
> >
> > If you want the Texinfo sources to be in UTF-8 internally, it might be
> > impossible not to decode the command-line arguments into UTF-8. Only
> > if the command-
On Sun, Feb 20, 2022 at 02:42:04PM +0100, Patrice Dumas wrote:
> On Sun, Feb 20, 2022 at 12:45:59PM +0000, Gavin Smith wrote:
> >
> > I made a similar change in commit e11835b62d.
>
> I propose another plan:
> * decode the command line arguments based on the
> locale
> * if on wind
On Sun, Feb 20, 2022 at 02:44:13PM +0000, Gavin Smith wrote:
>
> > The only thorough solution, IMO, is to assume the file names are
> > encoded in the filesystem as specified by the locale's codeset. That,
> > too, can be false, but at least in the absolute majority of use cases
> > it will be tr
On Sun, Feb 20, 2022 at 03:35:57PM +0200, Eli Zaretskii wrote:
> > Date: Sun, 20 Feb 2022 14:28:23 +0100
> > From: Patrice Dumas
> >
> > On Sun, Feb 20, 2022 at 01:09:06PM +0000, Gavin Smith wrote:
> > >
> > > My thought was that the argument to -I could have been any sequence of
> > > bytes,
>
On Sun, Feb 20, 2022 at 03:06:57PM +0200, Eli Zaretskii wrote:
> > This means that any non-ASCII characters in a filename in a Texinfo source
> > file are sought in the filesystem as the corresponding UTF-8 sequences.
>
> This will not work on Windows.
I can see that there could be an issue if fi
On Sun, Feb 20, 2022 at 03:45:37PM +0200, Eli Zaretskii wrote:
> >
> > I do not think that it is a good solution either. On Linux, unless I
> > missed something, the file name encoding should be utf-8 irrespective of
> > the locale, or the Texinfo document encoding.
>
> No, that's incorrect. Li
> Date: Sun, 20 Feb 2022 14:32:01 +0100
> From: pertu...@free.fr
> Cc: Gavin Smith, trash.parad...@protonmail.com,
> bug-texinfo@gnu.org
>
> On Sun, Feb 20, 2022 at 03:06:57PM +0200, Eli Zaretskii wrote:
> >
> > The only thorough solution, IMO, is to assume the file names are
> > encoded i
On Sun, Feb 20, 2022 at 12:45:59PM +0000, Gavin Smith wrote:
>
> I made a similar change in commit e11835b62d.
I propose another plan:
* decode the command line arguments based on the
locale
* if on Windows use the locale to determine the encoding
for the file names, otherwise us
> Date: Sun, 20 Feb 2022 14:28:23 +0100
> From: Patrice Dumas
>
> On Sun, Feb 20, 2022 at 01:09:06PM +0000, Gavin Smith wrote:
> >
> > My thought was that the argument to -I could have been any sequence of
> > bytes,
> > not necessarily correct UTF-8. It would be wrong then to attempt any
> >
On Sun, Feb 20, 2022 at 03:06:57PM +0200, Eli Zaretskii wrote:
>
> The only thorough solution, IMO, is to assume the file names are
> encoded in the filesystem as specified by the locale's codeset. That,
> too, can be false, but at least in the absolute majority of use cases
> it will be true. T
On Sun, Feb 20, 2022 at 01:09:06PM +0000, Gavin Smith wrote:
> On Sun, Feb 20, 2022 at 01:45:19PM +0100, Patrice Dumas wrote:
> > On Sun, Feb 20, 2022 at 01:10:16PM +0100, Patrice Dumas wrote:
> > >
> > > I think that the correct way to do that is to use
> > > Encode::encode($text, 'utf-8');
> > >
On Sun, Feb 20, 2022 at 01:45:19PM +0100, Patrice Dumas wrote:
> On Sun, Feb 20, 2022 at 01:10:16PM +0100, Patrice Dumas wrote:
> >
> > I think that the correct way to do that is to use
> > Encode::encode($text, 'utf-8');
> > Also I think that it should be done as late as possible, so it would be
> Date: Sun, 20 Feb 2022 13:02:23 +0100
> From: Patrice Dumas
>
> I just read some code in the XS parser, and it seems that the XS parser
> works with utf8 encoded byte strings. Therefore the stat within the
> XS parser will work if the file names are actually encoded using utf8 in
> the file sy
> From: Gavin Smith
> Date: Sun, 20 Feb 2022 11:54:08 +0000
>
> Strings coming from the Texinfo source file have to be assumed to represent
> characters, not bytes, as the Texinfo source is read with a certain encoding.
> File names, however, are a sequence of bytes (on GNU/Linux at least; on
> M
On Sun, Feb 20, 2022 at 01:10:16PM +0100, Patrice Dumas wrote:
>
> I think that the correct way to do that is to use
> Encode::encode($text, 'utf-8');
> Also I think that it should be done as late as possible, so it would be
> better on $possible_file.
It is Encode::encode('utf-8', $text), but in
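For reference, a minimal sketch of the argument order mentioned here: the encoding name comes first, the string second. The value of $text and the name $bytes are illustrative only and not taken from the Texinfo code.

```perl
use strict;
use warnings;
use Encode ();

# $text holds characters (hypothetical example value), as if decoded
# from a Texinfo source file.
my $text = "test\x{e9}";    # "testé" as a character string

# The encoding name comes first, the string second.
my $bytes = Encode::encode('utf-8', $text);

# The result is a byte string: é becomes the two bytes 0xC3 0xA9.
printf "%d characters, %d bytes\n", length($text), length($bytes);
```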
Looks like I always use the pure Perl module, but for different reasons
depending on the Texinfo version:
6.8
(https://github.com/archlinux/svntogit-packages/blob/packages/texinfo/trunk/PKGBUILD):
$ makeinfo -I ./è simplest.texi
checking /usr/lib/texinfo/MiscXS.la
checking /usr/share/texinfo/lib
On Sun, Feb 20, 2022 at 11:54:08AM +0000, Gavin Smith wrote:
> I propose the following fix, which doesn't touch Perl's internal string
> representation directly:
>
> diff --git a/tp/Texinfo/Common.pm b/tp/Texinfo/Common.pm
> index 29dbf3c8c3..7babba016c 100644
> --- a/tp/Texinfo/Common.pm
> +++ b/
On Sat, Feb 19, 2022 at 10:38:23PM +0000, Gaël Bonithon wrote:
> It doesn't seem to depend on the version of Texinfo for me.
> I had tried downgrading to versions 6.6 and 6.7 during the Octave problem
> cited in my first post, and I tried earlier by building from the latest
> commit 091f22068c fo
On Sun, Feb 20, 2022 at 11:54:08AM +0000, Gavin Smith wrote:
> I found it was the last argument to File::Spec->catdir that led to the
> utf8 flag being on: $filename. This came from the argument to
> locate_include_file, which came from the Texinfo source file. The following
> also fixes it:
I d
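A small sketch of the effect described in the quoted text, under stated assumptions: the directory is a raw byte string (as it might arrive from the command line) and the file name is a decoded character string (as it might come from the Texinfo source). The values and variable names are hypothetical; this is not the code from Texinfo::Common.

```perl
use strict;
use warnings;
use Encode ();
use File::Spec;

# Directory as raw bytes, e.g. from the command line (UTF-8 bytes of "è").
my $dir = "\xc3\xa8";
# File name as decoded characters, as if read from the Texinfo source.
my $filename = Encode::decode('utf-8', "incl\xc3\xa9.texi");

# Joining bytes with a character string yields a character string.
my $mixed = File::Spec->catdir($dir, $filename);
print utf8::is_utf8($mixed) ? "flag on\n" : "flag off\n";

# Encoding the file name first keeps the whole path a byte string.
my $path = File::Spec->catdir($dir, Encode::encode('utf-8', $filename));
print utf8::is_utf8($path) ? "flag on\n" : "flag off\n";
```

In the first case the directory bytes are reinterpreted as individual characters when the strings are joined, so the resulting path no longer corresponds to the bytes on disk once it is encoded again.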
On Sun, Feb 20, 2022 at 10:11:09AM +0000, Gavin Smith wrote:
> On Sun, Feb 20, 2022 at 09:11:54AM +0000, Gavin Smith wrote:
> > On Sat, Feb 19, 2022 at 11:00:33PM +0100, Patrice Dumas wrote:
> > > I think that there is some wrong encoding/decoding somewhere,
> > > but I don't know where. It is par
On Sat, Feb 19, 2022 at 11:28:43PM +0100, Patrice Dumas wrote:
> Hello,
>
> Right now there is a diversity of handling of text at the beginning of
> Texinfo manuals, before the first @node and sectioning, but also for the
> information in @titlepage that is not truly title page (@insertcopying
>
On Sun, Feb 20, 2022 at 08:25:59AM +0200, Eli Zaretskii wrote:
> > Date: Sat, 19 Feb 2022 23:28:43 +0100
> > From: Patrice Dumas
> >
> > Info: always ignore text before the first @node or sectioning command.
>
> There are directives there that cannot be ignored, so I'm not sure I
> understand wh
On Sun, Feb 20, 2022 at 09:11:54AM +0000, Gavin Smith wrote:
> On Sat, Feb 19, 2022 at 11:00:33PM +0100, Patrice Dumas wrote:
> > I think that there is some wrong encoding/decoding somewhere,
> > but I don't know where. It is particularly strange that I cannot
> > reproduce with 6.8 but Gaël can.
On Sat, Feb 19, 2022 at 11:00:33PM +0100, Patrice Dumas wrote:
> I think that there is some wrong encoding/decoding somewhere,
> but I don't know where. It is particularly strange that I cannot
> reproduce with 6.8 but Gaël can.
I reproduced with 6.8 but only with TEXINFO_XS=omit. I am going to
42 matches