Re: url protection

2022-08-06 Thread Patrice Dumas
On Sat, Aug 06, 2022 at 03:20:15PM +0100, Gavin Smith wrote: > Characters should be protected if they are not part of the syntax of the URL > but they could be. > > Maybe more readable than the WHATWG documentation: > https://www.rfc-editor.org/rfc/rfc3986#page-12 > > This gives a list of reserve

Re: url protection

2022-08-06 Thread Gavin Smith
On Sat, Aug 06, 2022 at 03:28:52PM +0200, Patrice Dumas wrote: > Answering to myself, the protection of URL actually does not mean > protecting all the characters, as the : of the scheme, / as path > separator should be left as is, and parts already %-escaped should also > be left as is. After som

Re: url protection

2022-08-06 Thread Patrice Dumas
Hello, Answering to myself, the protection of URL actually does not mean protecting all the characters, as the : of the scheme, / as path separator should be left as is, and parts already %-escaped should also be left as is. After some thinking, maybe the best, in @url, @email and @image would be
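
As an illustration of the rule described in this message (leave the ":" of the scheme and "/" path separators alone, and do not touch parts that are already %-escaped), here is a minimal Python sketch. It is not texi2any code: the function name protect_url, the choice of the RFC 3986 reserved and unreserved sets as the "leave as is" characters, and the utf-8 default are all assumptions made for the example.

```python
import re

# Assumption: characters left untouched are the RFC 3986 unreserved
# characters plus the reserved (syntax) characters such as ":" and "/".
_UNRESERVED = (b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
               b"abcdefghijklmnopqrstuvwxyz"
               b"0123456789-._~")
_RESERVED = b":/?#[]@!$&'()*+,;="
_SAFE = frozenset(_UNRESERVED + _RESERVED)

# An existing escape is "%" followed by exactly two hex digits.
_PCT_ESCAPE = re.compile(rb"%[0-9A-Fa-f]{2}")


def protect_url(url: str, encoding: str = "utf-8") -> str:
    """Percent-encode url, keeping URL syntax and existing escapes (hypothetical helper)."""
    data = url.encode(encoding)
    out = []
    i = 0
    while i < len(data):
        if _PCT_ESCAPE.match(data, i):
            # Already %-escaped: copy the three bytes unchanged.
            out.append(data[i:i + 3].decode("ascii"))
            i += 3
        elif data[i] in _SAFE:
            out.append(chr(data[i]))
            i += 1
        else:
            # Everything else, including a stray "%", is escaped.
            out.append("%%%02X" % data[i])
            i += 1
    return "".join(out)
```

With these assumptions, a space or a non-ASCII byte is escaped, while "http://", "/" and an existing "%20" pass through unchanged.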

Re: url protection

2022-08-05 Thread Patrice Dumas
On Fri, Aug 05, 2022 at 06:29:45PM +0100, Gavin Smith wrote: > > > > To me the question is not the locales of the browser, but the encoding > > of the HTML file. If the encoding is ISO latin 1 as in: > > > > > > Then it seems to me that the URI::Escape call should be on a ISO latin 1 > > encode

Re: url protection

2022-08-05 Thread Gavin Smith
On Fri, Aug 05, 2022 at 11:11:05AM -0700, Per Bothner wrote: > > I think that we should support setting the output encoding explicitly to > > a Texinfo supported encoding for a long time, even if UTF-8 becomes the > > default output encoding for HTML. > > Why? Is this useful for anything? I don't

Re: url protection

2022-08-05 Thread Per Bothner
On 8/4/22 23:15, Eli Zaretskii wrote: So you mean the HTML file will have these file names encoded in UTF-8, while the file itself will be created using the locale's encoding? That seems to me to be the correct approach. (At least if the HTML contents is UTF-8 - which it should be.) --

Re: url protection

2022-08-05 Thread Per Bothner
On 8/3/22 15:15, Patrice Dumas wrote: In any case, it does not mean that using another encoding is fragile nor dangerous. There is a list of supported encodings in the Texinfo manual https://www.gnu.org/software/texinfo/manual/texinfo/html_node/_0040documentencoding.html I think that we support

Re: url protection

2022-08-05 Thread Per Bothner
On 8/5/22 10:35, Gavin Smith wrote: Could we write or copy the code for escaping a URL as it should be very short and simple? This would avoid an extra module dependency. Here is C/C++ code written by me. It works in two passes - the first counts the number of bytes that need to be escaped. F
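
The two-pass idea mentioned here (first count the bytes that need escaping, then fill an output buffer of exactly the right size) can be sketched as follows. This is a Python transliteration written for illustration, not the C/C++ code from the message, which is cut off in this listing. The names needs_escape and escape_two_pass are hypothetical, and treating every byte outside the RFC 3986 unreserved set as "needs escaping" is an assumption.

```python
# Assumption: only RFC 3986 unreserved bytes are copied through unescaped.
UNRESERVED = frozenset(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                       b"abcdefghijklmnopqrstuvwxyz"
                       b"0123456789-._~")
HEX_DIGITS = b"0123456789ABCDEF"


def needs_escape(byte: int) -> bool:
    """Return True if this byte must be written as %XX (hypothetical rule)."""
    return byte not in UNRESERVED


def escape_two_pass(data: bytes) -> bytes:
    # Pass 1: count the bytes that need escaping, so the output
    # size is known before anything is written.
    n_escaped = sum(1 for b in data if needs_escape(b))

    # Each escaped byte grows from 1 byte to 3 ("%XX").
    out = bytearray(len(data) + 2 * n_escaped)

    # Pass 2: fill the preallocated buffer.
    pos = 0
    for b in data:
        if needs_escape(b):
            out[pos] = ord("%")
            out[pos + 1] = HEX_DIGITS[b >> 4]
            out[pos + 2] = HEX_DIGITS[b & 0x0F]
            pos += 3
        else:
            out[pos] = b
            pos += 1
    return bytes(out)
```

In C the first pass is what allows a single allocation of the exact size; in Python the preallocation is kept only to mirror that structure.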

Re: url protection

2022-08-05 Thread Gavin Smith
On Wed, Aug 03, 2022 at 03:26:08PM +0200, Patrice Dumas wrote: > Hello, > > In general, hrefs generated by texi2any to Texinfo manuals, be it the > current manual or external manual, only contain ascii characters > acceptable in hrefs. However, for some other href, for file > names, or from @url

Re: url protection

2022-08-05 Thread Gavin Smith
On Thu, Aug 04, 2022 at 10:08:47PM +0200, Patrice Dumas wrote: > On Thu, Aug 04, 2022 at 08:30:01PM +0100, Gavin Smith wrote: > > On 8/3/22, Patrice Dumas wrote: > > > > > > But that was > > > not really my question, my question was more about whether we should use the > > > output encoding to encode

Re: url protection

2022-08-05 Thread Patrice Dumas
On Fri, Aug 05, 2022 at 09:15:06AM +0300, Eli Zaretskii wrote: > > From: Gavin Smith > > Date: Thu, 4 Aug 2022 20:34:01 +0100 > > > > On 8/4/22, Eli Zaretskii wrote: > > > > > > Isn't the main issue here about encoding _file_names_, and the > > > encoding of HTML is secondary? I mean file names

Re: url protection

2022-08-04 Thread Eli Zaretskii
> From: Gavin Smith > Date: Thu, 4 Aug 2022 20:34:01 +0100 > > On 8/4/22, Eli Zaretskii wrote: > > > > Isn't the main issue here about encoding _file_names_, and the > > encoding of HTML is secondary? I mean file names we produce from > > Texinfo sources, for files that are part of the output f

Re: url protection

2022-08-04 Thread Patrice Dumas
On Thu, Aug 04, 2022 at 08:30:01PM +0100, Gavin Smith wrote: > On 8/3/22, Patrice Dumas wrote: > > > > But that was > > not really my question, my question was more about whether we should use the > > output encoding to encode the string before doing the URI::Escape call, or > > always use UTF-8, even if

Re: url protection

2022-08-04 Thread Gavin Smith
On 8/3/22, Patrice Dumas wrote: > On Wed, Aug 03, 2022 at 12:08:15PM -0700, Per Bothner wrote: >> On 8/3/22 06:26, Patrice Dumas wrote: >> > The standard does not seem to be clear on the encoding to use for the % >> > encodings. URI::Escape has uri_escape() and uri_escape_utf8(). My >> > feeling is

Re: url protection

2022-08-03 Thread Eli Zaretskii
> Date: Wed, 3 Aug 2022 14:36:58 -0700 > From: Per Bothner > > On 8/3/22 13:46, Patrice Dumas wrote: > > This is not what we do in general for html/xhtml. For epub we always > > emit utf8, as it is mandated by the standard, but for html/xhtml, we > > use, in the default case, the input encoding

Re: url protection

2022-08-03 Thread Patrice Dumas
On Wed, Aug 03, 2022 at 02:36:58PM -0700, Per Bothner wrote: > On 8/3/22 13:46, Patrice Dumas wrote: > > This is not what we do in general for html/xhtml. For epub we always > > emit utf8, as it is mandated by the standard, but for html/xhtml, we > > use, in the default case, the input encoding fo

Re: url protection

2022-08-03 Thread Per Bothner
On 8/3/22 13:46, Patrice Dumas wrote: This is not what we do in general for html/xhtml. For epub we always emit utf8, as it is mandated by the standard, but for html/xhtml, we use, in the default case, the input encoding for the output encoding. I think that is a mistake. It seems clear that i

Re: url protection

2022-08-03 Thread Patrice Dumas
On Wed, Aug 03, 2022 at 12:08:15PM -0700, Per Bothner wrote: > On 8/3/22 06:26, Patrice Dumas wrote: > > The standard does not seem to be clear on the encoding to use for the % > > encodings. URI::Escape has uri_escape() and uri_escape_utf8(). My > > feeling is that the best would be to use first enc

Re: url protection

2022-08-03 Thread Per Bothner
On 8/3/22 06:26, Patrice Dumas wrote: The standard does not seem to be clear on the encoding to use for the % encodings. URI::Escape has uri_escape() and uri_escape_utf8(). My feeling is that the best would be to first encode to the output encoding and then call URI::Escape uri_escape(). If I
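
To make the encoding question concrete, the sketch below shows how the same file name yields different %-escapes depending on which encoding is applied before escaping. It uses Python's urllib.parse.quote as a stand-in for the Perl URI::Escape calls discussed in the thread; the helper name escape_with_encoding and the sample file name are assumptions for the example.

```python
from urllib.parse import quote


def escape_with_encoding(text: str, encoding: str) -> str:
    """Encode text to the given encoding, then percent-encode the bytes.

    Hypothetical helper: "/" is kept as a path separator, everything
    else outside the unreserved set is escaped.
    """
    return quote(text, safe="/", encoding=encoding, errors="strict")


# The same name escaped for a Latin-1 document and for a UTF-8 document.
latin1_href = escape_with_encoding("caf\u00e9.html", "iso-8859-1")
utf8_href = escape_with_encoding("caf\u00e9.html", "utf-8")
```

Here latin1_href is "caf%E9.html" and utf8_href is "caf%C3%A9.html", so the two choices produce hrefs that refer to different byte sequences on disk.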

url protection

2022-08-03 Thread Patrice Dumas
Hello, In general, hrefs generated by texi2any to Texinfo manuals, be it the current manual or external manual, only contain ascii characters acceptable in hrefs. However, for some other href, for file names, or from @url{}, there could be any characters. I think that it would be cleaner to per