Thanks. I can see it both ways, but I still lean towards it being a bug.
Headings in HTML can contain HTML, and I don't think the user has any reason
to expect that the content of a heading would end up anywhere else. If it
were up to me, the deciding factor would be whether the literal content of
the headings needs to show up in HTML attributes (and I don't see why it
would). If it is necessary, then yes, it is up to the user to avoid using
@inlineraw in headings (and a warning would be most welcome!). On the other
hand, if it is not necessary, or if an escaped version could be used in the
attributes instead, then the user should be allowed to use @inlineraw here,
and a change to the code is needed to prevent noncompliant HTML output.
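
To illustrate what I mean by "an escaped version", here is a minimal
sketch in Python. It is only an illustration and not texi2any code (which,
as far as I know, is Perl and C); attribute_value is a hypothetical name,
and I am assuming the raw heading text is available as a plain string at
the point where the attribute is written.

```python
import html


def attribute_value(raw_heading: str) -> str:
    """Escape a heading so it is safe inside a double-quoted HTML attribute.

    Hypothetical helper for illustration; not part of texi2any.
    """
    # html.escape replaces &, < and >; quote=True also replaces " and '.
    return html.escape(raw_heading, quote=True)


# The heading from the minimal example, as it appears after @inlineraw.
heading = '<span class="test">One</span>'

# In the <h2> the raw markup can stay as is; in an attribute it is escaped.
meta = '<meta name="description" content="{}"/>'.format(attribute_value(heading))
print(meta)
```

With something along these lines the <h2> could keep the raw markup while
the <meta> and <link> attributes get a value that cannot break out of the
quotes.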

Benjamin Kalish

On Sun, Sep 22, 2024 at 6:12 PM Patrice Dumas <pertu...@free.fr> wrote:

> On Sun, Sep 22, 2024 at 05:27:34PM -0400, Benjamin Kalish wrote:
> > It looks like the problem occurs only with the use of raw HTML, directly
> > (as in the minimal example here), or indirectly through a macro (as I
> > first encountered it):
>
> I can reproduce with your example (in inline_in_chap.texi file), with
> USE_NODES set to 0 (as is done for epub):
>
> $srcdir/texi2any.pl --html -c 'USE_NODES 0' inline_in_chap.texi
>
> I am not sure that this is a bug, though, looks like a feature to me, as
> @inlineraw turns off escaping of HTML characters.  The situation is not
> ideal, because there is no way to specify something different for
> attributes (called 'string' context in texi2any) such as <meta>
> name="description" content attribute, and in the main output, for
> instance in the chapter heading <h2>.  When Texinfo code is used, the
> formatting to HTML can be different in 'string' context and in normal
> context, but I can't see how this could be specified for @inlineraw raw
> HTML.
>
> Maybe we could say something in the documentation, for instance that
> HTML elements should not be used in @inlineraw in @node or sectioning
> commands?
>
> > \input texinfo
> >
> > @node Top
> > @top
> >
> > @node Cap 1
> > @chapter @inlineraw{html,<span class="test">}One@inlineraw{html,</span>}
> >
> > @bye
> >
> > Benjamin Kalish
> >
> >
> > On Sun, Sep 22, 2024 at 4:47 PM Gavin Smith <gavinsmith0...@gmail.com>
> > wrote:
> >
> > > On Sun, Sep 22, 2024 at 02:20:36PM -0400, Benjamin Kalish wrote:
> > > > EPUB output contains unescaped content in a number of HTML
> > > > attributes. I'm seeing this with:
> > > >
> > > > - The content attribute for <meta> with name="description"
> > > > - The content attribute for <meta> name="keywords"
> > > > - The title attribute of the <link> elements with rel="next" and
> > > >   rel="prev"
> > > >
> > > > HTML output also has these same tags and attributes, but the content
> > > > seems fine in my case. This may not actually be due to better escaping,
> > > > as it looks like entirely different content is being used for the
> > > > attribute values when generating HTML, and the content is, in this case
> > > > at least, safe without escaping.
> > > >
> > > > Changing the values to be the same as those used when generating HTML
> > > > would solve the problem in my case, but it is probably best to make sure
> > > > that attribute values are always escaped.
> > > >
> > > > What should be escaped? Quotation marks must be. Ambiguous ampersands
> > > > must be. But it is probably prudent to escape all ampersands and all
> > > > occurrences of < or >.
> > > >
> > > > I'm sorry I can't suggest a fix in the code—I'm not familiar with the
> > > > Texinfo codebase and it's been decades since I've coded in Perl or C.
> > > >
> > > > I'm using texi2any 7.1.1
> > >
> > > I tried testing this on the master development branch and it looked
> > > ok:
> > >
> > > $ cat test.texi
> > > \input texinfo
> > >
> > > @node Top
> > > @top
> > >
> > > @node Cap 1
> > > @chapter One "<>
> > >
> > > @bye
> > >
> > > After running "texi2any --epub3 test.texi" and extracting the
> > > resulting "test.epub" file, the output file in the ZIP archive had, in
> > > "test/EPUB/xhtml/Cap-1.xhtml", the " < and > escaped (see output
> > > below).
> > > Can you please explain how to reproduce the problem?
> > >
> > >
> > > <?xml version="1.0" encoding="UTF-8"?>
> > > <!DOCTYPE html>
> > > <html xmlns="http://www.w3.org/1999/xhtml">
> > > <!-- Created by GNU Texinfo 7.1.1, https://www.gnu.org/software/texinfo/ -->
> > > <head>
> > > <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
> > > <title>1 One &quot;&lt;&gt; (Untitled Document)</title>
> > >
> > > <meta name="description" content="1 One &quot;&lt;&gt; (Untitled Document)"/>
> > > <meta name="keywords" content="1 One &quot;&lt;&gt; (Untitled Document)"/>
> > > <meta name="resource-type" content="document"/>
> > > <meta name="distribution" content="global"/>
> > > <meta name="Generator" content="texi2any"/>
> > > <meta name="viewport" content="width=device-width,initial-scale=1"/>
> > >
> > > <link href="test.xhtml" rel="start" title=""/>
> > > <link href="#Cap-1" rel="index" title="1 One &quot;&lt;&gt;"/>
> > > <link href="test.xhtml" rel="up" title=""/>
> > > <link href="test.xhtml" rel="prev" title=""/>
> > >
> > >
> > > </head>
> > >
> > > <body lang="en">
> > > <div class="chapter-level-extent" id="Cap-1">
> > >
> > > <h2 class="chapter" id="One-_0022_003c_003e">1 One &quot;&lt;&gt;</h2>
> > >
> > >
> > >
> > > </div>
> > >
> > >
> > >
> > > </body>
> > > </html>
> > >
>
