[jira] [Commented] (DOXIA-616) Markdown: Properly expose the language specified in fenced code blocks

ASF GitHub Bot (Jira) Tue, 29 Dec 2020 09:25:08 -0800


    [ https://issues.apache.org/jira/browse/DOXIA-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17256076#comment-17256076 ]


ASF GitHub Bot commented on DOXIA-616:
--------------------------------------

michael-o commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549781997



##########
File path: doxia-core/src/main/java/org/apache/maven/doxia/sink/impl/Xhtml5BaseSink.java
##########
@@ -1905,7 +1905,9 @@ private void inlineSemantics( SinkEventAttributes attributes, String semantic,
     {
         if ( attributes.containsAttribute( SinkEventAttributes.SEMANTICS, semantic ) )
         {
-            writeStartTag( tag );
+            SinkEventAttributes attributesNoSemantics = ( SinkEventAttributes ) attributes.copyAttributes();
+            attributesNoSemantics.removeAttribute( SinkEventAttributes.SEMANTICS );

Review comment:
       Why is this necessary?
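One plausible motivation, offered here as an assumption rather than the PR author's stated reason: `SinkEventAttributes` extends `javax.swing.text.MutableAttributeSet`. Calling `removeAttribute( SEMANTICS )` on the incoming set would therefore mutate the caller's object. Passing the set through unchanged, on the other hand, could let the internal `semantics` marker end up as an HTML attribute on the written tag. The sketch below uses the JDK's `SimpleAttributeSet` in place of Doxia's `SinkEventAttributeSet`. `SemanticsCopyDemo`, `renderTag` and the literal `"semantics"` key are hypothetical.

```java
import java.util.Enumeration;
import javax.swing.text.MutableAttributeSet;

final class SemanticsCopyDemo
{
    // Hypothetical key standing in for SinkEventAttributes.SEMANTICS
    static final String SEMANTICS = "semantics";

    // Hypothetical helper: writes every attribute of the set into the start tag
    static String renderTag( String tag, MutableAttributeSet attributes )
    {
        StringBuilder sb = new StringBuilder( "<" ).append( tag );
        Enumeration<?> names = attributes.getAttributeNames();
        while ( names.hasMoreElements() )
        {
            Object name = names.nextElement();
            sb.append( ' ' ).append( name ).append( "=\"" )
              .append( attributes.getAttribute( name ) ).append( '"' );
        }
        return sb.append( '>' ).toString();
    }

    static String inlineCode( MutableAttributeSet attributes )
    {
        // Copy first: SimpleAttributeSet.copyAttributes() returns a mutable clone,
        // so removing from it leaves the caller's set untouched
        MutableAttributeSet attributesNoSemantics = (MutableAttributeSet) attributes.copyAttributes();
        // Keep the internal marker out of the rendered HTML attributes
        attributesNoSemantics.removeAttribute( SEMANTICS );
        return renderTag( "code", attributesNoSemantics );
    }
}
```

With a caller set holding `semantics=code` and `class=java`, the rendered tag is `<code class="java">`, and the caller's set still contains `semantics`.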

##########
File path: doxia-modules/doxia-module-markdown/pom.xml
##########
@@ -72,5 +72,80 @@ under the License.
       <groupId>org.codehaus.plexus</groupId>
       <artifactId>plexus-utils</artifactId>
     </dependency>
+    <dependency>
+      <groupId>commons-io</groupId>
+      <artifactId>commons-io</artifactId>
+    </dependency>
   </dependencies>
+  <build>
+
+    <plugins>
+
+      <!-- install -->
+      <plugin>
+        <artifactId>maven-install-plugin</artifactId>

Review comment:
       Aren't they available when `verify` runs, because this is run after `install`?

##########
File path: doxia-core/src/main/java/org/apache/maven/doxia/parser/XhtmlBaseParser.java
##########
@@ -509,7 +509,8 @@ else if ( ( parser.getName().equals( HtmlMarkup.CODE.toString() ) )
                 || ( parser.getName().equals( HtmlMarkup.SAMP.toString() ) )
                 || ( parser.getName().equals( HtmlMarkup.TT.toString() ) ) )
         {
-            sink.inline( SinkEventAttributeSet.Semantics.CODE );
+            attribs.addAttributes( SinkEventAttributeSet.Semantics.CODE );

Review comment:
       Interesting, is this a semantic change?

##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -74,48 +80,103 @@
 
     /**
      * Regex that identifies a multimarkdown-style metadata section at the start of the document
+     *
+     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
+     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
+     * ignored.
      */
-    private static final String MULTI_MARKDOWN_METADATA_SECTION =
-        "^(((?:[^\\s:][^:]*):(?:.*(?:\r?\n\\p{Blank}+[^\\s].*)*\r?\n))+)(?:\\s*\r?\n)";
+    private static final Pattern METADATA_SECTION_PATTERN = Pattern.compile(
+            "\\A^\\s*"
+            + "(?:title|author|date|address|affiliation|copyright|email|keywords|language|phone|subtitle)"
+            + "\\h*:\\h*\\V*\\h*$\\v+"
+            + "(?:^\\h*[^:\\v]+\\h*:\\h*\\V*\\h*$\\v+)*",
+            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE );
 
     /**
      * Regex that captures the key and value of a multimarkdown-style metadata entry.
      */
-    private static final String MULTI_MARKDOWN_METADATA_ENTRY =
-        "([^\\s:][^:]*):(.*(?:\r?\n\\p{Blank}+[^\\s].*)*)\r?\n";
-
-    /**
-     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
-     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
-     * ignored.
-     */
-    private static final String[] STANDARD_METADATA_KEYS =
-        { "title", "author", "date", "address", "affiliation", "copyright", "email", "keywords", "language", "phone",
-            "subtitle" };
+    private static final Pattern METADATA_ENTRY_PATTERN = Pattern.compile(
+            "^\\h*([^:\\v]+?)\\h*:\\h*(\\V*)\\h*$",
+            Pattern.MULTILINE );
 
     /**
      * <p>getType.</p>
      *
      * @return a int.
      */
+    @Override
     public int getType()
     {
         return TXT_TYPE;
     }
 
+    /**
+     * The parser of the HTML produced by Flexmark, that we will
+     * use to convert this HTML to Sink events
+     */
     @Requirement
     private MarkdownHtmlParser parser;
 
+    /**
+     * Flexmark's Markdown parser (one static instance fits all)
+     */
+    private static final com.vladsch.flexmark.parser.Parser FLEXMARK_PARSER;
+
+    /**
+     * Flexmark's HTML renderer (its output will be re-parsed and converted to Sink events)
+     */
+    private static final HtmlRenderer FLEXMARK_HTML_RENDERER;
+
+    // Initialize the Flexmark parser and renderer, once and for all
+    static
+    {
+        MutableDataSet flexmarkOptions = new MutableDataSet();
+
+        // Emulate Pegdown's behavior
+        flexmarkOptions.setFrom( ParserEmulationProfile.PEGDOWN );
+
+        // Enable the extensions that we used to have in Pegdown
+        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, Arrays.asList(
+                EscapedCharacterExtension.create(),
+                AbbreviationExtension.create(),
+                AutolinkExtension.create(),
+                DefinitionExtension.create(),
+                TypographicExtension.create(),
+                TablesExtension.create(),
+                WikiLinkExtension.create(),
+                StrikethroughExtension.create()
+        ) );
+
+        // Disable wrong apostrophe replacement
+        flexmarkOptions.set( TypographicExtension.SINGLE_QUOTE_UNMATCHED, "&apos;" );

Review comment:
       Does this address DOXIA-542?

##########
File path: doxia-modules/doxia-module-markdown/src/it/settings.xml
##########
@@ -0,0 +1,55 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<settings>
+  <profiles>
+    <profile>
+      <id>it-repo</id>
+      <activation>
+        <activeByDefault>true</activeByDefault>
+      </activation>
+      <repositories>
+        <repository>
+          <id>1local.central</id>

Review comment:
       Is the preceding `1` intentional?

##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -130,133 +191,98 @@ public void parse( Reader source, Sink sink )
      * @return HTML content generated by flexmark-java
      * @throws IOException passed through
      */
-    String toHtml( Reader source )
+    CharSequence toHtml( Reader source )
         throws IOException
     {
+        // Read the source
         String text = IOUtil.toString( source );
-        MutableDataHolder flexmarkOptions = PegdownOptionsAdapter.flexmarkOptions(
-                Extensions.ALL & ~( Extensions.HARDWRAPS | Extensions.ANCHORLINKS ) ).toMutable();
-        ArrayList<Extension> extensions = new ArrayList<>();
-        for ( Extension extension : flexmarkOptions.get( com.vladsch.flexmark.parser.Parser.EXTENSIONS ) )
-        {
-            extensions.add( extension );
-        }
-
-        extensions.add( FlexmarkDoxiaExtension.create() );
-        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, extensions );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_OPEN_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_CLOSE_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.MAX_TRAILING_BLANK_LINES, -1 );
-
-        com.vladsch.flexmark.parser.Parser parser = com.vladsch.flexmark.parser.Parser.builder( flexmarkOptions )
-                .build();
-        HtmlRenderer renderer = HtmlRenderer.builder( flexmarkOptions )
-                                    .linkResolverFactory( new FlexmarkDoxiaLinkResolver.Factory() )
-                                    .build();
-
 
+        // Now, build the HTML document
         StringBuilder html = new StringBuilder( 1000 );
         html.append( "<html>" );
         html.append( "<head>" );
-        Pattern metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_SECTION, Pattern.MULTILINE );
-        Matcher metadataMatcher = metadataPattern.matcher( text );
+
+        // First, we interpret the "metadata" section of the document and add the corresponding HTML headers
+        Matcher metadataMatcher = METADATA_SECTION_PATTERN.matcher( text );
         boolean haveTitle = false;
         if ( metadataMatcher.find() )
         {
-            metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_ENTRY, Pattern.MULTILINE );
-            Matcher lineMatcher = metadataPattern.matcher( metadataMatcher.group( 1 ) );
-            boolean first = true;
-            while ( lineMatcher.find() )
+            Matcher entryMatcher = METADATA_ENTRY_PATTERN.matcher( metadataMatcher.group( 0 ) );
+            while ( entryMatcher.find() )
             {
-                String key = StringUtils.trimToEmpty( lineMatcher.group( 1 ) );
-                if ( first )
-                {
-                    boolean found = false;
-                    for ( String k : STANDARD_METADATA_KEYS )
-                    {
-                        if ( k.equalsIgnoreCase( key ) )
-                        {
-                            found = true;
-                            break;
-                        }
-                    }
-                    if ( !found )
-                    {
-                        break;
-                    }
-                    first = false;
-                }
-                String value = StringUtils.trimToEmpty( lineMatcher.group( 2 ) );
+                String key = entryMatcher.group( 1 );
+                String value = entryMatcher.group( 2 );
                 if ( "title".equalsIgnoreCase( key ) )
                 {
                     haveTitle = true;
                     html.append( "<title>" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
+                    html.append( HtmlTools.escapeHTML( value, false ) );
                     html.append( "</title>" );
                 }
-                else if ( "author".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'author\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
-                else if ( "date".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'date\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
                 else
                 {
-                    html.append( "<meta name=\'" );
-                    html.append( StringEscapeUtils.escapeXml( key ) );
-                    html.append( "\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
+                    html.append( "<meta name='" );
+                    html.append( HtmlTools.escapeHTML( key, true ) );

Review comment:
       Is this a superset of `escapeXml`?
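The sketch below shows what the two new patterns accept. It compiles the same expressions as `METADATA_SECTION_PATTERN` and `METADATA_ENTRY_PATTERN` from the diff and applies the same find / `group( 0 )` / `substring( end( 0 ) )` sequence to a small document. The class and method names (`MetadataSketch`, `split`) are hypothetical. The `<title>`/`<meta>` generation and the escaping question above are deliberately left out.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MetadataSketch
{
    // Same expression as METADATA_SECTION_PATTERN in the diff
    static final Pattern SECTION = Pattern.compile(
            "\\A^\\s*"
            + "(?:title|author|date|address|affiliation|copyright|email|keywords|language|phone|subtitle)"
            + "\\h*:\\h*\\V*\\h*$\\v+"
            + "(?:^\\h*[^:\\v]+\\h*:\\h*\\V*\\h*$\\v+)*",
            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE );

    // Same expression as METADATA_ENTRY_PATTERN in the diff
    static final Pattern ENTRY = Pattern.compile(
            "^\\h*([^:\\v]+?)\\h*:\\h*(\\V*)\\h*$",
            Pattern.MULTILINE );

    /** Puts each key/value pair into {@code metadata} and returns the remaining Markdown body. */
    static String split( String text, Map<String, String> metadata )
    {
        Matcher section = SECTION.matcher( text );
        if ( !section.find() )
        {
            // \A anchors the match at offset 0, so a non-standard first key means "no metadata"
            return text;
        }
        Matcher entry = ENTRY.matcher( section.group( 0 ) );
        while ( entry.find() )
        {
            metadata.put( entry.group( 1 ), entry.group( 2 ) );
        }
        // Trim the metadata block (including the blank line after it) from the source
        return text.substring( section.end( 0 ) );
    }
}
```

For `"title: My Document\nauthor: Jane Doe\n\n# Heading\n"`, the sketch collects `title` and `author` and leaves `"# Heading\n"` as the body. A document starting with `foo: bar` is returned unchanged.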

##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -130,133 +191,98 @@ public void parse( Reader source, Sink sink )
      * @return HTML content generated by flexmark-java
      * @throws IOException passed through
      */
-    String toHtml( Reader source )
+    CharSequence toHtml( Reader source )
         throws IOException
     {
+        // Read the source
         String text = IOUtil.toString( source );
-        MutableDataHolder flexmarkOptions = PegdownOptionsAdapter.flexmarkOptions(
-                Extensions.ALL & ~( Extensions.HARDWRAPS | Extensions.ANCHORLINKS ) ).toMutable();
-        ArrayList<Extension> extensions = new ArrayList<>();
-        for ( Extension extension : flexmarkOptions.get( com.vladsch.flexmark.parser.Parser.EXTENSIONS ) )
-        {
-            extensions.add( extension );
-        }
-
-        extensions.add( FlexmarkDoxiaExtension.create() );
-        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, extensions );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_OPEN_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_CLOSE_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.MAX_TRAILING_BLANK_LINES, -1 );
-
-        com.vladsch.flexmark.parser.Parser parser = com.vladsch.flexmark.parser.Parser.builder( flexmarkOptions )
-                .build();
-        HtmlRenderer renderer = HtmlRenderer.builder( flexmarkOptions )
-                                    .linkResolverFactory( new FlexmarkDoxiaLinkResolver.Factory() )
-                                    .build();
-
 
+        // Now, build the HTML document
         StringBuilder html = new StringBuilder( 1000 );
         html.append( "<html>" );
         html.append( "<head>" );
-        Pattern metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_SECTION, Pattern.MULTILINE );
-        Matcher metadataMatcher = metadataPattern.matcher( text );
+
+        // First, we interpret the "metadata" section of the document and add the corresponding HTML headers
+        Matcher metadataMatcher = METADATA_SECTION_PATTERN.matcher( text );
         boolean haveTitle = false;
         if ( metadataMatcher.find() )
         {
-            metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_ENTRY, Pattern.MULTILINE );
-            Matcher lineMatcher = metadataPattern.matcher( metadataMatcher.group( 1 ) );
-            boolean first = true;
-            while ( lineMatcher.find() )
+            Matcher entryMatcher = METADATA_ENTRY_PATTERN.matcher( metadataMatcher.group( 0 ) );
+            while ( entryMatcher.find() )
             {
-                String key = StringUtils.trimToEmpty( lineMatcher.group( 1 ) );
-                if ( first )
-                {
-                    boolean found = false;
-                    for ( String k : STANDARD_METADATA_KEYS )
-                    {
-                        if ( k.equalsIgnoreCase( key ) )
-                        {
-                            found = true;
-                            break;
-                        }
-                    }
-                    if ( !found )
-                    {
-                        break;
-                    }
-                    first = false;
-                }
-                String value = StringUtils.trimToEmpty( lineMatcher.group( 2 ) );
+                String key = entryMatcher.group( 1 );
+                String value = entryMatcher.group( 2 );
                 if ( "title".equalsIgnoreCase( key ) )
                 {
                     haveTitle = true;
                     html.append( "<title>" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
+                    html.append( HtmlTools.escapeHTML( value, false ) );
                     html.append( "</title>" );
                 }
-                else if ( "author".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'author\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
-                else if ( "date".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'date\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
                 else
                 {
-                    html.append( "<meta name=\'" );
-                    html.append( StringEscapeUtils.escapeXml( key ) );
-                    html.append( "\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
+                    html.append( "<meta name='" );
+                    html.append( HtmlTools.escapeHTML( key, true ) );
+                    html.append( "' content='" );
+                    html.append( HtmlTools.escapeHTML( value, true ) );
+                    html.append( "' />" );
                 }
             }
-            if ( !first )
-            {
-                text = text.substring( metadataMatcher.end() );
-            }
+
+            // Trim the metadata from the source
+            text = text.substring( metadataMatcher.end( 0 ) );
+
         }
 
-        Node rootNode = parser.parse( text );
-        String markdownHtml = renderer.render( rootNode );
+        // Now is the time to parse the Markdown document
+        // (after we've trimmed out the metadatas, and before we check for its headings)
+        Node documentRoot = FLEXMARK_PARSER.parse( text );
 
-        if ( !haveTitle && rootNode.hasChildren() )
+        // Special trick: if there is no title specified as a metadata in the header, we will use the first
+        // heading as the document title
+        if ( !haveTitle && documentRoot.hasChildren() )
         {
-            // use the first (non-comment) node only if it is a heading
-            Node firstNode = rootNode.getFirstChild();
-            while ( firstNode != null && !( firstNode instanceof Heading ) )
+            // Skip the comment nodes
+            Node firstNode = documentRoot.getFirstChild();
+            while ( firstNode != null && firstNode instanceof HtmlCommentBlock )

Review comment:
       This does not retain HTML-style comments?
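The sketch below makes concrete the difference between the removed loop and the added loop. It replaces Flexmark's `Node` sibling chain with a plain list. `FirstNodeSketch` and its `Kind` enum are hypothetical stand-ins for `HtmlCommentBlock`, `Heading` and other node types. Only the two `while` conditions are taken from the diff. The code that follows the new loop is not shown in the hunk, so it is not modelled; returning `-1` stands in for reaching `null`.

```java
import java.util.List;

final class FirstNodeSketch
{
    // Hypothetical stand-ins for Flexmark node types
    enum Kind { HTML_COMMENT_BLOCK, HEADING, PARAGRAPH }

    /** Removed loop: skips every node that is not a Heading. */
    static int oldLoop( List<Kind> children )
    {
        int i = 0;
        while ( i < children.size() && children.get( i ) != Kind.HEADING )
        {
            i++;
        }
        return i < children.size() ? i : -1;
    }

    /** Added loop: skips only leading HtmlCommentBlock nodes. */
    static int newLoop( List<Kind> children )
    {
        int i = 0;
        while ( i < children.size() && children.get( i ) == Kind.HTML_COMMENT_BLOCK )
        {
            i++;
        }
        return i < children.size() ? i : -1;
    }
}
```

Consider a document made of a comment, then a paragraph, then a heading. The removed loop stops at the heading (index 2), while the added loop stops at the paragraph (index 1). When a heading directly follows the comment, both stop at index 1.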






> Markdown: Properly expose the language specified in fenced code blocks
> ----------------------------------------------------------------------
>
>                 Key: DOXIA-616
>                 URL: https://issues.apache.org/jira/browse/DOXIA-616
>             Project: Maven Doxia
>          Issue Type: Improvement
>          Components: Module - Markdown
>    Affects Versions: 1.8, 1.9, 1.9.1
>            Reporter: Bertrand Martin
>            Assignee: Michael Osipov
>            Priority: Major
>             Fix For: 1.9.2
>
>
> h1. Use Case
> Writers can specify the language used in a fenced code block (typically for 
> syntax highlighting), as in the example below:
> {code}
> ```java
> System.out.println("Beautiful\n");
> ```
> {code}
> Currently, the Doxia module for Markdown does not expose this information 
> ("java") in the produced HTML, so a Maven skin (or frontend renderer) cannot 
> leverage it.
> Produced HTML:
> {code:html}
> <div class="source"> <!-- No mention of Java :-( -->
> <pre>
> System.out.println("Beautiful\n");
> </pre>
> </div>
> {code}
> Wanted result:
> {code:html}
> <div class="source java"> <!-- :-) -->
> <pre>
> System.out.println("Beautiful\n");
> </pre>
> </div>
> {code}
> h1. Specification
> Un-comment this block:
> https://github.com/apache/maven-doxia/blob/c439714e8f4a9e86f9962ac6be9a0077ae9b4d30/doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/FlexmarkDoxiaNodeRenderer.java#L103
> This should do the trick.
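The following standalone sketch illustrates the wanted output: the first word of the fence's info string is appended to the `source` class. `FencedCodeSketch` is hypothetical and is not the code in `FlexmarkDoxiaNodeRenderer`. It also emits everything on one line rather than reproducing the exact whitespace of the HTML examples above.

```java
final class FencedCodeSketch
{
    /** Minimal escaping for element text and double-quoted attribute values. */
    static String escape( String s )
    {
        return s.replace( "&", "&amp;" ).replace( "<", "&lt;" )
                .replace( ">", "&gt;" ).replace( "\"", "&quot;" );
    }

    /**
     * Renders a fenced code block; the first word of the info string
     * (e.g. "java" in ```java) becomes an extra CSS class.
     */
    static String render( String info, String code )
    {
        String trimmed = info == null ? "" : info.trim();
        String language = trimmed.isEmpty() ? "" : trimmed.split( "\\s+" )[0];
        String cssClass = language.isEmpty() ? "source" : "source " + escape( language );
        return "<div class=\"" + cssClass + "\"><pre>" + escape( code ) + "</pre></div>";
    }
}
```

For example, `render( "java", "a < b" )` yields `<div class="source java"><pre>a &lt; b</pre></div>`. A fence without an info string keeps the plain `source` class.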



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
