Robert Muir created LUCENE-9557:
-----------------------------------

             Summary: gradle regeneration of HTMLCharacterEntities.jflex should not use python2
                 Key: LUCENE-9557
                 URL: https://issues.apache.org/jira/browse/LUCENE-9557
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Robert Muir
         Attachments: LUCENE-9557.patch

I thought we had cleaned out all uses of python2, but there is one straggler left.

Currently the regeneration of this file is configured to run with python2, but
it should use python3. Python3 generates exactly the same sources that are
present in master today. If you run it with python2 (as currently configured),
it generates a slightly different grammar:

{noformat}
--- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/charfilter/HTMLCharacterEntities.jflex
+++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/charfilter/HTMLCharacterEntities.jflex
@@ -60,7 +60,7 @@ CharacterEntities = ( "AElig" | "Aacute" | "Acirc" | "Agrave" | "Alpha"
                     | "times" | "trade" | "uArr" | "uacute" | "uarr" | "ucirc"
                     | "ugrave" | "uml" | "upsih" | "upsilon" | "uuml"
                     | "weierp" | "xi" | "yacute" | "yen" | "yuml" | "zeta"
-                    | "zwj" | "zwnj" )
+('                    | "zwj" | "zwnj"', ')')
{noformat}
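
For illustration only: the tuple-shaped line in the diff is what you would expect if the generator script ends with a call such as print(line, ')'). That call is an assumption here, since the script itself is not quoted in this issue. Under python3 it is a function call that writes both arguments separated by a space. Under python2 it is a print statement applied to a parenthesized tuple, so the tuple's repr is written instead. A minimal sketch, runnable under python3, where last_line and both helper names are hypothetical:

```python
import io

# Hypothetical final line of the grammar, copied from the diff above.
# The real generator script is not shown in this issue.
last_line = '                    | "zwj" | "zwnj"'


def python3_output(line):
    """Text written by Python 3's print() function for print(line, ')')."""
    buf = io.StringIO()
    print(line, ')', file=buf)
    return buf.getvalue().rstrip('\n')


def python2_output(line):
    """Emulates Python 2, where `print (line, ')')` prints a tuple's repr."""
    return repr((line, ')'))


if __name__ == '__main__':
    print(python3_output(last_line))  # matches the "-" line of the diff
    print(python2_output(last_line))  # matches the "+" line of the diff
```

If that assumption holds, the two results correspond to the removed and added lines of the diff, which would explain why only the closing line of the grammar changes.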

This difference then cascades: HTMLStripCharFilter.java is also regenerated 
differently, with a different DFA.


