Robert Muir created LUCENE-9557:
-----------------------------------

             Summary: gradle regeneration of HTMLCharacterEntities.jflex should not use python2
                 Key: LUCENE-9557
                 URL: https://issues.apache.org/jira/browse/LUCENE-9557
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Robert Muir
         Attachments: LUCENE-9557.patch

I thought we had cleaned out all the python2 usage, but there is one straggler left.

Currently this regeneration is set to run with python2, but it should be using python3. Python3 generates exactly the same sources that are present in master today.

If you run it with python2 (as currently configured), it generates a slightly different grammar:

{noformat}
--- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/charfilter/HTMLCharacterEntities.jflex
+++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/charfilter/HTMLCharacterEntities.jflex
@@ -60,7 +60,7 @@ CharacterEntities = ( "AElig" | "Aacute" | "Acirc" | "Agrave" | "Alpha"
                     | "times" | "trade" | "uArr" | "uacute" | "uarr" | "ucirc"
                     | "ugrave" | "uml" | "upsih" | "upsilon" | "uuml" | "weierp"
                     | "xi" | "yacute" | "yen" | "yuml" | "zeta"
-                    | "zwj" | "zwnj" )
+('                    | "zwj" | "zwnj"', ')')
{noformat}

This then cascades: HTMLStripCharFilter.java is also regenerated differently, with a different DFA.
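
The `+` line in the diff above looks like the repr of a Python tuple. One plausible explanation, which the issue does not state, is that the generator ends the grammar with a two-argument `print(...)` call. Under python3 that is a function call that joins its arguments with a space; under python2 (without `from __future__ import print_function`) it is a print statement applied to a tuple. A minimal sketch, runnable under python3, that reproduces both outputs; the variable names and the 20-space indent are assumptions for illustration, not taken from the real script:

```python
# Hypothetical stand-in for the last wrapped line of the generated grammar.
# The real generator script is not shown in this issue, and the indent width
# is assumed.
last_line = " " * 20 + '| "zwj" | "zwnj"'

# Python 3: print(a, b) is a function call, so the arguments are written
# separated by a single space.
py3_output = " ".join([last_line, ")"])

# Python 2 without print_function: `print (a, b)` prints the tuple (a, b),
# so the tuple's repr is written instead. The repr of a tuple of plain ASCII
# strings is the same text in both versions, so it can be reproduced here.
py2_output = repr((last_line, ")"))

print(py3_output)
print(py2_output)
```

If this is indeed the cause, running the script with python3 (or adding the `print_function` future import) would make the two outputs agree.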