rmuir opened a new pull request #515: URL: https://github.com/apache/lucene/pull/515
This change uses a new JFlex feature (https://github.com/jflex-de/jflex/pull/654) to simplify emoji processing in the grammar. We can now express a set difference directly rather than working around it with complement + De Morgan tricks.

It is cosmetic: it doesn't change the resulting tokenizers (see diff), but it makes the emoji parts easier to read.

Bonus: a major speedup (about 3.2x) when regenerating that huge UAX29URLEmail DFA.

Before:

```
> Task :lucene:analysis:common:generateUAX29URLEmailTokenizer
Aggregate task times (possibly running in parallel!):
 918.87 sec.  generateUAX29URLEmailTokenizerInternal
```

After:

```
> Task :lucene:analysis:common:generateUAX29URLEmailTokenizer
Aggregate task times (possibly running in parallel!):
 285.26 sec.  generateUAX29URLEmailTokenizerInternal
```

This was suggested by the JFlex developers to help with the very slow regeneration reported in https://github.com/jflex-de/jflex/issues/715. It doesn't solve all of our problems there, but it makes things a lot less painful :)
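For readers unfamiliar with the "complement + De Morgan" workaround mentioned in the description, here is a minimal sketch of the underlying identity: `A \ B` is the same set as `not(not A or B)`. This is not JFlex syntax and not the Lucene grammar. The toy universe and the names `UNIVERSE`, `complement`, `difference_via_demorgan`, and `difference_direct` are all hypothetical and exist only for illustration.

```python
# Toy universe standing in for the full code point space (assumption:
# printable ASCII only, so the complement stays small and easy to inspect).
UNIVERSE = frozenset(range(0x20, 0x7F))


def complement(chars):
    """Everything in the toy universe that is not in `chars`."""
    return UNIVERSE - chars


def difference_via_demorgan(a, b):
    """The workaround: a \\ b written as not(not a or b)."""
    return complement(complement(a) | b)


def difference_direct(a, b):
    """The direct form: a plain set difference."""
    return a - b


# Example: lowercase letters minus vowels.
letters = frozenset(range(ord("a"), ord("z") + 1))
vowels = frozenset(ord(c) for c in "aeiou")

direct = difference_direct(letters, vowels)
workaround = difference_via_demorgan(letters, vowels)
consonants = "".join(sorted(chr(c) for c in direct))

print(direct == workaround)
print(consonants)
```

Both forms yield the same set, which is consistent with the description's claim that the change is cosmetic: the direct difference says what is meant, while the De Morgan form forces the reader to unwind two negations.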