rmuir opened a new pull request #515:
URL: https://github.com/apache/lucene/pull/515


   This change uses a new JFlex feature 
(https://github.com/jflex-de/jflex/pull/654) to simplify emoji processing in 
the grammar. We can express a set difference directly rather than working 
around it with complement + De Morgan's laws.
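
To illustrate why the two formulations are interchangeable, here is a minimal Python sketch of the underlying identity, A minus B equals the complement of (complement of A, union B). The toy alphabet, the two sets, and the function names are assumptions made up for this example; they are not taken from the Lucene grammar or from JFlex.

```python
# Toy alphabet standing in for the full Unicode range (assumption for illustration).
UNIVERSE = frozenset(range(0x80))

# Hypothetical character classes, not the ones used in the real grammar.
ALNUM = frozenset(cp for cp in UNIVERSE if chr(cp).isalnum())
DIGITS = frozenset(ord(c) for c in "0123456789")


def complement(s, universe=UNIVERSE):
    """Return every code point of the universe that is not in s."""
    return universe - s


def difference_direct(a, b):
    """Set difference written directly: a minus b."""
    return a - b


def difference_via_complement(a, b, universe=UNIVERSE):
    """Same result using only complement and union (De Morgan's laws)."""
    return complement(complement(a, universe) | b, universe)


direct = difference_direct(ALNUM, DIGITS)
workaround = difference_via_complement(ALNUM, DIGITS)

# Both formulations select exactly the same code points.
assert direct == workaround
```

The direct form states the intent in one step, which is the readability gain described here; the result is the same set either way.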
   
   It is cosmetic: it doesn't change the generated tokenizers (see diff), but 
it makes the emoji parts of the grammar easier to read.
   
   Bonus: a major speedup when regenerating the huge UAX29URLEmail DFA (roughly 3.2x in the timings below).
   
   Before:
   ```
   > Task :lucene:analysis:common:generateUAX29URLEmailTokenizer
   Aggregate task times (possibly running in parallel!):
    918.87 sec.  generateUAX29URLEmailTokenizerInternal
   ```
   
   After:
   ```
   > Task :lucene:analysis:common:generateUAX29URLEmailTokenizer
   Aggregate task times (possibly running in parallel!):
    285.26 sec.  generateUAX29URLEmailTokenizerInternal
   ```
   
   This was suggested by the JFlex developers to help with the very slow 
regeneration reported in https://github.com/jflex-de/jflex/issues/715. It 
doesn't solve all of our problems there, but it makes things a lot less painful 
:)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



