Michael McCandless created LUCENE-9986:
------------------------------------------

             Summary: Create a simple "real world" regexp benchmark
                 Key: LUCENE-9986
                 URL: https://issues.apache.org/jira/browse/LUCENE-9986
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Michael McCandless


For issues like LUCENE-9983, where we are struggling to decide which low-level 
optimizations to make for our (complicated!) {{determinize}} method, it would 
really help to have a large, real-world corpus of regexps for measuring the 
performance of our automata operations, such as the CPU time and heap required 
to parse each regexp and determinize the resulting automaton.
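
To make the shape of such a benchmark concrete, here is a minimal sketch of a 
harness that records CPU time and peak heap per regexp. It is only an 
illustration under loud assumptions: it uses Python's standard {{re.compile}} 
as a stand-in for parsing plus {{determinize}}, it does not call any Lucene 
API, and the function name {{benchmark_regexps}} and the sample patterns are 
hypothetical.

```python
import re
import time
import tracemalloc


def benchmark_regexps(patterns, compile_fn=re.compile):
    """Record CPU time and peak traced heap for compiling each pattern.

    ASSUMPTION: compile_fn defaults to Python's re.compile, which is only a
    stand-in here. It is NOT Lucene's RegExp parsing or determinize.
    """
    results = []
    for pattern in patterns:
        # Pass 1: CPU time, measured without tracemalloc's tracing overhead.
        re.purge()  # drop re's internal cache so the compile does real work
        error = None
        start = time.process_time()
        try:
            compile_fn(pattern)
        except re.error as exc:
            error = str(exc)  # keep going: a real corpus will contain bad input
        cpu_seconds = time.process_time() - start

        # Pass 2: peak heap allocated by Python code during the same operation.
        re.purge()
        tracemalloc.start()
        try:
            compile_fn(pattern)
        except re.error:
            pass  # already recorded in pass 1
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()

        results.append(
            {
                "pattern": pattern,
                "cpu_seconds": cpu_seconds,
                "peak_bytes": peak_bytes,
                "error": error,
            }
        )
    return results


if __name__ == "__main__":
    # Hypothetical sample; a real run would load the corpus from a file.
    sample = [r"[a-z]+@[a-z]+\.com", r"(a|b)*abb", r"(unclosed"]
    rows = benchmark_regexps(sample)
    # Print the most expensive patterns first.
    for row in sorted(rows, key=lambda r: r["cpu_seconds"], reverse=True):
        print(row)
```

Timing and heap are measured in separate passes so that allocation tracing 
does not inflate the CPU numbers. A single compile is far too short to time 
reliably, so a real benchmark would repeat each pattern many times and report 
aggregate statistics.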

Does anyone know of such an existing, hopefully compatibly licensed, corpus?

Probably we would add these benchmarks to {{luceneutil}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

