Michael McCandless created LUCENE-9986:
------------------------------------------

             Summary: Create a simple "real world" regexp benchmark
                 Key: LUCENE-9986
                 URL: https://issues.apache.org/jira/browse/LUCENE-9986
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Michael McCandless


For issues like LUCENE-9983, where we are struggling to decide which low-level optimizations to make for our (complicated!) {{determinize}} method, it would really help to have a large, real-world corpus of regexps to evaluate performance metrics of our automata operations, like CPU and HEAP required to parse the regexp and determinize.

Does anyone know of such an existing, hopefully compatibly licensed, corpus?

Probably we would add these benchmarks to {{luceneutil}}.
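To make the kind of measurement concrete, here is a minimal sketch of a per-regexp harness that records time and peak heap for one "compile" step. Everything in it is an assumption for illustration: the names {{measure}} and {{Measurement}} are hypothetical, the three-pattern corpus is a placeholder for the real-world corpus this issue asks for, and Python's {{re.compile}} stands in for Lucene's parse + {{determinize}}, which it does not reproduce. It is not existing {{luceneutil}} code.

```python
import re
import time
import tracemalloc
from typing import Callable, Iterable, List, NamedTuple


class Measurement(NamedTuple):
    pattern: str
    seconds: float   # wall-clock time of one compile call
    peak_bytes: int  # peak Python heap traced during that call
    ok: bool         # False if the compile step raised


def measure(patterns: Iterable[str],
            compile_fn: Callable[[str], object]) -> List[Measurement]:
    """Run compile_fn once per pattern, recording time and peak heap.

    compile_fn is a stand-in (assumption) for "parse the regexp and
    determinize"; any callable that takes a pattern string works.
    """
    results = []
    for pattern in patterns:
        tracemalloc.start()  # fresh trace, so the peak is per pattern
        start = time.perf_counter()
        try:
            compile_fn(pattern)
            ok = True
        except Exception:
            # Real-world corpora contain patterns a given engine rejects;
            # record the failure instead of aborting the whole run.
            ok = False
        seconds = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results.append(Measurement(pattern, seconds, peak, ok))
    return results


if __name__ == "__main__":
    # Placeholder corpus (assumption), not a real-world sample.
    corpus = [r"[a-z]+@[a-z]+\.com", r"(a|b)*abb", r"(unbalanced"]
    for m in measure(corpus, re.compile):
        print(f"{m.pattern!r}: ok={m.ok} "
              f"time={m.seconds * 1e6:.1f}us peak={m.peak_bytes}B")
```

Two caveats on the sketch: tracing allocations slows the code under test, so a real benchmark would probably time and measure heap in separate passes, and a single call per pattern is noisy, so repeated runs with warm-up would be needed before comparing optimizations.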