markharwood opened a new pull request #1489: URL: https://github.com/apache/lucene-solr/pull/1489
Jira issue [LUCENE-9336](https://issues.apache.org/jira/browse/LUCENE-9336) proposes adding support for common regex character classes like `\w`. This PR adds the code to `RegExp.java`, along with associated tests.

The implementation could have gone one of two ways:

1) Extend `Kind` to introduce new types for DIGIT, WHITESPACE, etc., with a corresponding `make[Type]` and case statements for each type in `toString`, `toStringTree` and `toAutomaton`.
2) Reuse existing Kinds such as range by adding a simple piece of logic to the parser that expands `\d` into the [documented equivalent](https://docs.oracle.com/javase/tutorial/essential/regex/pre_char_classes.html#CHART), i.e. `[0-9]`.

I went for option 2, which makes the code shorter and cleaner and makes the meaning of expressions like `\d` easier to read in the code. The downside is that the `toString` representations of these inputs are less succinct: they render the fully expanded character lists rather than the shorthand `\x`-style inputs that generated them. Happy to change this if we feel it is the wrong trade-off.

One other consideration is that the list of shorthand expressions could perhaps be made configurable. For example, `\h` might be shorthand for hashtags of the form `#\w*`, if that was something users routinely searched for and wanted to add to the regex vocabulary.
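To make option 2 concrete, here is a minimal sketch of the expansion idea. It is not the code in this PR: the class name `ShorthandExpander` and its `expand` method are hypothetical, and it rewrites pattern strings rather than building `RegExp` nodes inside the parser. The expansions are taken from the predefined character class chart linked above.

```java
import java.util.Map;

// Hypothetical illustration only; not part of Lucene's RegExp API.
public class ShorthandExpander {

    // Shorthand letter -> expanded character class, following the
    // predefined character class chart in the Java regex tutorial.
    private static final Map<Character, String> EXPANSIONS = Map.of(
        'd', "[0-9]",
        'D', "[^0-9]",
        's', "[ \\t\\n\\x0B\\f\\r]",
        'S', "[^ \\t\\n\\x0B\\f\\r]",
        'w', "[a-zA-Z_0-9]",
        'W', "[^a-zA-Z_0-9]"
    );

    // Replaces each recognized backslash shorthand with its expansion.
    // Any other escaped character is copied through unchanged.
    public static String expand(String pattern) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                char next = pattern.charAt(++i);
                String expansion = EXPANSIONS.get(next);
                if (expansion != null) {
                    out.append(expansion);
                } else {
                    out.append(c).append(next);
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("\\d+"));       // prints [0-9]+
        System.out.println(expand("a\\wb\\.c"));  // prints a[a-zA-Z_0-9]b\.c
    }
}
```

The sketch also shows the trade-off described above: once expanded, the output no longer records that `\d` was the original input. It does not handle shorthand that appears inside an existing bracket expression. A configurable variant could accept the expansion map as a parameter instead of hard-coding it.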