[GitHub] [lucene-solr] markharwood opened a new pull request #1489: RegEx querying - add support for Java’s predefined character classes like \d for digits

GitBox Wed, 06 May 2020 03:39:43 -0700


markharwood opened a new pull request #1489:
URL: https://github.com/apache/lucene-solr/pull/1489

Jira issue [LUCENE-9336](https://issues.apache.org/jira/browse/LUCENE-9336)
proposes adding support for common regex character classes such as `\w`.
This PR adds the implementation to RegExp.java, along with associated tests.

The implementation could have gone one of two ways:
1) Extend `Kind` with new types for DIGIT, WHITESPACE, etc., and add a
corresponding case statement for each type to `make[Type]`, `toString`,
`toStringTree` and `toAutomaton`, or
2) Reuse existing Kinds, such as range, by adding a simple piece of logic to
the parser that expands `\d` into its [documented
equivalent](https://docs.oracle.com/javase/tutorial/essential/regex/pre_char_classes.html#CHART),
i.e. `[0-9]`.
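Option 2 can be illustrated with a minimal, string-level sketch. It is not the PR's code (the PR does the expansion inside RegExp's parser); the class `ShorthandExpander` and its `expand` method are hypothetical, and the expansion table follows the Oracle chart linked above. For brevity, the sketch does not handle shorthands that appear inside an existing bracket expression such as `[\d_]`.

```java
import java.util.Map;

/**
 * Hypothetical sketch of option 2: rewrite Java-style predefined character
 * classes into plain bracket expressions before any further parsing, so
 * later stages only ever see ranges they already understand.
 * Limitation: shorthands inside an existing [...] are not handled.
 */
class ShorthandExpander {

    // Equivalents from the Oracle tutorial's "Predefined Character Classes" chart.
    // The \s entries list space, tab, newline, vertical tab (0x0B), form feed
    // and carriage return explicitly.
    static final Map<Character, String> EXPANSIONS = Map.of(
        'd', "[0-9]",
        'D', "[^0-9]",
        's', "[ \t\n\u000B\f\r]",
        'S', "[^ \t\n\u000B\f\r]",
        'w', "[a-zA-Z_0-9]",
        'W', "[^a-zA-Z_0-9]");

    /** Replaces \d, \D, \s, \S, \w and \W; any other escape is copied unchanged. */
    static String expand(String pattern) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                char next = pattern.charAt(i + 1);
                String replacement = EXPANSIONS.get(next);
                // Unknown escapes such as \. keep both characters.
                out.append(replacement != null ? replacement : "\\" + next);
                i++; // skip the character after the backslash
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String expanded = expand("\\d{3}-\\w+");
        System.out.println(expanded); // prints [0-9]{3}-[a-zA-Z_0-9]+
        // The expanded form matches the same input the shorthand would.
        System.out.println("123-abc".matches(expanded)); // prints true
    }
}
```

The example also shows the cost of this approach: once expanded, nothing in the result records that `[0-9]` originally came from `\d`.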

I went with option 2, which keeps the code shorter and cleaner and makes the
meaning of expressions like `\d` easy to read in the code. The downside is that
the `toString` representations of these inputs are less succinct: they render
the fully expanded character lists rather than the shorthand `\x`-style inputs
that generated them.
Happy to change this if we feel it is the wrong trade-off.

One other consideration is that the list of shorthand expressions could perhaps
be made configurable, e.g. `\h` might be used as shorthand for hashtags of
the form `#\w*` if that were something users routinely searched for and wanted
to add to the regex vocabulary.
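A configurable vocabulary could be sketched along these lines. `ShorthandVocabulary` is a hypothetical class, not a Lucene API, and only two built-in classes are included for brevity. Custom definitions such as the `\h` hashtag example are resolved against the built-in classes once, at construction time, so expanding a pattern stays a single pass and custom entries cannot refer to each other (which rules out cycles).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a shorthand vocabulary that callers can extend.
 * Custom definitions may use built-in classes, which are expanded once
 * when the vocabulary is constructed.
 */
class ShorthandVocabulary {

    // Built-in Java-style classes (subset, for brevity).
    static final Map<Character, String> BUILT_INS = Map.of(
        'd', "[0-9]",
        'w', "[a-zA-Z_0-9]");

    private final Map<Character, String> expansions = new HashMap<>(BUILT_INS);

    ShorthandVocabulary(Map<Character, String> custom) {
        for (Map.Entry<Character, String> e : custom.entrySet()) {
            // Resolve built-ins inside the custom definition, e.g. #\w* -> #[a-zA-Z_0-9]*.
            expansions.put(e.getKey(), rewrite(e.getValue(), BUILT_INS));
        }
    }

    /** Expands every known shorthand (built-in or custom) in the pattern. */
    String expand(String pattern) {
        return rewrite(pattern, expansions);
    }

    // Single left-to-right pass: known shorthands are replaced, other escapes kept.
    private static String rewrite(String pattern, Map<Character, String> table) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                char next = pattern.charAt(++i); // consume the escaped character
                String replacement = table.get(next);
                out.append(replacement != null ? replacement : "\\" + next);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        ShorthandVocabulary vocab = new ShorthandVocabulary(Map.of('h', "#\\w*"));
        String expanded = vocab.expand("\\h");
        System.out.println(expanded); // prints #[a-zA-Z_0-9]*
        System.out.println("#lucene".matches(expanded)); // prints true
    }
}
```

One caveat for this particular letter: since Java 8, `java.util.regex.Pattern` already defines `\h` as a horizontal whitespace character, so a configurable vocabulary would need a rule for when a custom shorthand shadows a Java one.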
