Thanks for the full details -- being able to see exactly how the queries 
are received & parsed is important for ruling out simple things like 
client-side escaping (or lack of it) and server-side metacharacter 
handling in the query parser.

: Some things work the way I'd expect, some clearly don't. So my question 
: was, in the first instance "Is there full regex support?" Clearly, 
: there's supposed to be, so something is wrong, or I don't know the right 
: escape syntax.

I think it depends on your definition of "full" ?

Based on the doc for the supported syntax, it doesn't look to me like 
there is any direct support for any of the predefined character 
classes (eg: "\s", "\w", etc...) or boundary matchers (eg: "^", "\b", 
etc...)

https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/util/automaton/RegExp.html
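
If the predefined classes are the main thing you're missing, one possible 
workaround is to expand them into explicit bracket classes on the client 
before the query is sent, since "[...]" ranges are in the documented 
syntax. The following is only a rough sketch of that idea: the class and 
method names (ExplicitClasses, expand) are made up for illustration and 
are not part of Lucene or Solr, it only handles "\d" and "\w" using their 
ASCII definitions, and it assumes the shorthand does not appear inside an 
existing "[...]" class.

```java
// Hypothetical helper, not part of Lucene or Solr: rewrites the shorthand
// classes \d and \w into explicit bracket classes.
final class ExplicitClasses {

    private ExplicitClasses() {}

    // Returns a copy of regex with \d and \w expanded. Any other escaped
    // character (including an escaped backslash) is copied through as-is.
    // Limitation: shorthand inside an existing [...] class is not handled.
    static String expand(String regex) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                char next = regex.charAt(++i);
                switch (next) {
                    case 'd':
                        out.append("[0-9]");          // ASCII digits only
                        break;
                    case 'w':
                        out.append("[a-zA-Z0-9_]");   // ASCII word characters only
                        break;
                    default:
                        out.append('\\').append(next); // keep other escapes intact
                        break;
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With that, ExplicitClasses.expand("\\d+") in Java source produces the 
pattern [0-9]+. It does nothing for the boundary matchers.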

It looks like there are hooks in the underlying RegExp and RegexpQuery 
classes for registering named Automatons so the "<...>" syntax can refer 
to named character classes that are defined at runtime, but none are 
registered for you by default, and there is no way to configure that with 
the QueryParser.

I'm sorry, but I don't understand the RegExp automaton stuff well enough 
to understand why the predefined character classes from 
java.util.regex.Pattern aren't supported by default, or even how you 
could conceptually implement the boundary matchers using the underlying 
java API.

I suspect the existing regex query support isn't going to work for what 
you're trying to do.


-Hoss
