Re: Solr Reference Guide issue for simplified tokenizers

2018-04-16 Thread Nikolay Khitrin
Yes, the Lucene RegExp javadoc seems a bit complicated, and even the tests do not cover all syntax variants. But the whole point is: the parser doesn't mangle any characters and uses backslashes only to distinguish syntax symbols from raw characters. The example might not be the best possible, but I think refe

Re: Solr Reference Guide issue for simplified tokenizers

2018-04-15 Thread Shawn Heisey
On 4/15/2018 5:42 AM, Nikolay Khitrin wrote: The given example is <tokenizer class="solr.SimplePatternSplitTokenizerFactory" pattern="[ \t\r\n]+"/> but Lucene's RegExp constructor consumes raw Unicode characters instead of the \t\r\n form, so the correct configuration is
Looks like you're right about that exampl
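To make the point about raw characters concrete, here is a minimal sketch using only Python's standard XML parser, not Solr or Lucene. It shows what string the `pattern` attribute holds once the XML is parsed: the `\t\r\n` form arrives as literal backslash-letter pairs, while numeric character references arrive as the raw tab, carriage return, and newline. The second element is a hypothetical alternative written for illustration; it is an assumption and not necessarily the configuration proposed in the thread.

```python
import xml.etree.ElementTree as ET

# The example under discussion. A raw string keeps the backslashes
# exactly as they would appear in the schema file.
ESCAPED = r'<tokenizer class="solr.SimplePatternSplitTokenizerFactory" pattern="[ \t\r\n]+"/>'

# Hypothetical alternative (an assumption for illustration): XML numeric
# character references for tab, carriage return, and newline.
REFERENCED = '<tokenizer class="solr.SimplePatternSplitTokenizerFactory" pattern="[ &#9;&#13;&#10;]+"/>'


def pattern_attribute(xml_text):
    """Return the pattern attribute as the XML parser delivers it."""
    return ET.fromstring(xml_text).get("pattern")


escaped = pattern_attribute(ESCAPED)
referenced = pattern_attribute(REFERENCED)

# repr() makes the difference visible: doubled backslashes mean a literal
# backslash character, single ones mean a real control character.
print(repr(escaped))
print(repr(referenced))
```

XML itself gives backslashes no special meaning, so any interpretation of `\t` is left to whatever consumes the attribute value. If that consumer treats a backslash only as a way to quote the next character, as described in this thread, the first form never produces a tab.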