RE: Strange regex behavior in solr.PatternReplaceCharFilterFactory

2019-09-27 Thread Webster Homer
Quoted message of Friday, September 27, 2019 2:47 PM to solr-user@lucene.apache.org (Subject: Re: Strange regex behavior in solr.PatternReplaceCharFilterFactory): Solr’s pattern replace _is_ Java’s. See PatternReplaceCharFilter. You’ll see: private final Pattern pattern; and later: final Matcher m = pattern.matcher(i

Re: Strange regex behavior in solr.PatternReplaceCharFilterFactory

2019-09-27 Thread Erick Erickson
Solr’s pattern replace _is_ Java’s. See PatternReplaceCharFilter. You’ll see: private final Pattern pattern; and later: final Matcher m = pattern.matcher(input); That said, there’s some manipulation after that, so there’s always room for issues. But I’d try just a standard Java program with yo
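
A minimal sketch of the kind of standalone Java check suggested here, using the same java.util.regex Pattern and Matcher classes the message points to. The class name RegexProbe, the sample pattern, and the sample input are hypothetical placeholders, not the expression from the original question; substitute the pattern and replacement from your own schema.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexProbe {
    // Compile the regex and replace every match, outside of Solr.
    static String apply(String regex, String replacement, String input) {
        Pattern pattern = Pattern.compile(regex);
        Matcher m = pattern.matcher(input);
        return m.replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Hypothetical pattern: collapse runs of non-letter, non-digit characters.
        String regex = "[^\\p{L}\\p{N}]+";
        System.out.println(apply(regex, " ", "foo-bar_42"));
    }
}
```

If the plain Java result differs from what the Solr analysis screen shows, the difference is more likely in the surrounding analysis chain or the schema escaping than in the regex engine itself.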

Re: Strange regex behavior in solr.PatternReplaceCharFilterFactory

2019-09-27 Thread Jörn Franke
Check the log files written during the collection reload. About your regex: test it on a web page that checks Java regexes specifically, since there can be subtle differences between Java, JavaScript, PHP, etc. It could also be that your original text is not UTF-8 encoded but uses a Windows encoding or similar. Check also if you have special cha
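
To illustrate the encoding point, here is a small sketch of how text that was saved in a legacy single-byte encoding stops matching \p{L} once it is read as UTF-8. ISO-8859-1 stands in for "Windows or similar" here, and the class name EncodingProbe and the sample word are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;

public class EncodingProbe {
    // Encode text as ISO-8859-1, then wrongly decode those bytes as UTF-8.
    static String misdecode(String text) {
        byte[] legacyBytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(legacyBytes, StandardCharsets.UTF_8);
    }

    // Keep only characters in the Unicode Letter category.
    static String lettersOnly(String s) {
        return s.replaceAll("[^\\p{L}]", "");
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";
        String damaged = misdecode(original);
        // The accented letter becomes U+FFFD, which is not a letter.
        System.out.println(lettersOnly(original));
        System.out.println(lettersOnly(damaged));
    }
}
```

A regex that looks wrong on accented input may therefore be reacting to replacement characters rather than to the letters you expected.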

RE: Strange regex behavior in solr.PatternReplaceCharFilterFactory

2019-09-27 Thread Webster Homer
I forgot to mention that I'm using Solr 7.2. I also found that if I use the long form \p{Letter} instead of \p{L}, Solr will not load the collection when I reload it after updating the schema. I think that Solr's regex support is not standard Java 8
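
One way to check whether the \p{Letter} failure comes from Solr or from Java is to compile both forms in plain Java. As far as I know, java.util.regex accepts \p{L} and \p{IsL} but rejects the bare long form \p{Letter} with a PatternSyntaxException, which would be consistent with the collection failing to load. The sketch below (class name PropertyNameCheck is hypothetical) reports what your own JDK does.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class PropertyNameCheck {
    // Returns true if the JDK's regex engine accepts the expression.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = { "\\p{L}", "\\p{IsL}", "\\p{Letter}" };
        for (String regex : candidates) {
            System.out.println(regex + " compiles: " + compiles(regex));
        }
    }
}
```

If \p{Letter} fails here as well, the collection reload error is the Java engine rejecting the property name, not a Solr-specific regex dialect.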