The pattern I gave previously solves the '<' character issue. However, you will get the following exception in the log:

Caused by: java.util.regex.PatternSyntaxException: Look-behind group does not have an obvious maximum length near index 48
(?<=[^.!?\\s][^.!?]*(?:[.!?](?![']?\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)
                                                ^
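So revisit your regex pattern, particularly around index 48, which is where the look-behind group ends. Java's regex engine needs to know the maximum length a look-behind can match, and the unbounded '*' quantifiers inside (?<=...) prevent that.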
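
To reproduce the look-behind error outside Solr, a small standalone check along these lines may help. It is only a sketch: the class name LookBehindCheck, the helper names, and the SIMPLIFIED pattern are assumptions made for illustration and are not from the thread. SIMPLIFIED uses a one-character look-behind, so it compiles, but it does not attempt the abbreviation handling the original pattern was aiming for.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class LookBehindCheck {

    // Pattern from the thread, written as a Java string literal,
    // so "\\s" here is the regex \s.
    static final String ORIGINAL =
        "(?<=[^.!?\\s][^.!?]*(?:[.!?](?![']?\\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)";

    // Hypothetical simplification: the look-behind matches exactly one
    // character, so its maximum length is obvious to the engine.
    static final String SIMPLIFIED = "(?<=[^.!?\\s])[.!?]+(?=\\s|$)";

    /** Returns null if the regex compiles, otherwise the engine's error description. */
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    /** Replaces sentence-ending punctuation with the token used in the thread. */
    static String markSentenceEnds(String text) {
        return Pattern.compile(SIMPLIFIED).matcher(text).replaceAll(" monkeysentence");
    }

    public static void main(String[] args) {
        System.out.println("original:   " + compileError(ORIGINAL));
        System.out.println("simplified: " + compileError(SIMPLIFIED));
        System.out.println(markSentenceEnds("Hello world. How are you? Fine!"));
    }
}
```

On the JVM used in the thread, the first line should report the same look-behind complaint as the log above; other JDK versions may treat unbounded look-behind differently, so check it on the JVM that runs Solr.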

-Jeevanandam


On 19-04-2012 7:06 pm, Jeevanandam wrote:
Try this one:


pattern="(?&lt;=[^.!?\\s][^.!?]*(?:[.!?](?![']?\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)"

I tested it locally and Solr starts without errors. Now please test it with data.
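
The reason &lt; works where '<' and '\<' do not is that the XML parser decodes the entity before Solr ever sees the attribute value, while a backslash has no special meaning in XML. The sketch below illustrates this with the JDK's own XML parser; the class name AttributeEscapeCheck, the helper readPattern, and the shortened pattern (?<=a)b are assumptions used only for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class AttributeEscapeCheck {

    /**
     * Parses an XML snippet and returns the root element's "pattern"
     * attribute, or null if the parser rejects the document.
     */
    static String readPattern(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("pattern");
        } catch (Exception e) {
            // Malformed XML (for example a raw '<' in an attribute) ends up here.
            return null;
        }
    }

    public static void main(String[] args) {
        // Entity-escaped: the parser hands back the regex with a literal '<'.
        System.out.println(readPattern("<charFilter pattern=\"(?&lt;=a)b\"/>"));
        // Raw '<' inside the attribute value: not well-formed XML.
        System.out.println(readPattern("<charFilter pattern=\"(?<=a)b\"/>"));
        // Backslash before '<': still a raw '<' as far as XML is concerned.
        System.out.println(readPattern("<charFilter pattern=\"(?\\<=a)b\"/>"));
    }
}
```

Only the first call returns a value, and that value already contains the plain '<' that the regex engine expects. The parser's default error handler may also print its own message to stderr for the two rejected snippets.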

-Jeevanandam


On 19-04-2012 9:29 am, smooth almonds wrote:
I'm using Solr 3.5.0, and in my schema.xml I use the following to mark the end
of sentences and replace the end punctuation with a symbolic token:

<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="(?<=[^.!?\\s][^.!?]*(?:[.!?](?![']?\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)"
replacement=" monkeysentence"/>

I'm not sure if that will even work for what I want, but first I need to
solve the problem of escaping the '<' character in the first '?<='
lookbehind.

I get the following error:

org.xml.sax.SAXParseException: The value of attribute "pattern" associated
with an element type "null" must not contain the '<' character.

I've tried using a '\' as in:


pattern="(?\<=[^.!?\\s][^.!?]*(?:[.!?](?![']?\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)"

But I get the same error.

--
View this message in context:

http://lucene.472066.n3.nabble.com/How-to-escape-character-in-regex-in-Solr-schema-xml-tp3921961p3921961.html
Sent from the Solr - User mailing list archive at Nabble.com.
