Just count the characters in the literal portions of the patterns and include
that many spaces in the replacement.
So, "<TextLine " (10 characters) would become "          " (10 spaces).
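Here is a minimal Java sketch of that arithmetic (it needs Java 11+ for String.repeat). The class and method names are made up for illustration. The point is that replacing a literal with the same number of spaces leaves every later character at its original offset, so a highlighter working from those offsets still lands on the right text.

```java
public class SameLengthReplace {
    // Replace every occurrence of a literal with exactly as many spaces,
    // so every later character keeps its original offset.
    static String blankLiteral(String input, String literal) {
        String spaces = " ".repeat(literal.length()); // "<TextLine " -> 10 spaces
        return input.replace(literal, spaces);
    }

    public static void main(String[] args) {
        String in = "<TextLine content=\"the content to search in\" />";
        String out = blankLiteral(in, "<TextLine ");
        System.out.println("<TextLine ".length());                       // 10
        System.out.println(out.length() == in.length());                 // true
        System.out.println(in.indexOf("search") == out.indexOf("search")); // true
    }
}
```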
It gets trickier if names are variable length. But I'm sure you could come
up with patterns to replace one-, two-, three-character (etc.) names with an
equivalent number of spaces.
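Here is a rough sketch of the one-pattern-per-length idea. It assumes the markup to blank is an element name written as "<Name " (a guess at the data's shape). "<" plus n letters plus a space is n + 2 characters, so each pattern gets n + 2 spaces. In Solr each pair would presumably become its own PatternReplaceCharFilterFactory (pattern and replacement attributes) in the analyzer chain. The Java below only generates the pairs and applies them in sequence to check that the length is unchanged. buildPairs, applyAll and the 12-character cap are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PerLengthPatterns {
    // One {regex, replacement} pair per element-name length n (hypothetical helper).
    // "<" + n letters + " " is n + 2 characters, so the replacement is n + 2 spaces.
    static List<String[]> buildPairs(int maxNameLength) {
        List<String[]> pairs = new ArrayList<>();
        for (int n = 1; n <= maxNameLength; n++) {
            String regex = "<[A-Za-z]{" + n + "} ";
            String replacement = " ".repeat(n + 2);
            pairs.add(new String[] {regex, replacement});
        }
        return pairs;
    }

    // Apply the pairs one after another, the way a chain of char filters would.
    static String applyAll(String input, List<String[]> pairs) {
        String out = input;
        for (String[] p : pairs) {
            out = Pattern.compile(p[0]).matcher(out).replaceAll(p[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> pairs = buildPairs(12);
        for (String[] p : pairs.subList(0, 3)) {
            System.out.println(p[0] + "  ->  \"" + p[1] + "\"");
        }
        String in = "<TextLine content=\"x\" /><Page content=\"y\" />";
        String out = applyAll(in, pairs);
        System.out.println(out.length() == in.length()); // true
    }
}
```

Each pattern matches only one exact name length and the replacements contain no "<", so the order of the chain does not matter here.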
But... if all of this is too difficult for you, some people might find it
easier to preprocess the data before sending it to Solr.
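If you go the preprocessing route, a minimal sketch with the JDK's built-in DOM parser could look like this. It assumes the TextLine elements are wrapped in some root element (Page here is made up) and that the content attribute holds the searchable text. You would then index the extracted text in a plain field and keep the original XML elsewhere if you still need it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ExtractContent {
    // Collect the content attribute of every TextLine element (assumed schema).
    static List<String> contentValues(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList lines = doc.getElementsByTagName("TextLine");
            List<String> values = new ArrayList<>();
            for (int i = 0; i < lines.getLength(); i++) {
                Element line = (Element) lines.item(i);
                values.add(line.getAttribute("content")); // "" if the attribute is missing
            }
            return values;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse input XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Page>"
                + "<TextLine aa=\"bb\" cc=\"dd\" content=\"the content to search in\" ee=\"ff\" />"
                + "<TextLine aa=\"bb\" content=\"a second line\" />"
                + "</Page>";
        // Plain text like this is what would be sent to Solr instead of the raw XML.
        System.out.println(String.join("\n", contentValues(xml)));
    }
}
```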
I mean, do you really need to highlight the content in such a cryptic input
format?
Ultimately you might be better off with a custom char filter; sometimes
people cope better with straight Java code than with cryptic regular
expression sequences.
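For the custom char filter route, here is a hedged sketch of the core logic written as a plain java.io.Reader rather than an actual Lucene CharFilter subclass, so it runs without Lucene on the classpath. ContentOnlyReader and blank are hypothetical names, and the content="..." regex is an assumption about the markup. The sketch keeps only attribute values and turns everything else into spaces, so output and input offsets match one to one. For simplicity it buffers the whole field in memory. A real Lucene CharFilter would also need its offset-correction method (correct(int)), which for a same-length transform like this would presumably just return the offset unchanged.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentOnlyReader extends Reader {
    // Assumed markup: searchable text lives in content="..." attribute values.
    private static final Pattern CONTENT = Pattern.compile("\\bcontent=\"([^\"]*)\"");

    private final Reader original;
    private final Reader blanked;

    public ContentOnlyReader(Reader in) throws IOException {
        this.original = in;
        // Buffer the whole input; a streaming version would need more state.
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        this.blanked = new StringReader(blank(sb.toString()));
    }

    // Same-length transform: attribute values stay at their original offsets,
    // every other character becomes a space.
    static String blank(String input) {
        char[] out = new char[input.length()];
        Arrays.fill(out, ' ');
        Matcher m = CONTENT.matcher(input);
        while (m.find()) {
            for (int i = m.start(1); i < m.end(1); i++) {
                out[i] = input.charAt(i);
            }
        }
        return new String(out);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        return blanked.read(cbuf, off, len);
    }

    @Override
    public void close() throws IOException {
        blanked.close();
        original.close();
    }

    public static void main(String[] args) {
        String xml = "<TextLine aa=\"bb\" cc=\"dd\" content=\"the content to search in\" ee=\"ff\" />";
        System.out.println("[" + blank(xml) + "]");
        System.out.println(blank(xml).indexOf("search") == xml.indexOf("search")); // true
    }
}
```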
-- Jack Krupansky
-----Original Message-----
From: jasimop
Sent: Thursday, May 30, 2013 12:46 AM
To: solr-user@lucene.apache.org
Subject: Re: Problem with PatternReplaceCharFilter
Honestly, I have no idea how to do that.
PatternReplaceCharFilter doesn't seem to have a parameter like
preservePositions="true" and, optionally, fillCharacter=" ".
And I don't think I can express this simply as a regex. How would I count,
in a pure regex, the length difference before and after the match?
Well, the specific problem is that, when highlighting, the term positions
are wrong and the result is not a valid XML structure that I can handle.
I expect something like
<TextLine aa="bb" cc="dd" content="the content to <em>search</em> in" ee="ff" />
but I get
<Tex<em>tLine</em>aa="bb" cc="dd" content="the content to <em>search</em> in" ee="ff" />
Thanks for your help.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Problem-with-PatternReplaceCharFilter-tp4066869p4066939.html
Sent from the Solr - User mailing list archive at Nabble.com.