Just count the characters in the literal portions of the patterns and include that many spaces in the replacement.

So, "<TextLine " (10 characters) would become "          " (10 spaces).
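To make the offset argument concrete, here is a small plain-Java sketch (no Solr or Lucene classes are involved; the sample input and the class name `BlankVsDelete` are made up for illustration, and `String.repeat` needs Java 11+). It compares where "search" lands after deleting the literal versus overwriting it with the same number of spaces:

```java
// Sketch only: plain Java, no Solr/Lucene classes.
// Shows why deleting a literal shifts term offsets while blanking it does not.
class BlankVsDelete {

    // Remove the literal entirely (what an empty replacement string does).
    static String delete(String text, String literal) {
        return text.replace(literal, "");
    }

    // Replace the literal with exactly as many spaces as it has characters.
    static String blank(String text, String literal) {
        return text.replace(literal, " ".repeat(literal.length()));
    }

    public static void main(String[] args) {
        String original = "<TextLine content=\"the content to search in\" />";
        String literal = "<TextLine ";  // 10 characters

        System.out.println(original.indexOf("search"));                   // 34
        System.out.println(delete(original, literal).indexOf("search"));  // 24
        System.out.println(blank(original, literal).indexOf("search"));   // 34
    }
}
```

Deleting shifts the term 10 characters to the left, which is exactly the kind of mismatch that throws the highlighter off; blanking leaves it where it was.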

It gets trickier if names are variable-length. But I'm sure you could come up with separate patterns for one-, two-, three-character names, and so on, each replaced by the equivalent number of spaces.
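A rough plain-Java emulation of that idea, assuming (hypothetically) that the names to hide are ASCII-letter attribute names followed by `="`. Each rule is a separate pattern with a fixed replacement string, standing in for a chain of separate pattern-replace filters; the class and method names are made up, and nothing here is Solr configuration syntax:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: emulates a chain of fixed-replacement rules, one per name length.
// Assumption (hypothetical): names are ASCII letters followed by ="
class PerLengthRules {

    // One (pattern, replacement) pair per name length 1..maxLen.
    // A static replacement cannot adapt to the match, so each rule hard-codes
    // n + 2 spaces: n for the name, 1 for '=', 1 for the opening quote.
    static Map<String, String> buildRules(int maxLen) {
        Map<String, String> rules = new LinkedHashMap<>();
        for (int n = 1; n <= maxLen; n++) {
            rules.put("\\b[A-Za-z]{" + n + "}=\"", " ".repeat(n + 2));
        }
        return rules;
    }

    // Apply the rules in insertion order, like filters chained one after another.
    static String apply(String text, Map<String, String> rules) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            text = text.replaceAll(rule.getKey(), rule.getValue());
        }
        return text;
    }

    public static void main(String[] args) {
        String input = "<TextLine aa=\"bb\" cc=\"dd\" content=\"the content to search in\" />";
        String output = apply(input, buildRules(7));
        System.out.println("[" + output + "]");
        System.out.println(input.length() == output.length()); // true
    }
}
```

The drawback shows up in `buildRules`: you need one rule per possible length, so the maximum name length has to be known up front.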

But... if all of this is too difficult for you, some people might find it easier to preprocess the data before sending it to Solr.
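As a sketch of that preprocessing route, assuming each record really is a single `<TextLine .../>` element whose searchable text lives in the `content` attribute (as in the example further down this thread), the JDK's built-in DOM parser can pull the text out before indexing. The class and method names are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Sketch only: extract the searchable text before the document is sent to Solr.
// Assumption: each record is one <TextLine .../> element with a "content" attribute.
class ExtractContent {

    // Parse the element with the JDK's DOM parser and return its "content"
    // attribute (getAttribute returns "" if the attribute is missing).
    static String contentOf(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return root.getAttribute("content");
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed TextLine element", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<TextLine aa=\"bb\" cc=\"dd\" content=\"the content to search in\" ee=\"ff\" />";
        System.out.println(contentOf(xml)); // the content to search in
    }
}
```

Solr then indexes plain text only, so there is nothing for a char filter to hide and the highlighter's offsets refer to that plain text.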

I mean, do you really need to highlight the content in such a cryptic input format?

Ultimately you might be better off with a custom char filter; sometimes people can cope better with straight Java code than with cryptic sequences of regular expressions.
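Here is roughly what the "straight Java" core of such a filter might look like: a plain-Java sketch, not a Lucene CharFilter (wiring it into Lucene's char filter API is not shown). It blanks every regex match with as many spaces as the match is long, so variable-length names need no per-length rules. The pattern `MARKUP` is a hypothetical example that hides the tag name and every attribute except `content`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: plain Java, not a Lucene CharFilter.
class BlankingFilter {

    // Hypothetical pattern: the tag name, plus any name="value" attribute
    // whose name is not "content". Names may be of any length.
    static final Pattern MARKUP = Pattern.compile("<\\w+|\\b(?!content=)\\w+=\"[^\"]*\"");

    // Overwrite every match with spaces, one per matched character,
    // so the output has exactly the same length as the input.
    static String blankMatches(String text, Pattern pattern) {
        StringBuilder out = new StringBuilder(text);
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            for (int i = m.start(); i < m.end(); i++) {
                out.setCharAt(i, ' ');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "<TextLine aa=\"bb\" cc=\"dd\" content=\"the content to search in\" ee=\"ff\" />";
        String output = blankMatches(input, MARKUP);
        System.out.println("[" + output + "]");
        System.out.println(input.indexOf("search") == output.indexOf("search")); // true
    }
}
```

Because the output has exactly the same length as the input, no offset correction is needed; a filter that deleted text instead would have to record offset corrections for the highlighter.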

-- Jack Krupansky

-----Original Message-----
From: jasimop
Sent: Thursday, May 30, 2013 12:46 AM
To: solr-user@lucene.apache.org
Subject: Re: Problem with PatternReplaceCharFilter

Honestly, I have no idea how to do that.
PatternReplaceCharFilter doesn't seem to have a parameter like
preservePositions="true" and, optionally, fillCharacter=" ".
And I don't think I can express this simply as a regex. How would I account
for the length difference before and after the match in a pure regex?

Well, the specific problem is that when highlighting, the term positions are
wrong and the result is not a valid XML structure that I can handle.
I expect something like
<TextLine aa="bb" cc="dd" content="the content to <em>search</em> in" ee="ff" />
but I get
<Tex<em>tLine</em>aa="bb" cc="dd" content="the content to <em>search</em> in" ee="ff" />

Thanks for your help.



--
View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-PatternReplaceCharFilter-tp4066869p4066939.html
Sent from the Solr - User mailing list archive at Nabble.com.
