I think I just found the solution.
Would the right strategy be to store the original XML content and then use a
solr.HTMLStripCharFilterFactory when querying? I just ran a quick test and
it works;
the only problem now is that it also finds the data contained in the XML
attribute fields.
I think I
I did some experiments, but I think I will end up using twice the disk space.
The problem is the following: I will search in the fulltext (without the XML
content), but I need to know the
position of the search result in the fulltext (to display it) and in the XML
data (to get the attributes associa
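To illustrate the position problem described above, here is a minimal sketch in plain Python (not Solr code) that strips the tags while recording, for every character of the fulltext, its offset in the original XML. The function name `strip_with_offsets` and the sample document are made up for illustration, and the tag regex is a simplification that assumes no `>` inside attribute values and does not decode entities.

```python
import re

# Simplistic tag matcher (assumption: no '>' inside attribute values).
TAG = re.compile(r"<[^>]*>")

def strip_with_offsets(xml):
    """Return (text, offsets), where offsets[i] is the index in xml of text[i]."""
    chars = []
    offsets = []
    pos = 0
    for m in TAG.finditer(xml):
        # Copy the characters between the previous tag and this one.
        for i in range(pos, m.start()):
            chars.append(xml[i])
            offsets.append(i)
        pos = m.end()
    # Copy whatever follows the last tag.
    for i in range(pos, len(xml)):
        chars.append(xml[i])
        offsets.append(i)
    return "".join(chars), offsets

# Hypothetical sample data, only for demonstration.
xml = '<doc><w id="1">Hello</w> <w id="2">world</w></doc>'
text, offsets = strip_with_offsets(xml)

hit = text.find("world")   # position in the fulltext (for display)
xml_pos = offsets[hit]     # position in the original XML (for the attributes)
```

With such a map, only the stripped text plus the offset list would need to be kept next to the original, and a hit in the fulltext can be traced back to the surrounding element.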
Hi,
I have a problem using PatternReplaceCharFilter when indexing a field.
I created the following field:
And I created a field that is indexed and stored:
I need to index a do
Honestly, I have no idea how to do that.
PatternReplaceCharFilter doesn't seem to have a parameter like
preservePositions="true", optionally combined with fillCharacter=" ".
And I don't think I can express this simply as a regex. How would I count, in a
pure regex, the length difference before and after the match?
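For what it's worth, outside of a pure regex the idea is easy to express: the replacement can be computed from the match itself. The sketch below is plain Python run as a preprocessing step, not a Solr filter; `blank_out` and its `fill` argument are hypothetical names that mimic the wished-for preservePositions/fillCharacter behaviour, and the tag pattern is a simplification.

```python
import re

def blank_out(pattern, text, fill=" "):
    """Replace every match with `fill` repeated to the match's length,
    so all remaining characters keep their original positions."""
    return re.sub(pattern, lambda m: fill * len(m.group(0)), text)

# Hypothetical sample data, only for demonstration.
xml = '<w id="1">Hello</w> <w id="2">world</w>'
padded = blank_out(r"<[^>]*>", xml)

# The padded text has the same length as the original, and each word
# sits at the same offset as in the XML.
same_length = len(padded) == len(xml)
same_offset = padded.find("world") == xml.find("world")
```

The padded string could then be indexed in place of the raw XML, at the cost of storing the fill characters.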
Thanks again for your input.
In fact I already preprocess the data (concatenation of only the content
values) and index it into another field.
But my general problem is the following: My data has a cryptic format,
and I have to search only within the content values. Therefore I preprocess
it