Thanks again for your input. In fact, I already preprocess the data (concatenating only the content values) and index the result into a separate field.
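Below is a minimal sketch of that preprocessing step. It also records where each <TextLine>'s content starts in the concatenated string, so that the character offset of a hit in the cleaned field can be mapped back to that line's attributes without re-matching highlighted terms. It assumes a hypothetical raw format in which each <TextLine> element's text is its content value and its XML attributes are the processing rules. `build_clean_text`, `attrs_at` and the sample document are illustrative names and data, not Solr APIs.

```python
import bisect
import xml.etree.ElementTree as ET

# Hypothetical raw document: element text = content, attributes = rules.
RAW = """<Page>
  <TextLine id="l1" rule="keep">Hello world</TextLine>
  <TextLine id="l2" rule="skip">foo bar baz</TextLine>
</Page>"""

SEPARATOR = " "  # inserted between line contents in the cleaned text


def build_clean_text(raw_xml):
    """Concatenate TextLine contents and remember where each one starts.

    Returns (clean_text, starts, attrs), where starts[i] is the offset of
    line i's content in clean_text and attrs[i] holds its attributes.
    """
    root = ET.fromstring(raw_xml)
    parts, starts, attrs = [], [], []
    offset = 0
    for line in root.iter("TextLine"):
        content = line.text or ""
        starts.append(offset)
        attrs.append(dict(line.attrib))
        parts.append(content)
        offset += len(content) + len(SEPARATOR)
    return SEPARATOR.join(parts), starts, attrs


def attrs_at(offset, starts, attrs):
    """Return the attributes of the TextLine containing a character offset."""
    # Last line whose start is <= offset (binary search over sorted starts).
    idx = bisect.bisect_right(starts, offset) - 1
    return attrs[idx]


if __name__ == "__main__":
    clean, starts, attrs = build_clean_text(RAW)
    hit = clean.find("foo")  # stand-in for a hit offset reported by Solr
    print(clean, starts, attrs_at(hit, starts, attrs))
```

Under this assumption, the hit offsets would have to come from Solr for the cleaned field, for example from term vectors stored with offsets (`termOffsets="true"` on the field). The `starts` and `attrs` lists would need to be kept alongside the document, e.g. in a stored field or an external store. This has not been tested against the real data format.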
But my general problem is the following: my data has a rather cryptic format, and I have to search only within the content values. Therefore I preprocess it and put the result into a separate field. Searching that field works fine (highlighting, etc.).

The problem is that when I get a hit in that field, I need to know which <TextLine> it appeared in so that I can read that line's attribute values. These attributes define rules for processing the search result, but it must not be possible to search in them. Therefore I cannot just use the HtmlStripCharFilter.

So my idea was the following: index both my cleaned version and the raw format, and make sure that both fields generate the same tokens (this is the hard part). If I need to know the surrounding attribute values, I search in the raw version and highlight the matching term. The highlight then tells me which attribute values to use.

Another option would be to search in the cleaned version and then, in my application after the search, try to map the hit position back to the corresponding position in the raw format based on the highlighted term. But this is very error-prone.

Neither solution seems elegant to me. Any suggestions?

--
View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-PatternReplaceCharFilter-tp4066869p4067265.html
Sent from the Solr - User mailing list archive at Nabble.com.