We’re currently using copyField directives in our schema to copy the same text 
to several fields that use different analysers. For example, if the original 
field in the document payload sent to the update handler is called 
“tika_output”, it is copied to “text”, “text_case_sensitive” and 
“text_accent_sensitive”.

To avoid inflating the size of the index, “tika_output” has indexed=false and 
stored=true, while “text” and friends have indexed=true and stored=false.
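
For reference, the setup looks roughly like the sketch below. The field names 
are the ones described above; the field type names (text_general, text_cs, 
text_as) are placeholders standing in for our real analyser chains, and other 
attributes are omitted.

```xml
<!-- Source field: stored once, not indexed (type name is a placeholder) -->
<field name="tika_output" type="text_general" indexed="false" stored="true"/>

<!-- Search fields: indexed with different analysers, not stored -->
<field name="text"                  type="text_general" indexed="true" stored="false"/>
<field name="text_case_sensitive"   type="text_cs"      indexed="true" stored="false"/>
<field name="text_accent_sensitive" type="text_as"      indexed="true" stored="false"/>

<!-- Copy the raw text into each search field at index time -->
<copyField source="tika_output" dest="text"/>
<copyField source="tika_output" dest="text_case_sensitive"/>
<copyField source="tika_output" dest="text_accent_sensitive"/>
```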

We’re using the unified highlighter. I’ve read the code in 
UnifiedHighlighter.java, which clearly shows that the field to be highlighted 
must be stored. As a result, searching on text_case_sensitive doesn’t yield 
highlighted results, because that field is not stored. Storing the field value 
redundantly in all three fields would mean tripling our storage costs.

I see that other people have brought up this issue before:

https://issues.apache.org/jira/browse/SOLR-1105
https://issues.apache.org/jira/browse/SOLR-5276

Is there anything that can be done? If it comes down to subclassing the unified 
highlighter, does anyone have any recommendations for doing this?
