An ICIJ engineer, Julien Martin, has since developed a patch for this. We’d 
appreciate any feedback and attention that might help get this integrated: 
https://issues.apache.org/jira/browse/SOLR-1105 
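Below is a minimal sketch of one way to approach the problem in the quoted message. It assumes the highlighter can fall back to the stored value of a copyField source when the requested field is not stored. It is not necessarily how the SOLR-1105 patch works, and StoredSourceResolver and textToHighlight are hypothetical names, not Solr or Lucene APIs:

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not Solr/Lucene API: when asked to highlight an
// unstored copyField destination, fall back to the stored value of the
// source field that is copied into it.
public class StoredSourceResolver {

    // copyField source -> destinations, as declared in the schema.
    private final Map<String, List<String>> copyFields;
    // Stored values of a single document, keyed by field name.
    private final Map<String, String> storedValues;

    public StoredSourceResolver(Map<String, List<String>> copyFields,
                                Map<String, String> storedValues) {
        this.copyFields = copyFields;
        this.storedValues = storedValues;
    }

    // Returns the text a highlighter would re-analyse with the requested
    // field's analyser: the field's own stored value if it has one,
    // otherwise the stored value of a copyField source feeding it.
    public Optional<String> textToHighlight(String field) {
        String own = storedValues.get(field);
        if (own != null) {
            return Optional.of(own);
        }
        for (Map.Entry<String, List<String>> e : copyFields.entrySet()) {
            if (e.getValue().contains(field) && storedValues.containsKey(e.getKey())) {
                return Optional.of(storedValues.get(e.getKey()));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, List<String>> copyFields = Map.of(
            "tika_output", List.of("text", "text_case_sensitive", "text_accent_sensitive"));
        Map<String, String> stored = Map.of("tika_output", "Sample extracted text");
        StoredSourceResolver resolver = new StoredSourceResolver(copyFields, stored);
        // prints "Sample extracted text"
        System.out.println(resolver.textToHighlight("text_case_sensitive").orElse("<none>"));
        // prints "<none>": no stored value and no stored copyField source
        System.out.println(resolver.textToHighlight("title").orElse("<none>"));
    }
}
```

The fallback is plausible because copyField copies the raw input value before analysis. Re-analysing the stored source with the destination field's analyser should therefore reproduce that field's tokens without storing each copy separately.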

> On 1 Mar 2017, at 17:03, Matthew Caruana Galizia <mcaru...@icij.org> wrote:
> 
> We’re currently using copyField directives in our schema to copy the same 
> text to different fields that use different analysers. For example, assuming 
> the original field contained in the document payload sent to the update 
> handler is called “tika_output", it is copied to “text”, 
> “text_case_sensitive” and “text_accent_sensitive”.
> 
> In order to avoid inflating the size of the index, “tika_output" has 
> indexed=false and stored=true, while “text” and friends have indexed=true and 
> stored=false.
> 
> We’re using the unified highlighter. I’ve read the code in 
> UnifiedHighlighter.java, which clearly shows that the field to be highlighted 
> must be stored. Therefore, searching on text_case_sensitive doesn’t yield 
> highlighted results. Storing the field value redundantly would mean tripling 
> my storage costs.
> 
> I see that other people have brought up this issue before:
> 
> https://issues.apache.org/jira/browse/SOLR-1105 
> https://issues.apache.org/jira/browse/SOLR-5276 
> 
> Is there anything that can be done? If it comes down to subclassing the 
> unified highlighter, does anyone have any recommendations for doing this?
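
The schema layout described in the quoted message can be modelled in a few lines of Java (using records, so Java 16+). This is a simplified, hypothetical model, not Solr's schema API. It assumes a field is highlightable only when it is both indexed (so queries hit it) and stored (so the highlighter has text to read). Under that assumption, no field in this layout qualifies:

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the schema described above; not Solr's API.
public class SchemaLayout {

    // A field definition with the two flags that matter here.
    record FieldDef(String name, boolean indexed, boolean stored) {}

    // The copyField source is stored but not indexed; the copyField
    // destinations are indexed but not stored.
    static final List<FieldDef> FIELDS = List.of(
        new FieldDef("tika_output", false, true),
        new FieldDef("text", true, false),
        new FieldDef("text_case_sensitive", true, false),
        new FieldDef("text_accent_sensitive", true, false));

    // Simplified rule: a highlighter that reads stored values can only
    // highlight a field that is both searched (indexed) and stored.
    static Set<String> highlightableFields(List<FieldDef> fields) {
        Set<String> result = new TreeSet<>();
        for (FieldDef f : fields) {
            if (f.indexed() && f.stored()) {
                result.add(f.name());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints "highlightable: []"
        System.out.println("highlightable: " + highlightableFields(FIELDS));
    }
}
```

Making any destination highlightable under this rule means flipping its stored flag, which is exactly the redundant storage the message wants to avoid.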
