An ICIJ engineer, Julien Martin, has since developed a patch for this. We’d appreciate any feedback and attention that might help get this integrated: https://issues.apache.org/jira/browse/SOLR-1105
> On 1 Mar 2017, at 17:03, Matthew Caruana Galizia <mcaru...@icij.org> wrote:
>
> We’re currently using copyField directives in our schema to copy the same
> text to different fields that use different analysers. For example, assuming
> the original field contained in the document payload sent to the update
> handler is called “tika_output”, it is copied to “text”,
> “text_case_sensitive” and “text_accent_sensitive”.
>
> In order to avoid inflating the size of the index, “tika_output” has
> indexed=false and stored=true, while “text” and friends have indexed=true and
> stored=false.
>
> We’re using the unified highlighter. I’ve read the code in
> UnifiedHighlighter.java, which clearly shows that the field to be highlighted
> must be stored. Therefore, searching on text_case_sensitive doesn’t yield
> highlighted results. Storing the field value redundantly would mean tripling
> my storage costs.
>
> I see that other people have brought up this issue before:
>
> https://issues.apache.org/jira/browse/SOLR-1105
> https://issues.apache.org/jira/browse/SOLR-5276
>
> Is there anything that can be done? If it comes down to subclassing the
> unified highlighter, does anyone have any recommendations for doing this?
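
For anyone who wants to reproduce the setup described in the quoted message, here is a minimal sketch of the field layout, written as a Python script that builds a Solr Schema API style payload ("add-field" and "add-copy-field" commands). The field names come from the message; the field type names ("string", "text_general", "text_case_sensitive_type", "text_accent_sensitive_type") are placeholders I am assuming, not the types in our actual schema.

```python
import json

# Field names are taken from the quoted message. The field type names are
# hypothetical placeholders for whatever analysers the real schema defines.
SOURCE_FIELD = "tika_output"
SEARCH_FIELDS = {
    "text": "text_general",
    "text_case_sensitive": "text_case_sensitive_type",
    "text_accent_sensitive": "text_accent_sensitive_type",
}


def build_schema_commands():
    """Return a Schema API style payload describing the copyField layout."""
    # The source field is stored once and is not searchable itself.
    fields = [
        {"name": SOURCE_FIELD, "type": "string", "indexed": False, "stored": True},
    ]
    # Each copy target is indexed with its own analyser but not stored,
    # which is why a highlighter that requires stored values finds nothing.
    for name, field_type in SEARCH_FIELDS.items():
        fields.append(
            {"name": name, "type": field_type, "indexed": True, "stored": False}
        )
    return {
        "add-field": fields,
        "add-copy-field": {"source": SOURCE_FIELD, "dest": list(SEARCH_FIELDS)},
    }


if __name__ == "__main__":
    print(json.dumps(build_schema_commands(), indent=2))
```

With this layout the text is stored only in "tika_output", so a query that matches on "text_case_sensitive" has no stored value on that field for the highlighter to read.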