Re: return unaltered complete multivalued fields with Highlighted results

Jonathan Rochkind Thu, 02 Jun 2011 12:00:17 -0700

I could use this feature too, and I encourage you to submit a patch in JIRA.

I wouldn't call the param "preserveOrder", though. What it's really doing is returning the entire field, with highlighting markers, not just "preserving the order" of fragments. I'm not sure what to call it, but not "preserveOrder".


On 6/2/2011 11:31 AM, alexei wrote:

Hi,

Here is the code for Solr 3.1 that will preserve all the text and will
disable sorting.

This goes in the solrconfig.xml request handler config, or whichever way you
pass params:
      <str name="hl.preserveOrder">true</str>
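Besides setting it as a request handler default in solrconfig.xml, the parameter can also be sent per request. The following is a minimal plain-Java sketch that builds such a query string, assuming the rest of the patch is applied, a hypothetical local Solr at http://localhost:8983/solr/select, and a hypothetical multivalued field named "title". hl.snippets is raised because the patched method still stops after hl.snippets fragments; the value 100 is only illustrative.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class PreserveOrderRequestSketch {

    // URL-encodes a single name or value as application/x-www-form-urlencoded.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM must support UTF-8, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    // Joins name/value pairs into "a=1&b=2" form, preserving insertion order.
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("q", "title:solr");
        params.put("hl", "true");
        params.put("hl.fl", "title");            // hypothetical multivalued field
        params.put("hl.snippets", "100");        // illustrative: allow one snippet per value
        params.put("hl.preserveOrder", "true");  // the new parameter from this patch
        // Hypothetical Solr location; adjust to your own install.
        System.out.println("http://localhost:8983/solr/select?" + toQueryString(params));
    }
}
```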

This line goes into the HighlightParams class:
   public static final String PRESERVE_ORDER = HIGHLIGHT + ".preserveOrder";

Replace the method DefaultSolrHighlighter.doHighlightingByHighlighter with this
one (I only added three if blocks):

   private void doHighlightingByHighlighter( Query query, SolrQueryRequest req, NamedList docSummaries,
       int docId, Document doc, String fieldName ) throws IOException {
     SolrParams params = req.getParams();
     String[] docTexts = doc.getValues(fieldName);
     // according to Document javadoc, doc.getValues() never returns null. check empty instead of null
     if (docTexts.length == 0) return;

     SolrIndexSearcher searcher = req.getSearcher();
     IndexSchema schema = searcher.getSchema();
     TokenStream tstream = null;
     int numFragments = getMaxSnippets(fieldName, params);
     boolean mergeContiguousFragments = isMergeContiguousFragments(fieldName, params);

     String[] summaries = null;
     List<TextFragment> frags = new ArrayList<TextFragment>();

     TermOffsetsTokenStream tots = null; // to be non-null iff we're using TermOffsets optimization
     try {
       TokenStream tvStream = TokenSources.getTokenStream(searcher.getReader(), docId, fieldName);
       if (tvStream != null) {
         tots = new TermOffsetsTokenStream(tvStream);
       }
     }
     catch (IllegalArgumentException e) {
       // No problem. But we can't use TermOffsets optimization.
     }

     for (int j = 0; j < docTexts.length; j++) {
       if( tots != null ) {
         // if we're using TermOffsets optimization, then get the next
         // field value's TokenStream (i.e. get field j's TokenStream) from tots:
         tstream = tots.getMultiValuedTokenStream( docTexts[j].length() );
       } else {
         // fall back to analyzer
         tstream = createAnalyzerTStream(schema, fieldName, docTexts[j]);
       }

       Highlighter highlighter;
       if (Boolean.valueOf(req.getParams().get(HighlightParams.USE_PHRASE_HIGHLIGHTER, "true"))) {
         // TODO: this is not always necessary - eventually we would like to avoid this wrap
         //       when it is not needed.
         tstream = new CachingTokenFilter(tstream);

         // get highlighter
         highlighter = getPhraseHighlighter(query, fieldName, req, (CachingTokenFilter) tstream);

         // after highlighter initialization, reset tstream since construction of highlighter already used it
         tstream.reset();
       }
       else {
         // use "the old way"
         highlighter = getHighlighter(query, fieldName, req);
       }

       int maxCharsToAnalyze = params.getFieldInt(fieldName,
           HighlightParams.MAX_CHARS,
           Highlighter.DEFAULT_MAX_CHARS_TO_ANALYZE);
       if (maxCharsToAnalyze < 0) {
         highlighter.setMaxDocCharsToAnalyze(docTexts[j].length());
       } else {
         highlighter.setMaxDocCharsToAnalyze(maxCharsToAnalyze);
       }

       try {
         TextFragment[] bestTextFragments = highlighter.getBestTextFragments(tstream, docTexts[j],
             mergeContiguousFragments, numFragments);
         for (int k = 0; k < bestTextFragments.length; k++) {
           if (params.getBool( HighlightParams.PRESERVE_ORDER, false ) ) {
             // new if block #1: keep every non-null fragment, including
             // score-0 fragments (values with no matches)
             if (bestTextFragments[k] != null) {
               frags.add(bestTextFragments[k]);
             }
           }
           else {
             if ((bestTextFragments[k] != null) && (bestTextFragments[k].getScore() > 0)) {
               frags.add(bestTextFragments[k]);
             }
           }
         }
       } catch (InvalidTokenOffsetsException e) {
         throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e);
       }
     }
     // new if block #2: sort such that the fragments with the highest score come first,
     // unless the original order should be preserved
     if (!params.getBool( HighlightParams.PRESERVE_ORDER, false ) ) {
       Collections.sort(frags, new Comparator<TextFragment>() {
         public int compare(TextFragment arg0, TextFragment arg1) {
           // Float.compare instead of Math.round(arg1.getScore() - arg0.getScore()):
           // rounding the difference treats scores less than 0.5 apart as equal
           return Float.compare(arg1.getScore(), arg0.getScore());
         }
       });
     }

     // convert fragments back into text
     // TODO: we can include score and position information in output as snippet attributes
     if (frags.size() > 0) {
       ArrayList<String> fragTexts = new ArrayList<String>();
       for (TextFragment fragment: frags) {
         if (params.getBool( HighlightParams.PRESERVE_ORDER, false ) ) {
           // new if block #3: emit score-0 fragments too
           if (fragment != null) {
             fragTexts.add(fragment.toString());
           }
         } else {
           if ((fragment != null) && (fragment.getScore() > 0)) {
             fragTexts.add(fragment.toString());
           }
         }
         if (fragTexts.size() >= numFragments) break;
       }
       summaries = fragTexts.toArray(new String[0]);
       if (summaries.length > 0)
         docSummaries.add(fieldName, summaries);
     }
     // no summaries made, copy text from alternate field
     if (summaries == null || summaries.length == 0) {
       alternateField( docSummaries, params, doc, fieldName );
     }
   }
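To see the effect of hl.preserveOrder outside Solr, here is a minimal self-contained sketch of just the fragment-selection step of the method above. Frag is a hypothetical stand-in for Lucene's TextFragment and select() is a hypothetical helper; tokenizing and highlighting are not modelled, and the sketch collapses the method's two filtering passes into one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class FragmentSelectionSketch {

    // Hypothetical stand-in for Lucene's TextFragment: highlighted text plus a score.
    static final class Frag {
        final String text;
        final float score;
        Frag(String text, float score) { this.text = text; this.score = score; }
    }

    // Mirrors the patched selection logic:
    //  - preserveOrder: keep every non-null fragment, in the order produced
    //  - otherwise: drop score-0 fragments and sort by descending score
    // In both cases at most numFragments texts are returned.
    static List<String> select(List<Frag> frags, boolean preserveOrder, int numFragments) {
        List<Frag> kept = new ArrayList<Frag>();
        for (Frag f : frags) {
            if (f != null && (preserveOrder || f.score > 0)) {
                kept.add(f);
            }
        }
        if (!preserveOrder) {
            // Collections.sort is stable, so equal scores keep their input order.
            Collections.sort(kept, new Comparator<Frag>() {
                public int compare(Frag a, Frag b) {
                    return Float.compare(b.score, a.score);
                }
            });
        }
        List<String> out = new ArrayList<String>();
        for (Frag f : kept) {
            out.add(f.text);
            if (out.size() >= numFragments) break;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Frag> frags = new ArrayList<Frag>();
        frags.add(new Frag("value one", 0.0f));
        frags.add(new Frag("value <em>two</em>", 0.3f));
        frags.add(new Frag("value <em>three</em>", 0.5f));

        System.out.println(select(frags, true, 10));   // all three values, input order
        System.out.println(select(frags, false, 10));  // matching values only, best first
    }
}
```

With scores 0.3 and 0.5, the original Math.round(arg1.getScore() - arg0.getScore()) comparator returns 0 and leaves the two fragments in input order; Float.compare puts the 0.5 fragment first.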


This seems to work for my purposes. If nobody has any issues with this code,
perhaps it should be submitted as a patch?

Thanks,
Alexei


--
View this message in context: 
http://lucene.472066.n3.nabble.com/return-unaltered-complete-multivalued-fields-with-Highlighted-results-tp2967146p3015616.html
Sent from the Solr - User mailing list archive at Nabble.com.
