Re: Upgraded to 4.10.3, highlighting performance unusably slow

Joel Bernstein Sat, 02 May 2015 17:22:27 -0700

Hi,

Can you also include the details of your research that narrowed the issue
to the highlighter?


Joel Bernstein
http://joelsolr.blogspot.com/

On Sat, May 2, 2015 at 5:27 PM, Ryan, Michael F. (LNG-DAY) <
michael.r...@lexisnexis.com> wrote:

> Are you able to identify if there is a particular part of the code that is
> slow?
>
> A simple way to do this is to use the jstack command (assuming your server
> has the full JDK installed). You can run it like this:
> /path/to/java/bin/jstack PID
>
> If you run that a bunch of times while your highlight query is running,
> you might be able to spot the hotspot. Usually I'll do something like this
> to see the stacktrace for the thread running the query:
> /path/to/java/bin/jstack PID | grep SearchHandler -B30
>
> A few more questions:
> - What response times are you seeing before and after the upgrade? Is
> "unusably slow" 1 second, 10 seconds...?
> - If you run the exact same query multiple times, is it consistently slow?
> Or is it only slow on the first run?
> - While the query is running, do you see high user CPU on your server, or
> high IO wait, or both? (You can check this with the top command or vmstat
> command in Linux.)
>
> -Michael
>
> -----Original Message-----
> From: Cheng, Sophia Kuen [mailto:sophia_ch...@hms.harvard.edu]
> Sent: Saturday, May 02, 2015 4:13 PM
> To: solr-user@lucene.apache.org
> Subject: Upgraded to 4.10.3, highlighting performance unusably slow
>
> Hello,
>
> We recently upgraded Solr from 3.8.0 to 4.10.3.  We saw that this upgrade
> caused an incredible slowdown in our searches. We were able to narrow it
> down to the highlighting. The slowdown is extreme enough that we are
> holding back our release until we can resolve this.  Our research indicated
> that term vectors and the FastVectorHighlighter were the way to go; however,
> enabling them has done nothing for performance. I think we may be overlooking a crucial
> configuration, but cannot figure it out. I was hoping for some guidance and
> help. Sorry for the long email, I wanted to provide enough information.
>
> Our documents are largely dynamic fields, and so we have been using '*' as
> the field for highlighting. This is the same setting we used with prior
> versions of Solr. The dynamic fields are of type 'text', and we added
> customizations to the schema.xml for the type 'text':
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
> storeOffsetsWithPositions="true" termVectors="true" termPositions="true"
> termOffsets="true">
>   <analyzer type="index">
>     <!--  this charFilter removes all xml-tagging from the text: -->
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <!-- Case insensitive stop word removal.
>       add enablePositionIncrements=true in both the index and query
>       analyzers to leave a 'gap' for more accurate phrase queries.
>     -->
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
> protected="protwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <!--  this charFilter removes all xml-tagging from the text. Needed
> also in query due to autosuggest -->
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
> protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
> One of the two dynamic fields we use:
>
> <dynamicField name="DTPropValue_*"  type="text"    indexed="true"
> stored="true" required="false" multiValued="true"/>
>
> In our solrConfig.xml file, we have:
>
> <requestHandler name="/eiHandler" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>     <int name="rows">13</int>
>     <bool name="tv">true</bool>
>     <bool name="hl.useFastVectorHighligter">true</bool>
>   </lst>
>   <arr name="last-components">
>     <str>tvComponent</str>
>   </arr>
> </requestHandler>
> <searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
> <searchComponent class="solr.HighlightComponent" name="highlight">
>   <highlighting>
>     <fragmenter name="gap" default="true"
> class="solr.highlight.GapFragmenter">
>       <lst name="defaults">
>         <int name="hl.fragsize">100</int>
>       </lst>
>     </fragmenter>
>     <fragmenter name="regex" class="solr.highlight.RegexFragmenter">
>       <lst name="defaults">
>         <int name="hl.fragsize">70</int>
>         <float name="hl.regex.slop">0.5</float>
>         <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
>       </lst>
>     </fragmenter>
>
>     <formatter name="html" default="true"
> class="solr.highlight.HtmlFormatter">
>       <lst name="defaults">
>         <str name="hl.simple.pre"><![CDATA[<i>]]></str>
>         <str name="hl.simple.post"><![CDATA[</i>]]></str>
>       </lst>
>     </formatter>
>
>     <encoder name="html" class="solr.highlight.HtmlEncoder" />
>     <fragListBuilder name="simple"
> class="solr.highlight.SimpleFragListBuilder"/>
>     <fragListBuilder name="single"
> class="solr.highlight.SingleFragListBuilder"/>
>     <fragListBuilder name="weighted" default="true"
> class="solr.highlight.WeightedFragListBuilder"/>
>     <fragmentsBuilder name="default" default="true"
> class="solr.highlight.ScoreOrderFragmentsBuilder">
>     </fragmentsBuilder>
>
>     <!-- multi-colored tag FragmentsBuilder -->
>     <fragmentsBuilder name="colored"
> class="solr.highlight.ScoreOrderFragmentsBuilder">
>       <lst name="defaults">
>         <str name="hl.tag.pre"><![CDATA[
>              <b style="background:yellow">,<b style="background:lawgreen">,
>              <b style="background:aquamarine">,<b
> style="background:magenta">,
>              <b style="background:palegreen">,<b style="background:coral">,
>              <b style="background:wheat">,<b style="background:khaki">,
>              <b style="background:lime">,<b
> style="background:deepskyblue">]]></str>
>         <str name="hl.tag.post"><![CDATA[</b>]]></str>
>       </lst>
>     </fragmentsBuilder>
>
>     <boundaryScanner name="default" default="true"
> class="solr.highlight.SimpleBoundaryScanner">
>       <lst name="defaults">
>         <str name="hl.bs.maxScan">10</str>
>         <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
>       </lst>
>     </boundaryScanner>
>
>     <boundaryScanner name="breakIterator"
> class="solr.highlight.BreakIteratorBoundaryScanner">
>       <lst name="defaults">
>         <str name="hl.bs.type">WORD</str>
>         <str name="hl.bs.language">en</str>
>         <str name="hl.bs.country">US</str>
>       </lst>
>     </boundaryScanner>
>   </highlighting>
> </searchComponent>
>
> And in our code:
>
> final SolrQuery query = new SolrQuery( luceneQueryStr );
> query.setRequestHandler("/eiHandler");
> query.setStart( request.getStartIndex() );
> query.setRows( request.getMaxResults() );
> query.setSort(new SortClause(request.getSortOrder().getFieldName(),
>     request.getSortOrder().isAscending()?ORDER.asc:ORDER.desc) );
> query.addHighlightField( "*" );
> query.setFields( "*", "score" );
>
> Any assistance is greatly appreciated.  Thank you.
>
> Sincerely,
> Sophia
>
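
The jstack sampling Michael describes above can be summarized mechanically once several dumps have been saved to files (for example with "jstack PID > dump1.txt", repeated while the slow query runs). What follows is only a rough sketch under assumptions: the class name JstackHotspots, the method topFrames, and the dump file names are hypothetical and not part of Solr or the JDK. It counts the topmost frame of every thread whose stack mentions SearchHandler, the same marker used in the grep above. A frame that dominates the tally is a likely hotspot.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for reading saved jstack dumps; not part of Solr or the JDK.
public class JstackHotspots {

    // Tallies the topmost "at ..." frame of every thread whose stack mentions marker.
    static Map<String, Integer> topFrames(List<String> dumps, String marker) {
        Map<String, Integer> counts = new HashMap<>();
        for (String dump : dumps) {
            // jstack separates thread entries with blank lines.
            for (String block : dump.split("\\R\\s*\\R")) {
                if (!block.contains(marker)) {
                    continue; // not a thread that is serving a search request
                }
                for (String line : block.split("\\R")) {
                    String frame = line.trim();
                    if (frame.startsWith("at ")) {
                        counts.merge(frame.substring(3), 1, Integer::sum);
                        break; // count only the top frame of this thread
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is assumed to be a file holding one saved jstack dump.
        List<String> dumps = new ArrayList<>();
        for (String path : args) {
            dumps.add(new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8));
        }
        // Print frames from most to least frequently seen on top of the stack.
        topFrames(dumps, "SearchHandler").entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
            .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

After compiling, it could be run as "java JstackHotspots dump1.txt dump2.txt dump3.txt". Counting only the top frame is a deliberate simplification: frames lower in the stack (the servlet container, SearchHandler itself) appear in every sample and would otherwise drown out the method that is actually executing.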
