Re: Upgraded to 4.10.3, highlighting performance unusably slow

jaime spicciati Sun, 03 May 2015 09:19:34 -0700

We ran into this as well on 4.10.3 (not related to an upgrade). It was
identified during load testing, when a small percentage of queries took
more than 20 seconds to return. We isolated it by rerunning the same query
multiple times: regardless of cache hits, the query still took a long time
to return. We used this method to narrow the performance problem down to a
small number of very large records (each with a great many fields).


We fixed it by setting hl.requireFieldMatch=true on the query so that only
fields that have an actual hit are passed through the highlighter.
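
For anyone who wants to try this, here is a minimal sketch of the request
parameters involved. It uses only the JDK to build the query string, so the
class and method names (HighlightQueryParams, buildQueryString) and the
example query are made up for illustration. The /eiHandler path and the '*'
highlight field are taken from the original post, and hl.requireFieldMatch
is the standard Solr highlighting parameter.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Solr or SolrJ): builds the URL query
// string for a highlighted search with hl.requireFieldMatch enabled.
public class HighlightQueryParams {

    static String buildQueryString(String q) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("hl", "true");                   // turn highlighting on
        params.put("hl.fl", "*");                   // highlight field list from the original post
        params.put("hl.requireFieldMatch", "true"); // only highlight fields the query matched

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is guaranteed to be available on every JVM.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // "title:solr" is an illustrative query, not one from our system.
        System.out.println("/eiHandler?" + buildQueryString("title:solr"));
    }
}
```

If you are using SolrJ, as in the original post, the same parameter can be
set directly on the SolrQuery object, for example with
query.setParam("hl.requireFieldMatch", true).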

Hopefully this helps,
Jaime Spicciati

On Sat, May 2, 2015 at 8:20 PM, Joel Bernstein <[email protected]> wrote:

> Hi,
>
> Can you also include the details of your research that narrowed the issue
> to the highlighter?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sat, May 2, 2015 at 5:27 PM, Ryan, Michael F. (LNG-DAY) <
> [email protected]> wrote:
>
> > Are you able to identify if there is a particular part of the code that
> > is slow?
> >
> > A simple way to do this is to use the jstack command (assuming your
> > server has the full JDK installed). You can run it like this:
> > /path/to/java/bin/jstack PID
> >
> > If you run that a bunch of times while your highlight query is running,
> > you might be able to spot the hotspot. Usually I'll do something like
> > this to see the stacktrace for the thread running the query:
> > /path/to/java/bin/jstack PID | grep SearchHandler -B30
> >
> > A few more questions:
> > - What are response times you are seeing before and after the upgrade? Is
> > "unusably slow" 1 second, 10 seconds...?
> > - If you run the exact same query multiple times, is it consistently
> > slow? Or is it only slow on the first run?
> > - While the query is running, do you see high user CPU on your server, or
> > high IO wait, or both? (You can check this with the top command or vmstat
> > command in Linux.)
> >
> > -Michael
> >
> > -----Original Message-----
> > From: Cheng, Sophia Kuen [mailto:[email protected]]
> > Sent: Saturday, May 02, 2015 4:13 PM
> > To: [email protected]
> > Subject: Upgraded to 4.10.3, highlighting performance unusably slow
> >
> > Hello,
> >
> > We recently upgraded Solr from 3.8.0 to 4.10.3. We saw that this upgrade
> > caused an incredible slowdown in our searches. We were able to narrow it
> > down to the highlighting. The slowdown is extreme enough that we are
> > holding back our release until we can resolve this. Our research
> > indicated that term vectors and the FastVectorHighlighter were the way
> > to go; however, this still does nothing for the performance. I think we
> > may be overlooking a crucial configuration, but cannot figure it out. I
> > was hoping for some guidance and help. Sorry for the long email, I wanted
> > to provide enough information.
> >
> > Our documents are largely dynamic fields, and so we have been using '*'
> > as the field for highlighting. This is the same setting we used in prior
> > versions of Solr. The dynamic fields are of type 'text' and we added
> > customizations to the schema.xml for the type 'text':
> >
> > <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
> > storeOffsetsWithPositions="true" termVectors="true" termPositions="true"
> > termOffsets="true">
> >   <analyzer type="index">
> >     <!--  this charFilter removes all xml-tagging from the text: -->
> >     <charFilter class="solr.HTMLStripCharFilterFactory"/>
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <!-- Case insensitive stop word removal.
> >       add enablePositionIncrements=true in both the index and query
> >       analyzers to leave a 'gap' for more accurate phrase queries.
> >     -->
> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> > words="stopwords.txt" enablePositionIncrements="true"/>
> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > catenateAll="0" splitOnCaseChange="1"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> > protected="protwords.txt"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <!--  this charFilter removes all xml-tagging from the text. Needed
> > also in query due to autosuggest -->
> >     <charFilter class="solr.HTMLStripCharFilterFactory"/>
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> > words="stopwords.txt" enablePositionIncrements="true"/>
> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> > catenateAll="0" splitOnCaseChange="1"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> > protected="protwords.txt"/>
> >   </analyzer>
> > </fieldType>
> >
> > One of the two dynamic fields we use:
> >
> > <dynamicField name="DTPropValue_*"  type="text"    indexed="true"
> > stored="true" required="false" multiValued="true"/>
> >
> > In our solrConfig.xml file, we have:
> >
> > <requestHandler name="/eiHandler" class="solr.SearchHandler">
> >   <lst name="defaults">
> >     <str name="echoParams">explicit</str>
> >     <int name="rows">13</int>
> >     <bool name="tv">true</bool>
> >     <bool name="hl.useFastVectorHighlighter">true</bool>
> >   </lst>
> >   <arr name="last-components">
> >     <str>tvComponent</str>
> >   </arr>
> > </requestHandler>
> > <searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
> > <searchComponent class="solr.HighlightComponent" name="highlight">
> >   <highlighting>
> >     <fragmenter name="gap" default="true"
> > class="solr.highlight.GapFragmenter">
> >       <lst name="defaults">
> >         <int name="hl.fragsize">100</int>
> >       </lst>
> >     </fragmenter>
> >     <fragmenter name="regex" class="solr.highlight.RegexFragmenter">
> >       <lst name="defaults">
> >         <int name="hl.fragsize">70</int>
> >         <float name="hl.regex.slop">0.5</float>
> >         <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
> >       </lst>
> >     </fragmenter>
> >
> >     <formatter name="html" default="true"
> > class="solr.highlight.HtmlFormatter">
> >       <lst name="defaults">
> >         <str name="hl.simple.pre"><![CDATA[<i>]]></str>
> >         <str name="hl.simple.post"><![CDATA[</i>]]></str>
> >       </lst>
> >     </formatter>
> >
> >     <encoder name="html" class="solr.highlight.HtmlEncoder" />
> >     <fragListBuilder name="simple"
> > class="solr.highlight.SimpleFragListBuilder"/>
> >     <fragListBuilder name="single"
> > class="solr.highlight.SingleFragListBuilder"/>
> >     <fragListBuilder name="weighted" default="true"
> > class="solr.highlight.WeightedFragListBuilder"/>
> >     <fragmentsBuilder name="default" default="true"
> > class="solr.highlight.ScoreOrderFragmentsBuilder">
> >     </fragmentsBuilder>
> >
> >     <!-- multi-colored tag FragmentsBuilder -->
> >     <fragmentsBuilder name="colored"
> > class="solr.highlight.ScoreOrderFragmentsBuilder">
> >       <lst name="defaults">
> >         <str name="hl.tag.pre"><![CDATA[
> >              <b style="background:yellow">,<b style="background:lawgreen">,
> >              <b style="background:aquamarine">,<b style="background:magenta">,
> >              <b style="background:palegreen">,<b style="background:coral">,
> >              <b style="background:wheat">,<b style="background:khaki">,
> >              <b style="background:lime">,<b style="background:deepskyblue">]]></str>
> >         <str name="hl.tag.post"><![CDATA[</b>]]></str>
> >       </lst>
> >     </fragmentsBuilder>
> >
> >     <boundaryScanner name="default" default="true"
> > class="solr.highlight.SimpleBoundaryScanner">
> >       <lst name="defaults">
> >         <str name="hl.bs.maxScan">10</str>
> >         <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
> >       </lst>
> >     </boundaryScanner>
> >
> >     <boundaryScanner name="breakIterator"
> > class="solr.highlight.BreakIteratorBoundaryScanner">
> >       <lst name="defaults">
> >         <str name="hl.bs.type">WORD</str>
> >         <str name="hl.bs.language">en</str>
> >         <str name="hl.bs.country">US</str>
> >       </lst>
> >     </boundaryScanner>
> >   </highlighting>
> > </searchComponent>
> >
> > And in our code:
> >
> > final SolrQuery query = new SolrQuery( luceneQueryStr );
> > query.setRequestHandler("/eiHandler");
> > query.setStart( request.getStartIndex() );
> > query.setRows( request.getMaxResults() );
> > query.setSort(new SortClause(request.getSortOrder().getFieldName(),
> >     request.getSortOrder().isAscending() ? ORDER.asc : ORDER.desc));
> > query.addHighlightField( "*" );
> > query.setFields( "*", "score" );
> >
> > Any assistance is greatly appreciated.  Thank you.
> >
> > Sincerely,
> > Sophia
> >
>
