Re: Upgraded to 4.10.3, highlighting performance unusably slow

William Bell Mon, 11 May 2015 00:52:55 -0700

Has anyone looked at it?

On Sun, May 3, 2015 at 10:18 AM, jaime spicciati <jaime.spicci...@gmail.com>
wrote:


> We ran into this as well on 4.10.3 (not related to an upgrade). It was
> identified during load testing when a small percentage of queries would
> take more than 20 seconds to return. We were able to isolate it by
> rerunning the same query multiple times and regardless of cache hits the
> queries would still take a long time to return. We used this method to
> narrow down the performance problem to a small number of very large records
> (many, many fields in a single record).
>
> We fixed it by turning on hl.requireFieldMatch on the query so that only
> fields that have an actual hit are passed through the highlighter.
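A minimal sketch of the request Jaime describes. It is written as plain HTTP parameters rather than SolrJ calls. `hl.fl` stays `*`, and `hl.requireFieldMatch=true` restricts highlighting to fields the query actually matched. The class name, the host and port, the `CORE` path segment, and the example field `DTPropValue_title` are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class HighlightRequest {

    /** Builds URL query parameters that consider all fields but only highlight where the query matched. */
    public static String buildQueryString(String q) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("hl", "true");
        params.put("hl.fl", "*");                   // still consider every field...
        params.put("hl.requireFieldMatch", "true"); // ...but only highlight fields the query matched
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Host, port, "CORE" and the field name are hypothetical placeholders.
        String url = "http://localhost:8983/solr/CORE/eiHandler?"
                + buildQueryString("DTPropValue_title:cancer");
        System.out.println(url);
    }
}
```

With SolrJ, the same parameter could be added to an existing `SolrQuery` with `query.set("hl.requireFieldMatch", "true")`.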
>
> Hopefully this helps,
> Jaime Spicciati
>
> On Sat, May 2, 2015 at 8:20 PM, Joel Bernstein <joels...@gmail.com> wrote:
>
> > Hi,
> >
> > Can you also include the details of your research that narrowed the issue
> > to the highlighter?
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Sat, May 2, 2015 at 5:27 PM, Ryan, Michael F. (LNG-DAY) <
> > michael.r...@lexisnexis.com> wrote:
> >
> > > Are you able to identify if there is a particular part of the code
> > > that is slow?
> > >
> > > A simple way to do this is to use the jstack command (assuming your
> > > server has the full JDK installed). You can run it like this:
> > > /path/to/java/bin/jstack PID
> > >
> > > If you run that a bunch of times while your highlight query is running,
> > > you might be able to spot the hotspot. Usually I'll do something like
> > > this to see the stack trace for the thread running the query:
> > > /path/to/java/bin/jstack PID | grep SearchHandler -B30
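Running jstack "a bunch of times" is essentially sampling profiling. Here is a minimal sketch of the tallying step. It assumes each dump was saved as text from standard jstack output, with thread entries separated by blank lines and frames on lines starting with `at`. The sketch keeps the thread entries that mention `SearchHandler`, takes the topmost frame of each, and counts how often each frame appears across dumps. The frame seen most often is the likely hotspot. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JstackHotspots {

    /** Across several jstack dumps, counts the topmost frame of every thread entry mentioning the marker. */
    public static Map<String, Integer> topFrameCounts(List<String> dumps, String marker) {
        Map<String, Integer> counts = new HashMap<>();
        for (String dump : dumps) {
            // jstack separates thread entries with blank lines.
            for (String entry : dump.split("\\R\\s*\\R")) {
                if (!entry.contains(marker)) {
                    continue;
                }
                for (String line : entry.split("\\R")) {
                    String trimmed = line.trim();
                    if (trimmed.startsWith("at ")) {
                        // The first "at" line is the frame the thread was executing.
                        counts.merge(trimmed.substring(3), 1, Integer::sum);
                        break;
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is a file holding one saved `jstack PID` sample.
        List<String> dumps = new ArrayList<>();
        for (String path : args) {
            dumps.add(Files.readString(Path.of(path)));
        }
        // Print the most frequent top frames first.
        topFrameCounts(dumps, "SearchHandler").entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```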
> > >
> > > A few more questions:
> > > - What response times are you seeing before and after the upgrade?
> > >   Is "unusably slow" 1 second, 10 seconds...?
> > > - If you run the exact same query multiple times, is it consistently
> > >   slow? Or is it only slow on the first run?
> > > - While the query is running, do you see high user CPU on your server,
> > >   high IO wait, or both? (You can check this with the top or vmstat
> > >   commands on Linux.)
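The second question above (consistently slow, or only on the first run?) can be answered with a small timing harness. This is a minimal sketch that assumes the query call is wrapped in a `Runnable`. The class, the methods, and the 20-second threshold (borrowed from Jaime's load-testing numbers) are illustrative. The no-op task in `main` is a hypothetical stand-in for the real SolrJ call.

```java
import java.util.ArrayList;
import java.util.List;

public class RepeatTimer {

    /** Runs the same task several times and records each wall-clock duration in milliseconds. */
    public static List<Long> timeRuns(Runnable task, int runs) {
        List<Long> millis = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis.add((System.nanoTime() - start) / 1_000_000);
        }
        return millis;
    }

    /** True when every run took at least the threshold, i.e. repeating the query did not help. */
    public static boolean consistentlySlow(List<Long> millis, long thresholdMillis) {
        return !millis.isEmpty() && millis.stream().allMatch(m -> m >= thresholdMillis);
    }

    /** True when only the first run crossed the threshold, which points to a warm-up effect. */
    public static boolean onlyFirstRunSlow(List<Long> millis, long thresholdMillis) {
        return millis.size() > 1
                && millis.get(0) >= thresholdMillis
                && millis.subList(1, millis.size()).stream().allMatch(m -> m < thresholdMillis);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in: replace with the real query, e.g. a SolrJ call against /eiHandler.
        Runnable query = () -> { };
        List<Long> millis = timeRuns(query, 5);
        System.out.println(millis + " consistently slow: " + consistentlySlow(millis, 20_000));
    }
}
```

If `consistentlySlow` holds, repeating the query is not helping, which matches what Jaime observed. If only the first run is slow, a cold-cache or warm-up effect is the more likely explanation.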
> > >
> > > -Michael
> > >
> > > -----Original Message-----
> > > From: Cheng, Sophia Kuen [mailto:sophia_ch...@hms.harvard.edu]
> > > Sent: Saturday, May 02, 2015 4:13 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Upgraded to 4.10.3, highlighting performance unusably slow
> > >
> > > Hello,
> > >
> > > We recently upgraded Solr from 3.8.0 to 4.10.3. We saw that this upgrade
> > > caused an incredible slowdown in our searches. We were able to narrow it
> > > down to the highlighting. The slowdown is extreme enough that we are
> > > holding back our release until we can resolve this. Our research
> > > indicated that using TermVectors and the FastVectorHighlighter was the
> > > way to go; however, this still does nothing for the performance. I think
> > > we may be overlooking a crucial configuration, but cannot figure it out.
> > > I was hoping for some guidance and help. Sorry for the long email; I
> > > wanted to provide enough information.
> > >
> > > Our documents are largely dynamic fields, and so we have been using '*'
> > > as the field for highlighting. This is the same setting we used in prior
> > > versions of Solr. The dynamic fields are of type 'text', and we added
> > > customizations to the schema.xml for the type 'text':
> > >
> > > <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
> > >     storeOffsetsWithPositions="true" termVectors="true" termPositions="true"
> > >     termOffsets="true">
> > >   <analyzer type="index">
> > >     <!--  this charFilter removes all xml-tagging from the text: -->
> > >     <charFilter class="solr.HTMLStripCharFilterFactory"/>
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >     <!-- Case insensitive stop word removal.
> > >       add enablePositionIncrements=true in both the index and query
> > >       analyzers to leave a 'gap' for more accurate phrase queries.
> > >     -->
> > >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> > > words="stopwords.txt" enablePositionIncrements="true"/>
> > >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > >       generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > >       catenateAll="0" splitOnCaseChange="1"/>
> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> > > protected="protwords.txt"/>
> > >   </analyzer>
> > >   <analyzer type="query">
> > >     <!--  this charFilter removes all xml-tagging from the text. Needed
> > > also in query due to autosuggest -->
> > >     <charFilter class="solr.HTMLStripCharFilterFactory"/>
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> > > words="stopwords.txt" enablePositionIncrements="true"/>
> > >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > >       generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> > >       catenateAll="0" splitOnCaseChange="1"/>
> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> > > protected="protwords.txt"/>
> > >   </analyzer>
> > > </fieldType>
> > >
> > > One of the two dynamic fields we use:
> > >
> > > <dynamicField name="DTPropValue_*"  type="text"    indexed="true"
> > > stored="true" required="false" multiValued="true"/>
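The FastVectorHighlighter only works on fields that are stored and that index term vectors with positions and offsets. In this schema, the term-vector flags sit on the `text` fieldType and `stored` sits on the dynamicField, so the effective value comes from both elements. Below is a minimal sketch of that check using the JDK's DOM parser. The class and method names are hypothetical. Treating a missing attribute as false is a simplifying assumption, and it is stricter than Solr's own defaults.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class FvhFieldCheck {

    // Properties the FastVectorHighlighter needs on a highlighted field.
    private static final String[] REQUIRED = {"stored", "termVectors", "termPositions", "termOffsets"};

    /** Parses a single XML element such as a fieldType or dynamicField tag. */
    private static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML: " + xml, e);
        }
    }

    /** Lists required properties that are not "true" once field attributes override fieldType ones. */
    public static List<String> missingProperties(String fieldTypeXml, String fieldXml) {
        Element type = parse(fieldTypeXml);
        Element field = parse(fieldXml);
        List<String> missing = new ArrayList<>();
        for (String prop : REQUIRED) {
            // getAttribute returns "" when the attribute is absent, which counts as not "true".
            String value = field.hasAttribute(prop) ? field.getAttribute(prop) : type.getAttribute(prop);
            if (!"true".equals(value)) {
                missing.add(prop);
            }
        }
        return missing;
    }
}
```

Called with the start tags of the `text` fieldType and the `DTPropValue_*` dynamicField quoted here, the list comes back empty.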
> > >
> > > In our solrconfig.xml file, we have:
> > >
> > > <requestHandler name="/eiHandler" class="solr.SearchHandler">
> > >   <lst name="defaults">
> > >     <str name="echoParams">explicit</str>
> > >     <int name="rows">13</int>
> > >     <bool name="tv">true</bool>
> > >     <bool name="hl.useFastVectorHighlighter">true</bool>
> > >   </lst>
> > >   <arr name="last-components">
> > >     <str>tvComponent</str>
> > >   </arr>
> > > </requestHandler>
> > > <searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
> > > <searchComponent class="solr.HighlightComponent" name="highlight">
> > >   <highlighting>
> > >     <fragmenter name="gap" default="true"
> > > class="solr.highlight.GapFragmenter">
> > >       <lst name="defaults">
> > >         <int name="hl.fragsize">100</int>
> > >       </lst>
> > >     </fragmenter>
> > >     <fragmenter name="regex" class="solr.highlight.RegexFragmenter">
> > >       <lst name="defaults">
> > >         <int name="hl.fragsize">70</int>
> > >         <float name="hl.regex.slop">0.5</float>
> > >         <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
> > >       </lst>
> > >     </fragmenter>
> > >
> > >     <formatter name="html" default="true"
> > > class="solr.highlight.HtmlFormatter">
> > >       <lst name="defaults">
> > >         <str name="hl.simple.pre"><![CDATA[<i>]]></str>
> > >         <str name="hl.simple.post"><![CDATA[</i>]]></str>
> > >       </lst>
> > >     </formatter>
> > >
> > >     <encoder name="html" class="solr.highlight.HtmlEncoder" />
> > >     <fragListBuilder name="simple"
> > > class="solr.highlight.SimpleFragListBuilder"/>
> > >     <fragListBuilder name="single"
> > > class="solr.highlight.SingleFragListBuilder"/>
> > >     <fragListBuilder name="weighted" default="true"
> > > class="solr.highlight.WeightedFragListBuilder"/>
> > >     <fragmentsBuilder name="default" default="true"
> > > class="solr.highlight.ScoreOrderFragmentsBuilder">
> > >     </fragmentsBuilder>
> > >
> > >     <!-- multi-colored tag FragmentsBuilder -->
> > >     <fragmentsBuilder name="colored"
> > > class="solr.highlight.ScoreOrderFragmentsBuilder">
> > >       <lst name="defaults">
> > >         <str name="hl.tag.pre"><![CDATA[
> > >              <b style="background:yellow">,<b style="background:lawgreen">,
> > >              <b style="background:aquamarine">,<b style="background:magenta">,
> > >              <b style="background:palegreen">,<b style="background:coral">,
> > >              <b style="background:wheat">,<b style="background:khaki">,
> > >              <b style="background:lime">,<b style="background:deepskyblue">]]></str>
> > >         <str name="hl.tag.post"><![CDATA[</b>]]></str>
> > >       </lst>
> > >     </fragmentsBuilder>
> > >
> > >     <boundaryScanner name="default" default="true"
> > > class="solr.highlight.SimpleBoundaryScanner">
> > >       <lst name="defaults">
> > >         <str name="hl.bs.maxScan">10</str>
> > >         <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
> > >       </lst>
> > >     </boundaryScanner>
> > >
> > >     <boundaryScanner name="breakIterator"
> > > class="solr.highlight.BreakIteratorBoundaryScanner">
> > >       <lst name="defaults">
> > >         <str name="hl.bs.type">WORD</str>
> > >         <str name="hl.bs.language">en</str>
> > >         <str name="hl.bs.country">US</str>
> > >       </lst>
> > >     </boundaryScanner>
> > >   </highlighting>
> > > </searchComponent>
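The `SimpleBoundaryScanner` settings above control how far the FastVectorHighlighter may stretch a fragment's edges to reach a natural break. Here `hl.bs.maxScan` is 10 and `hl.bs.chars` is `.,!?`, space, tab, LF and CR. Below is a simplified sketch of that idea, modeled on Lucene's `SimpleBoundaryScanner` but not the library code itself. The class and method signatures are illustrative.

```java
public class SimpleBoundaryScan {

    /** Moves a start offset backward, at most maxScan chars, until the preceding char is a boundary. */
    public static int findStartOffset(CharSequence text, int start, int maxScan, String boundaryChars) {
        if (start > text.length() || start < 1) {
            return start; // nothing to scan
        }
        int offset = start;
        for (int count = maxScan; offset > 0 && count > 0; count--) {
            if (boundaryChars.indexOf(text.charAt(offset - 1)) >= 0) {
                return offset; // fragment starts right after the boundary char
            }
            offset--;
        }
        // Reaching the beginning of the text also counts as a boundary.
        return offset == 0 ? 0 : start;
    }

    /** Moves an end offset forward, at most maxScan chars, until it lands on a boundary char. */
    public static int findEndOffset(CharSequence text, int start, int maxScan, String boundaryChars) {
        if (start > text.length() || start < 0) {
            return start;
        }
        for (int offset = start, count = maxScan; offset < text.length() && count > 0; offset++, count--) {
            if (boundaryChars.indexOf(text.charAt(offset)) >= 0) {
                return offset;
            }
        }
        return start; // no boundary within maxScan: keep the original offset
    }
}
```

For example, with the configured characters `".,!? \t\n\r"`, `findStartOffset("Hello world, this is highlighting.", 24, 10, chars)` returns 21, which is the start of "highlighting". `findEndOffset` with the same text and offset returns 33, the final period. With a `maxScan` of 5, `findEndOffset` finds no boundary and keeps 24.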
> > >
> > > And in our code:
> > >
> > > import org.apache.solr.client.solrj.SolrQuery;
> > > import org.apache.solr.client.solrj.SolrQuery.ORDER;
> > > import org.apache.solr.client.solrj.SolrQuery.SortClause;
> > >
> > > final SolrQuery query = new SolrQuery( luceneQueryStr );
> > > query.setRequestHandler("/eiHandler");
> > > query.setStart( request.getStartIndex() );
> > > query.setRows( request.getMaxResults() );
> > > query.setSort( new SortClause( request.getSortOrder().getFieldName(),
> > >         request.getSortOrder().isAscending() ? ORDER.asc : ORDER.desc ) );
> > > query.addHighlightField( "*" );
> > > query.setFields( "*", "score" );
> > >
> > > Any assistance is greatly appreciated.  Thank you.
> > >
> > > Sincerely,
> > > Sophia
> > >
> >
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076
