Hi, Can you also include the details of your research that narrowed the issue to the highlighter?
Joel Bernstein
http://joelsolr.blogspot.com/

On Sat, May 2, 2015 at 5:27 PM, Ryan, Michael F. (LNG-DAY) <michael.r...@lexisnexis.com> wrote:

> Are you able to identify if there is a particular part of the code that is
> slow?
>
> A simple way to do this is to use the jstack command (assuming your server
> has the full JDK installed). You can run it like this:
> /path/to/java/bin/jstack PID
>
> If you run that a bunch of times while your highlight query is running,
> you might be able to spot the hotspot. Usually I'll do something like this
> to see the stacktrace for the thread running the query:
> /path/to/java/bin/jstack PID | grep SearchHandler -B30
>
> A few more questions:
> - What are the response times you are seeing before and after the upgrade?
>   Is "unusably slow" 1 second, 10 seconds...?
> - If you run the exact same query multiple times, is it consistently slow?
>   Or is it only slow on the first run?
> - While the query is running, do you see high user CPU on your server, or
>   high IO wait, or both? (You can check this with the top command or vmstat
>   command in Linux.)
>
> -Michael
>
> -----Original Message-----
> From: Cheng, Sophia Kuen [mailto:sophia_ch...@hms.harvard.edu]
> Sent: Saturday, May 02, 2015 4:13 PM
> To: solr-user@lucene.apache.org
> Subject: Upgraded to 4.10.3, highlighting performance unusably slow
>
> Hello,
>
> We recently upgraded solr from 3.8.0 to 4.10.3. We saw that this upgrade
> caused an incredible slowdown in our searches. We were able to narrow it
> down to the highlighting. The slowdown is extreme enough that we are
> holding back our release until we can resolve this. Our research indicated
> that using TermVectors & FastHighlighter was the way to go; however, this
> still does nothing for the performance. I think we may be overlooking a
> crucial configuration, but cannot figure it out. I was hoping for some
> guidance and help. Sorry for the long email, I wanted to provide enough
> information.
>
> Our documents are largely dynamic fields, and so we have been using '*' as
> the field for highlighting. This is the same setting we used in prior
> versions of Solr. The dynamic fields are of type 'text' and we added
> customizations to the schema.xml for the type 'text':
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
>     storeOffsetsWithPositions="true" termVectors="true" termPositions="true"
>     termOffsets="true">
>   <analyzer type="index">
>     <!-- this charFilter removes all xml-tagging from the text: -->
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <!-- Case insensitive stop word removal.
>          add enablePositionIncrements=true in both the index and query
>          analyzers to leave a 'gap' for more accurate phrase queries.
>     -->
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>         words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>         generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>         catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>         protected="protwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <!-- this charFilter removes all xml-tagging from the text. Needed
>          also in query due to autosuggest -->
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>         words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>         generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>         catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>         protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
> One of the two dynamic fields we use:
>
> <dynamicField name="DTPropValue_*" type="text" indexed="true"
>     stored="true" required="false" multiValued="true"/>
>
> In our solrConfig.xml file, we have:
>
> <requestHandler name="/eiHandler" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>     <int name="rows">13</int>
>     <bool name="tv">true</bool>
>     <bool name="hl.useFastVectorHighligter">true</bool>
>   </lst>
>   <arr name="last-components">
>     <str>tvComponent</str>
>   </arr>
> </requestHandler>
> <searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
> <searchComponent class="solr.HighlightComponent" name="highlight">
>   <highlighting>
>     <fragmenter name="gap" default="true"
>         class="solr.highlight.GapFragmenter">
>       <lst name="defaults">
>         <int name="hl.fragsize">100</int>
>       </lst>
>     </fragmenter>
>     <fragmenter name="regex" class="solr.highlight.RegexFragmenter">
>       <lst name="defaults">
>         <int name="hl.fragsize">70</int>
>         <float name="hl.regex.slop">0.5</float>
>         <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
>       </lst>
>     </fragmenter>
>
>     <formatter name="html" default="true"
>         class="solr.highlight.HtmlFormatter">
>       <lst name="defaults">
>         <str name="hl.simple.pre"><![CDATA[<i>]]></str>
>         <str name="hl.simple.post"><![CDATA[</i>]]></str>
>       </lst>
>     </formatter>
>
>     <encoder name="html" class="solr.highlight.HtmlEncoder" />
>     <fragListBuilder name="simple"
>         class="solr.highlight.SimpleFragListBuilder"/>
>     <fragListBuilder name="single"
>         class="solr.highlight.SingleFragListBuilder"/>
>     <fragListBuilder name="weighted" default="true"
>         class="solr.highlight.WeightedFragListBuilder"/>
>     <fragmentsBuilder name="default" default="true"
>         class="solr.highlight.ScoreOrderFragmentsBuilder">
>     </fragmentsBuilder>
>
>     <!-- multi-colored tag FragmentsBuilder -->
>     <fragmentsBuilder name="colored"
>         class="solr.highlight.ScoreOrderFragmentsBuilder">
>       <lst name="defaults">
>         <str name="hl.tag.pre"><![CDATA[
>           <b style="background:yellow">,<b style="background:lawgreen">,
>           <b style="background:aquamarine">,<b style="background:magenta">,
>           <b style="background:palegreen">,<b style="background:coral">,
>           <b style="background:wheat">,<b style="background:khaki">,
>           <b style="background:lime">,<b style="background:deepskyblue">]]></str>
>         <str name="hl.tag.post"><![CDATA[</b>]]></str>
>       </lst>
>     </fragmentsBuilder>
>
>     <boundaryScanner name="default" default="true"
>         class="solr.highlight.SimpleBoundaryScanner">
>       <lst name="defaults">
>         <str name="hl.bs.maxScan">10</str>
>         <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
>       </lst>
>     </boundaryScanner>
>
>     <boundaryScanner name="breakIterator"
>         class="solr.highlight.BreakIteratorBoundaryScanner">
>       <lst name="defaults">
>         <str name="hl.bs.type">WORD</str>
>         <str name="hl.bs.language">en</str>
>         <str name="hl.bs.country">US</str>
>       </lst>
>     </boundaryScanner>
>   </highlighting>
> </searchComponent>
>
> And in our code:
>
> final SolrQuery query = new SolrQuery( luceneQueryStr );
> query.setRequestHandler("/eiHandler");
> query.setStart( request.getStartIndex() );
> query.setRows( request.getMaxResults() );
> query.setSort(new SortClause(request.getSortOrder().getFieldName(),
>     request.getSortOrder().isAscending()?ORDER.asc:ORDER.desc) );
> query.addHighlightField( "*" );
> query.setFields( "*", "score" );
>
> Any assistance is greatly appreciated. Thank you.
>
> Sincerely,
> Sophia