Thanks Erick. If I figure something out I will let you know as well. Nobody replied except you; I thought there might be more people involved here.
Thanks

On Wed, Aug 31, 2011 at 3:47 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> OK, I'll have to defer because this makes no sense.
> 4+ seconds in the debug component?
>
> Sorry I can't be more help here, but nothing really
> jumps out.
> Erick
>
> On Tue, Aug 30, 2011 at 12:45 PM, Lord Khan Han <khanuniver...@gmail.com> wrote:
> > Below is the output of the debug. I am measuring the pure Solr QTime, which is
> > shown in the QTime field in the Solr XML.
> >
> > <arr name="parsed_filter_queries">
> >   <str>mrank:[0 TO 100]</str>
> > </arr>
> > <lst name="timing">
> >   <double name="time">8584.0</double>
> >   <lst name="prepare">
> >     <double name="time">12.0</double>
> >     <lst name="org.apache.solr.handler.component.QueryComponent">
> >       <double name="time">12.0</double>
> >     </lst>
> >     <lst name="org.apache.solr.handler.component.FacetComponent">
> >       <double name="time">0.0</double>
> >     </lst>
> >     <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
> >       <double name="time">0.0</double>
> >     </lst>
> >     <lst name="org.apache.solr.handler.component.HighlightComponent">
> >       <double name="time">0.0</double>
> >     </lst>
> >     <lst name="org.apache.solr.handler.component.StatsComponent">
> >       <double name="time">0.0</double>
> >     </lst>
> >     <lst name="org.apache.solr.handler.component.SpellCheckComponent">
> >       <double name="time">0.0</double>
> >     </lst>
> >     <lst name="org.apache.solr.handler.component.DebugComponent">
> >       <double name="time">0.0</double>
> >     </lst>
> >   </lst>
> >   <lst name="process">
> >     <double name="time">8572.0</double>
> >     <lst name="org.apache.solr.handler.component.QueryComponent">
> >       <double name="time">4480.0</double>
> >     </lst>
> >     <lst name="org.apache.solr.handler.component.FacetComponent">
> >       <double name="time">0.0</double>
> >     </lst>
> >     <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
> >       <double name="time">0.0</double>
> >     </lst>
> >     <lst name="org.apache.solr.handler.component.HighlightComponent">
> >       <double name="time">41.0</double>
> >     </lst>
> >     <lst name="org.apache.solr.handler.component.StatsComponent">
> >       <double name="time">0.0</double>
> >     </lst>
> >     <lst name="org.apache.solr.handler.component.SpellCheckComponent">
> >       <double name="time">0.0</double>
> >     </lst>
> >     <lst name="org.apache.solr.handler.component.DebugComponent">
> >       <double name="time">4051.0</double>
> >     </lst>
> >
> > On Tue, Aug 30, 2011 at 5:38 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> Can we see the output if you specify both
> >> &debugQuery=on&debug=true
> >>
> >> the debug=true will show the time taken up with various
> >> components, which is sometimes surprising...
> >>
> >> Second, we never asked the most basic question, what are
> >> you measuring? Is this the QTime of the returned response?
> >> (which is the time actually spent searching) or the time until
> >> the response gets back to the client, which may involve lots besides
> >> searching...
> >>
> >> Best
> >> Erick
> >>
> >> On Tue, Aug 30, 2011 at 7:59 AM, Lord Khan Han <khanuniver...@gmail.com> wrote:
> >> > Hi Erick,
> >> >
> >> > Fields are lazy loading, content is stored in Solr, and the machine has 32 gig.
> >> > Solr has a 20 gig heap. There is no swapping.
> >> >
> >> > As you see, we have many phrases in the same query. I couldn't find a way to
> >> > drop the QTime to sub-second. Surprisingly, the non-shingled test has a better QTime!
> >> >
> >> >
> >> > On Mon, Aug 29, 2011 at 3:10 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >> >
> >> >> Oh, one other thing: have you profiled your machine
> >> >> to see if you're swapping? How much memory are
> >> >> you giving your JVM? What is the underlying
> >> >> hardware setup?
> >> >>
> >> >> Best
> >> >> Erick
> >> >>
> >> >> On Mon, Aug 29, 2011 at 8:09 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >> >> > 200K docs and 36G index? It sounds like you're storing
> >> >> > your documents in the Solr index.
> >> >> > In and of itself, that
> >> >> > shouldn't hurt your query times, *unless* you have
> >> >> > lazy field loading turned off, have you checked that
> >> >> > lazy field loading is enabled?
> >> >> >
> >> >> > Best
> >> >> > Erick
> >> >> >
> >> >> > On Sun, Aug 28, 2011 at 5:30 AM, Lord Khan Han <khanuniver...@gmail.com> wrote:
> >> >> >> Another interesting thing: all queries of one or more words, including
> >> >> >> phrase queries such as "barack obama", are slower in the shingle configuration.
> >> >> >> What am I doing wrong? Without shingles "barack obama" has a query time of
> >> >> >> 300 ms; with shingles, 780 ms.
> >> >> >>
> >> >> >>
> >> >> >> On Sat, Aug 27, 2011 at 7:58 PM, Lord Khan Han <khanuniver...@gmail.com> wrote:
> >> >> >>
> >> >> >>> Hi,
> >> >> >>>
> >> >> >>> What is the difference between Solr 3.3 and the trunk?
> >> >> >>> I will try 3.3 and let you know the results.
> >> >> >>>
> >> >> >>>
> >> >> >>> Here is the search handler:
> >> >> >>>
> >> >> >>> <requestHandler name="search" class="solr.SearchHandler" default="true">
> >> >> >>>   <lst name="defaults">
> >> >> >>>     <str name="echoParams">explicit</str>
> >> >> >>>     <int name="rows">10</int>
> >> >> >>>     <!--<str name="fq">category:vv</str>-->
> >> >> >>>     <str name="fq">mrank:[0 TO 100]</str>
> >> >> >>>     <str name="echoParams">explicit</str>
> >> >> >>>     <int name="rows">10</int>
> >> >> >>>     <str name="defType">edismax</str>
> >> >> >>>     <!--<str name="qf">title^0.05 url^1.2 content^1.7 m_title^10.0</str>-->
> >> >> >>>     <str name="qf">title^1.05 url^1.2 content^1.7 m_title^10.0</str>
> >> >> >>>     <!-- <str name="bf">recip(ee_score,-0.85,1,0.2)</str> -->
> >> >> >>>     <str name="pf">content^18.0 m_title^5.0</str>
> >> >> >>>     <int name="ps">1</int>
> >> >> >>>     <int name="qs">0</int>
> >> >> >>>     <str name="mm">2<-25%</str>
> >> >> >>>     <str name="spellcheck">true</str>
> >> >> >>>     <!--<str name="spellcheck.collate">true</str> -->
> >> >> >>>     <str name="spellcheck.count">5</str>
> >> >> >>>     <str name="spellcheck.dictionary">subobjective</str>
> >> >> >>>     <str name="spellcheck.onlyMorePopular">false</str>
> >> >> >>>     <str name="hl.tag.pre"><b></str>
> >> >> >>>     <str name="hl.tag.post"></b></str>
> >> >> >>>     <str name="hl.useFastVectorHighlighter">true</str>
> >> >> >>>   </lst>
> >> >> >>>
> >> >> >>>
> >> >> >>> On Sat, Aug 27, 2011 at 5:31 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
> >> >> >>>
> >> >> >>>> I'm not sure what the issue could be at this point. I see you've got
> >> >> >>>> qt=search - what's the definition of that request handler?
> >> >> >>>>
> >> >> >>>> What is the parsed query (from the debugQuery response)?
> >> >> >>>>
> >> >> >>>> Have you tried this with Solr 3.3 to see if there's any appreciable
> >> >> >>>> difference?
> >> >> >>>>
> >> >> >>>> Erik
> >> >> >>>>
> >> >> >>>> On Aug 27, 2011, at 09:34 , Lord Khan Han wrote:
> >> >> >>>>
> >> >> >>>> > With grouping off, the query time goes from e.g. 3567 ms to 1912 ms.
> >> >> >>>> > Grouping increases the query time and makes the cache useless. But the
> >> >> >>>> > same config is still faster without shingles.
> >> >> >>>> >
> >> >> >>>> > We have a head-to-head test this Wednesday with this commercial search
> >> >> >>>> > engine, so I am looking for all suggestions.
> >> >> >>>> >
> >> >> >>>> >
> >> >> >>>> > On Sat, Aug 27, 2011 at 3:37 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
> >> >> >>>> >
> >> >> >>>> >> Please confirm if this is caused by grouping. Turn grouping off,
> >> >> >>>> >> what's query time like?
> >> >> >>>> >>
> >> >> >>>> >>
> >> >> >>>> >> On Aug 27, 2011, at 07:27 , Lord Khan Han wrote:
> >> >> >>>> >>
> >> >> >>>> >>> On the other hand, we couldn't use the cache for the types of queries below.
> >> >> >>>> >>> I think it's caused by grouping. Anyway, we need to be sub-second
> >> >> >>>> >>> without the cache.
> >> >> >>>> >>>
> >> >> >>>> >>>
> >> >> >>>> >>> On Sat, Aug 27, 2011 at 2:18 PM, Lord Khan Han <khanuniver...@gmail.com> wrote:
> >> >> >>>> >>>
> >> >> >>>> >>>> Hi,
> >> >> >>>> >>>>
> >> >> >>>> >>>> Thanks for the reply.
> >> >> >>>> >>>>
> >> >> >>>> >>>> Here is the Solr log capture:
> >> >> >>>> >>>>
> >> >> >>>> >>>> ******
> >> >> >>>> >>>>
> >> >> >>>> >>>> hl.fragsize=100&spellcheck=true&spellcheck.q=XXXXX&group.limit=5&hl.simple.pre=<b>&hl.fl=content&spellcheck.collate=true&wt=javabin&hl=true&rows=20&version=2&fl=score,approved,domain,host,id,lang,mimetype,title,tstamp,url,category&hl.snippets=3&start=0&q=%2BXXXX+-"XXXXX"+-"XXXXX"+-"XXXXXX"+-"XXXXXX"+-"XXXXXX"+-XXXX+-"XXXXXX"+-XXX+-"XXXXX"+-XXXX+-XXXX+-"XXXXX"+-"XXXXX"+-"XXXXX"+-XXXX+-"XXXX"+-"XXXXX"+-"XXXXXX"+-"XXXXX"+-"XXXXXX"+-"XXXXXX"+-XXXX+-"XXXXX"+-"XXXXXX"+-XXXX+-"XXXXX"+-"XXXXX"+-XXXXX+-"XXXXX"+-"XXXXX"+-"XXXXX"+-"XXXXX"+-XXXXX+-"XXXXXX"+-"XXXXXX"+-XXXXXX+-XXXXX+-"XXXXX"+"XXXXX"+"XXXXX"+"XXXXXX"++&group.field=host&hl.simple.post=</b>&group=true&qt=search&fq=mrank:[0+TO+100]&fq=word_count:[70+TO+*]
> >> >> >>>> >>>>
> >> >> >>>> >>>> ******
> >> >> >>>> >>>>
> >> >> >>>> >>>> XXXX stands for the words. Each phrase "xxxxx" has two words inside.
> >> >> >>>> >>>>
> >> >> >>>> >>>> The timing from the DebugQuery:
> >> >> >>>> >>>>
> >> >> >>>> >>>> <lst name="timing">
> >> >> >>>> >>>>   <double name="time">8654.0</double>
> >> >> >>>> >>>>   <lst name="prepare">
> >> >> >>>> >>>>     <double name="time">16.0</double>
> >> >> >>>> >>>>     <lst name="org.apache.solr.handler.component.QueryComponent">
> >> >> >>>> >>>>       <double name="time">16.0</double>
> >> >> >>>> >>>>     </lst>
> >> >> >>>> >>>>     <lst name="org.apache.solr.handler.component.FacetComponent">
> >> >> >>>> >>>>       <double name="time">0.0</double>
> >> >> >>>> >>>>     </lst>
> >> >> >>>> >>>>     <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
> >> >> >>>> >>>>       <double name="time">0.0</double>
> >> >> >>>> >>>>     </lst>
> >> >> >>>> >>>>     <lst name="org.apache.solr.handler.component.HighlightComponent">
> >> >> >>>> >>>>       <double name="time">0.0</double>
> >> >> >>>> >>>>     </lst>
> >> >> >>>> >>>>     <lst name="org.apache.solr.handler.component.StatsComponent">
> >> >> >>>> >>>>       <double name="time">0.0</double>
> >> >> >>>> >>>>     </lst>
> >> >> >>>> >>>>     <lst name="org.apache.solr.handler.component.SpellCheckComponent">
> >> >> >>>> >>>>       <double name="time">0.0</double>
> >> >> >>>> >>>>     </lst>
> >> >> >>>> >>>>     <lst name="org.apache.solr.handler.component.DebugComponent">
> >> >> >>>> >>>>       <double name="time">0.0</double>
> >> >> >>>> >>>>     </lst>
> >> >> >>>> >>>>   </lst>
> >> >> >>>> >>>>   <lst name="process">
> >> >> >>>> >>>>     <double name="time">8638.0</double>
> >> >> >>>> >>>>     <lst name="org.apache.solr.handler.component.QueryComponent">
> >> >> >>>> >>>>       <double name="time">4473.0</double>
> >> >> >>>> >>>>     </lst>
> >> >> >>>> >>>>     <lst name="org.apache.solr.handler.component.FacetComponent">
> >> >> >>>> >>>>       <double name="time">0.0</double>
> >> >> >>>> >>>>     </lst>
> >> >> >>>> >>>>     <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
> >> >> >>>> >>>>       <double name="time">0.0</double>
> >> >> >>>> >>>>     </lst>
> >> >> >>>> >>>>     <lst name="org.apache.solr.handler.component.HighlightComponent">
> >> >> >>>> >>>>       <double name="time">42.0</double>
> >> >> >>>> >>>>     </lst>
> >> >> >>>> >>>>     <lst name="org.apache.solr.handler.component.StatsComponent">
> >> >> >>>> >>>>       <double name="time">0.0</double>
> >> >> >>>> >>>>     </lst>
> >> >> >>>> >>>>     <lst name="org.apache.solr.handler.component.SpellCheckComponent">
> >> >> >>>> >>>>       <double name="time">1.0</double>
> >> >> >>>> >>>>     </lst>
> >> >> >>>> >>>>     <lst name="org.apache.solr.handler.component.DebugComponent">
> >> >> >>>> >>>>       <double name="time">4122.0</double>
> >> >> >>>> >>>>     </lst>
> >> >> >>>> >>>>
> >> >> >>>> >>>>
> >> >> >>>> >>>> The funny thing is that if I remove the ShingleFilter from the "sh_text"
> >> >> >>>> >>>> field below and index normally, the query time is half that of the current
> >> >> >>>> >>>> shingled one! Shouldn't a shingled index be better for such heavy two-word
> >> >> >>>> >>>> phrase searches? I am confused.
> >> >> >>>> >>>>
> >> >> >>>> >>>> On the other hand, an off-the-shelf search engine from one of the big FAT
> >> >> >>>> >>>> companies does the same query on the same machine in 0.7 / 0.8 secs without
> >> >> >>>> >>>> cache. I am confident we can do better in Solr but couldn't find the way at
> >> >> >>>> >>>> the moment.
> >> >> >>>> >>>>
> >> >> >>>> >>>> Thanks for helping..
> >> >> >>>> >>>>
> >> >> >>>> >>>>
> >> >> >>>> >>>> On Sat, Aug 27, 2011 at 2:46 AM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
> >> >> >>>> >>>>
> >> >> >>>> >>>>>
> >> >> >>>> >>>>> On Aug 26, 2011, at 17:49 , Lord Khan Han wrote:
> >> >> >>>> >>>>>> We are indexing news documents from various sites. Currently we have
> >> >> >>>> >>>>>> 200K docs indexed.
> >> >> >>>> >>>>>> Total index size is 36 gig. There are also attachments to
> >> >> >>>> >>>>>> the news (PDFs, docs, etc.), so document size could be high (e.g. 10 MB).
> >> >> >>>> >>>>>>
> >> >> >>>> >>>>>> We are using some complex queries which include around 30 - 40 terms per
> >> >> >>>> >>>>>> query. 70% of these terms are two-word phrases. We combine them
> >> >> >>>> >>>>>> with + and - to pinpoint exact results.
> >> >> >>>> >>>>>> There is also grouping, dismax and boosting, term vector HL.
> >> >> >>>> >>>>>
> >> >> >>>> >>>>> You're using a lot of componentry there, and have complex queries. We
> >> >> >>>> >>>>> need more details.
> >> >> >>>> >>>>>
> >> >> >>>> >>>>> Turn on debugQuery=true... what do the timings say for each component?
> >> >> >>>> >>>>>
> >> >> >>>> >>>>>> Our problem is query times. Currently it's around 6-7 secs. I know our
> >> >> >>>> >>>>>> query is a little bit heavy, but we want to improve query performance. I
> >> >> >>>> >>>>>> believe we can make it sub-second, but no success at the moment.
> >> >> >>>> >>>>>
> >> >> >>>> >>>>> Please provide an example query or two (perhaps a full line logged from
> >> >> >>>> >>>>> Solr itself), and then let's see what debugQuery says about your query
> >> >> >>>> >>>>> being parsed.
> >> >> >>>> >>>>>
> >> >> >>>> >>>>>> We tried using shingles (2-word tokens), but it decreases the query
> >> >> >>>> >>>>>> performance!! We assumed it would help speed up phrase searches..
> >> >> >>>> >>>>>
> >> >> >>>> >>>>> Again, we'd need to see a parsed query to understand this deeper.
> >> >> >>>> >>>>>
> >> >> >>>> >>>>> Lots of synonym expansion? A parsed query will tell us.
> >> >> >>>> >>>>>
> >> >> >>>> >>>>>> (using the latest Solr trunk, and the HW is pretty good: 32 cores with
> >> >> >>>> >>>>>> 32 gig RAM)
> >> >> >>>> >>>>>>
> >> >> >>>> >>>>>> Here is the field def:
> >> >> >>>> >>>>>>
> >> >> >>>> >>>>>> <fieldType name="sh_text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
> >> >> >>>> >>>>>>   <analyzer type="index">
> >> >> >>>> >>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >> >>>> >>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
> >> >> >>>> >>>>>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >> >> >>>> >>>>>>     <!--<filter class="solr.LowerCaseFilterFactory"/>-->
> >> >> >>>> >>>>>>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
> >> >> >>>> >>>>>>     <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
> >> >> >>>> >>>>>>   </analyzer>
> >> >> >>>> >>>>>>   <analyzer type="query">
> >> >> >>>> >>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >> >>>> >>>>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >> >> >>>> >>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
> >> >> >>>> >>>>>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >> >> >>>> >>>>>>     <!--<filter class="solr.LowerCaseFilterFactory"/>-->
> >> >> >>>> >>>>>>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
> >> >> >>>> >>>>>>     <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
> >> >> >>>> >>>>>>   </analyzer>
> >> >> >>>> >>>>>> </fieldType>
> >> >> >>>> >>>>>>
> >> >> >>>> >>>>>> and
> >> >> >>>> >>>>>>
> >> >> >>>> >>>>>> <field name="content" type="sh_text" stored="true" indexed="true" termVectors="true" termPositions="true" termOffsets="true"/>
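
A quick way to read the timing sections quoted in this thread is to total the per-component numbers. The following is only a sketch in Python, not something posted by anyone in the thread: the helper name component_times and the trimmed SAMPLE string are made up for illustration, and only the three non-zero entries from the "process" block of the Aug 30 output are included. It relies on the standard-library xml.etree.ElementTree parser.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the "process" block from the Aug 30 debug output quoted above.
# Components that reported 0.0 ms are left out to keep the sample short.
SAMPLE = """
<lst name="process">
  <double name="time">8572.0</double>
  <lst name="org.apache.solr.handler.component.QueryComponent">
    <double name="time">4480.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.HighlightComponent">
    <double name="time">41.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.DebugComponent">
    <double name="time">4051.0</double>
  </lst>
</lst>
"""


def component_times(xml_text):
    """Return (total_ms, {short component name: ms}) for one timing phase.

    Hypothetical helper: expects a single <lst> element shaped like the
    "prepare" or "process" blocks in Solr's debug timing output.
    """
    phase = ET.fromstring(xml_text)
    # The phase total is the <double name="time"> directly under the phase.
    total = float(phase.find("double[@name='time']").text)
    times = {}
    for comp in phase.findall("lst"):
        # Keep only the class name, e.g. "QueryComponent".
        short_name = comp.get("name").rsplit(".", 1)[-1]
        times[short_name] = float(comp.find("double[@name='time']").text)
    return total, times


total, times = component_times(SAMPLE)
for name, ms in sorted(times.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {ms:8.1f} ms  {ms / total:6.1%}")
print(f"sum of components: {sum(times.values()):.1f} ms of {total:.1f} ms")
```

For that output the three components add up to the reported process time, and DebugComponent alone accounts for roughly 47% of it, so it may be worth comparing the QTime with debugQuery turned off before drawing conclusions about the shingles.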