200K docs and a 36G index? It sounds like you're storing your documents in the Solr index. In and of itself, that shouldn't hurt your query times, *unless* you have lazy field loading turned off. Have you checked that lazy field loading is enabled?
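For reference, lazy field loading is controlled from the <query> section of solrconfig.xml. This is a minimal sketch that assumes a Solr 3.x-style solrconfig.xml and leaves the rest of the <query> section (caches and so on) as it already is:

```xml
<!-- solrconfig.xml (excerpt): only the relevant element is shown;
     other <query> settings such as cache definitions are omitted. -->
<query>
  <!-- When true, stored fields that were not requested (via fl) are loaded
       lazily, which helps when large stored fields are rarely returned. -->
  <enableLazyFieldLoading>true</enableLazyFieldLoading>
</query>
```

The logged query later in this thread already restricts fl to a short field list (score, approved, domain, host, id, ...), so with lazy loading on, the large stored content field should, in principle, only be read where a component such as highlighting actually needs it.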
Best,
Erick

On Sun, Aug 28, 2011 at 5:30 AM, Lord Khan Han <khanuniver...@gmail.com> wrote:
> Another interesting thing: all one-word and multi-word queries, including
> phrase queries such as "barack obama", are slower in the shingle
> configuration. What am I doing wrong? Without shingles, "barack obama" takes
> 300 ms; with shingles, 780 ms.
>
> On Sat, Aug 27, 2011 at 7:58 PM, Lord Khan Han <khanuniver...@gmail.com> wrote:
>> Hi,
>>
>> What is the difference between Solr 3.3 and the trunk?
>> I will try 3.3 and let you know the results.
>>
>> Here is the search handler:
>>
>> <requestHandler name="search" class="solr.SearchHandler" default="true">
>>   <lst name="defaults">
>>     <str name="echoParams">explicit</str>
>>     <int name="rows">10</int>
>>     <!--<str name="fq">category:vv</str>-->
>>     <str name="fq">mrank:[0 TO 100]</str>
>>     <str name="echoParams">explicit</str>
>>     <int name="rows">10</int>
>>     <str name="defType">edismax</str>
>>     <!--<str name="qf">title^0.05 url^1.2 content^1.7 m_title^10.0</str>-->
>>     <str name="qf">title^1.05 url^1.2 content^1.7 m_title^10.0</str>
>>     <!-- <str name="bf">recip(ee_score,-0.85,1,0.2)</str> -->
>>     <str name="pf">content^18.0 m_title^5.0</str>
>>     <int name="ps">1</int>
>>     <int name="qs">0</int>
>>     <str name="mm">2&lt;-25%</str>
>>     <str name="spellcheck">true</str>
>>     <!--<str name="spellcheck.collate">true</str> -->
>>     <str name="spellcheck.count">5</str>
>>     <str name="spellcheck.dictionary">subobjective</str>
>>     <str name="spellcheck.onlyMorePopular">false</str>
>>     <str name="hl.tag.pre">&lt;b&gt;</str>
>>     <str name="hl.tag.post">&lt;/b&gt;</str>
>>     <str name="hl.useFastVectorHighlighter">true</str>
>>   </lst>
>> </requestHandler>
>>
>> On Sat, Aug 27, 2011 at 5:31 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>>> I'm not sure what the issue could be at this point. I see you've got
>>> qt=search - what's the definition of that request handler?
>>>
>>> What is the parsed query (from the debugQuery response)?
>>>
>>> Have you tried this with Solr 3.3 to see if there's any appreciable
>>> difference?
>>>
>>> Erik
>>>
>>> On Aug 27, 2011, at 09:34, Lord Khan Han wrote:
>>>
>>>> With grouping off, the query time drops from 3567 ms to 1912 ms.
>>>> Grouping increases the query time and makes the cache useless. But the
>>>> same config is still faster without shingles.
>>>>
>>>> We have a head-to-head test against this commercial search engine this
>>>> Wednesday, so I am looking for any and all suggestions.
>>>>
>>>> On Sat, Aug 27, 2011 at 3:37 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>>>>
>>>>> Please confirm whether this is caused by grouping. Turn grouping off;
>>>>> what's the query time like?
>>>>>
>>>>> On Aug 27, 2011, at 07:27, Lord Khan Han wrote:
>>>>>
>>>>>> On the other hand, we couldn't use the cache for the type of query
>>>>>> below. I think it's caused by grouping. Anyway, we need to be
>>>>>> sub-second without the cache.
>>>>>>
>>>>>> On Sat, Aug 27, 2011 at 2:18 PM, Lord Khan Han <khanuniver...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thanks for the reply.
>>>>>>>
>>>>>>> Here is the Solr log capture:
>>>>>>>
>>>>>>> ******
>>>>>>> hl.fragsize=100&spellcheck=true&spellcheck.q=XXXXX&group.limit=5&hl.simple.pre=<b>&hl.fl=content&spellcheck.collate=true&wt=javabin&hl=true&rows=20&version=2&fl=score,approved,domain,host,id,lang,mimetype,title,tstamp,url,category&hl.snippets=3&start=0&q=%2BXXXX+-"XXXXX"+-"XXXXX"+-"XXXXXX"+-"XXXXXX"+-"XXXXXX"+-XXXX+-"XXXXXX"+-XXX+-"XXXXX"+-XXXX+-XXXX+-"XXXXX"+-"XXXXX"+-"XXXXX"+-XXXX+-"XXXX"+-"XXXXX"+-"XXXXXX"+-"XXXXX"+-"XXXXXX"+-"XXXXXX"+-XXXX+-"XXXXX"+-"XXXXXX"+-XXXX+-"XXXXX"+-"XXXXX"+-XXXXX+-"XXXXX"+-"XXXXX"+-"XXXXX"+-"XXXXX"+-XXXXX+-"XXXXXX"+-"XXXXXX"+-XXXXXX+-XXXXX+-"XXXXX"+"XXXXX"+"XXXXX"+"XXXXXX"++&group.field=host&hl.simple.post=</b>&group=true&qt=search&fq=mrank:[0+TO+100]&fq=word_count:[70+TO+*]
>>>>>>> ******
>>>>>>>
>>>>>>> Each XXXX stands for a single word. Every quoted phrase "XXXXX" contains
>>>>>>> two words.
>>>>>>>
>>>>>>> The timing from debugQuery:
>>>>>>>
>>>>>>> <lst name="timing">
>>>>>>>   <double name="time">8654.0</double>
>>>>>>>   <lst name="prepare">
>>>>>>>     <double name="time">16.0</double>
>>>>>>>     <lst name="org.apache.solr.handler.component.QueryComponent">
>>>>>>>       <double name="time">16.0</double>
>>>>>>>     </lst>
>>>>>>>     <lst name="org.apache.solr.handler.component.FacetComponent">
>>>>>>>       <double name="time">0.0</double>
>>>>>>>     </lst>
>>>>>>>     <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
>>>>>>>       <double name="time">0.0</double>
>>>>>>>     </lst>
>>>>>>>     <lst name="org.apache.solr.handler.component.HighlightComponent">
>>>>>>>       <double name="time">0.0</double>
>>>>>>>     </lst>
>>>>>>>     <lst name="org.apache.solr.handler.component.StatsComponent">
>>>>>>>       <double name="time">0.0</double>
>>>>>>>     </lst>
>>>>>>>     <lst name="org.apache.solr.handler.component.SpellCheckComponent">
>>>>>>>       <double name="time">0.0</double>
>>>>>>>     </lst>
>>>>>>>     <lst name="org.apache.solr.handler.component.DebugComponent">
>>>>>>>       <double name="time">0.0</double>
>>>>>>>     </lst>
>>>>>>>   </lst>
>>>>>>>   <lst name="process">
>>>>>>>     <double name="time">8638.0</double>
>>>>>>>     <lst name="org.apache.solr.handler.component.QueryComponent">
>>>>>>>       <double name="time">4473.0</double>
>>>>>>>     </lst>
>>>>>>>     <lst name="org.apache.solr.handler.component.FacetComponent">
>>>>>>>       <double name="time">0.0</double>
>>>>>>>     </lst>
>>>>>>>     <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
>>>>>>>       <double name="time">0.0</double>
>>>>>>>     </lst>
>>>>>>>     <lst name="org.apache.solr.handler.component.HighlightComponent">
>>>>>>>       <double name="time">42.0</double>
>>>>>>>     </lst>
>>>>>>>     <lst name="org.apache.solr.handler.component.StatsComponent">
>>>>>>>       <double name="time">0.0</double>
>>>>>>>     </lst>
>>>>>>>     <lst name="org.apache.solr.handler.component.SpellCheckComponent">
>>>>>>>       <double name="time">1.0</double>
>>>>>>>     </lst>
>>>>>>>     <lst name="org.apache.solr.handler.component.DebugComponent">
>>>>>>>       <double name="time">4122.0</double>
>>>>>>>     </lst>
>>>>>>>   </lst>
>>>>>>> </lst>
>>>>>>>
>>>>>>> The funny thing is that if I remove the ShingleFilter from the "sh_text"
>>>>>>> field below and index normally, the query time is half that of the
>>>>>>> current shingle setup! Shouldn't a shingled index be better for such
>>>>>>> heavy two-word phrase searches? I am confused.
>>>>>>>
>>>>>>> On the other hand, one of the big FAT companies' off-the-shelf search
>>>>>>> engines runs the same query on the same machine in 0.7-0.8 secs without
>>>>>>> cache. I am confident we can do better in Solr but couldn't find the way
>>>>>>> at the moment.
>>>>>>>
>>>>>>> Thanks for helping.
>>>>>>>
>>>>>>> On Sat, Aug 27, 2011 at 2:46 AM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Aug 26, 2011, at 17:49, Lord Khan Han wrote:
>>>>>>>>> We are indexing news documents from various sites. Currently we have
>>>>>>>>> 200K docs indexed. Total index size is 36 GB. The news items also have
>>>>>>>>> attachments (PDF, DOC, etc.), so document size can be high (e.g. 10 MB).
>>>>>>>>>
>>>>>>>>> We are using some complex queries which include around 30-40 terms
>>>>>>>>> per query. 70% of these terms are two-word phrases. We use them in
>>>>>>>>> conjunction with + and - to pinpoint the exact result.
>>>>>>>>> There is also grouping, dismax, boosting, and term vector highlighting.
>>>>>>>>
>>>>>>>> You're using a lot of componentry there, and have complex queries. We
>>>>>>>> need more details.
>>>>>>>>
>>>>>>>> Turn on debugQuery=true... what do the timings say for each component?
>>>>>>>>
>>>>>>>>> Our problem is query times. Currently they are around 6-7 secs. I know
>>>>>>>>> our query is a little bit heavy, but we want to improve query
>>>>>>>>> performance. I believe we can make it sub-second, but no success at
>>>>>>>>> the moment.
>>>>>>>>
>>>>>>>> Please provide an example query or two (perhaps a full line logged from
>>>>>>>> Solr itself), and then let's see what debugQuery says about your query
>>>>>>>> being parsed.
>>>>>>>>
>>>>>>>>> We tried 2-word shingle tokens and they decreased query performance!
>>>>>>>>> We assumed they would help speed up phrase searches.
>>>>>>>>
>>>>>>>> Again, we'd need to see a parsed query to understand this deeper.
>>>>>>>>
>>>>>>>> Lots of synonym expansion? A parsed query will tell us.
>>>>>>>>
>>>>>>>>> (We are using the latest Solr trunk, and the hardware is pretty good:
>>>>>>>>> 32 cores with 32 GB of RAM.)
>>>>>>>>>
>>>>>>>>> Here is the field definition:
>>>>>>>>>
>>>>>>>>> <fieldType name="sh_text" class="solr.TextField" positionIncrementGap="100"
>>>>>>>>>     autoGeneratePhraseQueries="true">
>>>>>>>>>   <analyzer type="index">
>>>>>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>         words="stopwords.txt" enablePositionIncrements="true" />
>>>>>>>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>>>>>>>         generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>>>>>>>         catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>>>>>>>>     <!--<filter class="solr.LowerCaseFilterFactory"/>-->
>>>>>>>>>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>>>>>>>>>     <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
>>>>>>>>>   </analyzer>
>>>>>>>>>   <analyzer type="query">
>>>>>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>>>>>>>         ignoreCase="true" expand="true"/>
>>>>>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>         words="stopwords.txt" enablePositionIncrements="true" />
>>>>>>>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>>>>>>>         generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>>>>>>>>         catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>>>>>>>>     <!--<filter class="solr.LowerCaseFilterFactory"/>-->
>>>>>>>>>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>>>>>>>>>     <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
>>>>>>>>>   </analyzer>
>>>>>>>>> </fieldType>
>>>>>>>>>
>>>>>>>>> and
>>>>>>>>>
>>>>>>>>> <field name="content" type="sh_text" stored="true" indexed="true"
>>>>>>>>>     termVectors="true" termPositions="true" termOffsets="true"/>
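As a cross-check on the debugQuery timing quoted above, the per-component figures add up exactly to the reported totals (all values in ms; FacetComponent, MoreLikeThisComponent and StatsComponent report 0). The arithmetic shows that QueryComponent and DebugComponent between them account for almost the whole 8654 ms:

```latex
\begin{aligned}
t_{\text{process}} &= \underbrace{4473}_{\text{Query}} + \underbrace{42}_{\text{Highlight}}
                    + \underbrace{1}_{\text{SpellCheck}} + \underbrace{4122}_{\text{Debug}} = 8638 \\
t_{\text{total}}   &= t_{\text{prepare}} + t_{\text{process}} = 16 + 8638 = 8654 \\
\frac{t_{\text{Debug}}}{t_{\text{total}}} &= \frac{4122}{8654} \approx 0.476,
\qquad
\frac{t_{\text{Query}}}{t_{\text{total}}} = \frac{4473}{8654} \approx 0.517
\end{aligned}
```

Because the debug output is only produced when debugQuery is requested, one reading of these numbers is that a normal request for this query may cost closer to the QueryComponent figure; that is an inference from this single capture, not a measurement.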