However, I do need to search the entire document, or else the highlighting will sometimes be blank :-( Thanks!
- Peter

ps. sorry for the many responses - I'm rushing around trying to get this working.

On Jul 31, 2010, at 1:11 PM, Peter Spam wrote:

> Correction - it went from 17 seconds to 10 seconds - I was changing the
> hl.regex.maxAnalyzedChars the first time.
> Thanks!
>
> -Peter
>
> On Jul 31, 2010, at 1:06 PM, Peter Spam wrote:
>
>> On Jul 30, 2010, at 1:16 PM, Peter Karich wrote:
>>
>>> did you already try other values for hl.maxAnalyzedChars=2147483647
>>
>> Yes, I tried dropping it down to 21, but it didn't have much of an impact
>> (one search I just tried went from 17 seconds to 15.8 seconds, and this is
>> an 8-core Mac Pro with 6GB RAM - 4GB for java).
>>
>>> ? Also regular expression highlighting is more expensive, I think.
>>> What does the 'fuzzy' variable mean? If you use this to query via
>>> "~someTerm" instead "someTerm"
>>> then you should try the trunk of solr which is a lot faster for fuzzy or
>>> other wildcard search.
>>
>> "fuzzy" could be set to "*" but isn't right now.
>>
>> Thanks for the tips, Peter - this has been very frustrating!
>>
>>
>> - Peter
>>
>>> Regards,
>>> Peter.
>>>
>>>> Data set: About 4,000 log files (will eventually grow to millions).
>>>> Average log file is 850k. Largest log file (so far) is about 70MB.
>>>>
>>>> Problem: When I search for common terms, the query time goes from under
>>>> 2-3 seconds to about 60 seconds. TermVectors etc are enabled. When I
>>>> disable highlighting, performance improves a lot, but is still slow for
>>>> some queries (7 seconds). Thanks in advance for any ideas!
>>>>
>>>>
>>>> -Peter
>>>>
>>>>
>>>> -------------------------------------------------------------------------------------------------------------------------------------
>>>>
>>>> 4GB RAM server
>>>> % java -Xms2048M -Xmx3072M -jar start.jar
>>>>
>>>> -------------------------------------------------------------------------------------------------------------------------------------
>>>>
>>>> schema.xml changes:
>>>>
>>>> <fieldType name="text_pl" class="solr.TextField">
>>>>   <analyzer>
>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
>>>>       generateNumberParts="0" catenateWords="0" catenateNumbers="0"
>>>>       catenateAll="0" splitOnCaseChange="0"/>
>>>>   </analyzer>
>>>> </fieldType>
>>>>
>>>> ...
>>>>
>>>> <field name="body" type="text_pl" indexed="true" stored="true"
>>>>   multiValued="false" termVectors="true" termPositions="true"
>>>>   termOffsets="true" />
>>>> <field name="timestamp" type="date" indexed="true" stored="true"
>>>>   default="NOW" multiValued="false"/>
>>>> <field name="version" type="string" indexed="true" stored="true"
>>>>   multiValued="false"/>
>>>> <field name="device" type="string" indexed="true" stored="true"
>>>>   multiValued="false"/>
>>>> <field name="filename" type="string" indexed="true" stored="true"
>>>>   multiValued="false"/>
>>>> <field name="filesize" type="long" indexed="true" stored="true"
>>>>   multiValued="false"/>
>>>> <field name="pversion" type="int" indexed="true" stored="true"
>>>>   multiValued="false"/>
>>>> <field name="first2md5" type="string" indexed="false" stored="true"
>>>>   multiValued="false"/>
>>>> <field name="ckey" type="string" indexed="true" stored="true"
>>>>   multiValued="false"/>
>>>>
>>>> ...
>>>>
>>>> <dynamicField name="*" type="ignored" multiValued="true" />
>>>> <defaultSearchField>body</defaultSearchField>
>>>> <solrQueryParser defaultOperator="AND"/>
>>>>
>>>> -------------------------------------------------------------------------------------------------------------------------------------
>>>>
>>>> solrconfig.xml changes:
>>>>
>>>> <maxFieldLength>2147483647</maxFieldLength>
>>>> <ramBufferSizeMB>128</ramBufferSizeMB>
>>>>
>>>> -------------------------------------------------------------------------------------------------------------------------------------
>>>>
>>>> The query:
>>>>
>>>> rowStr = "&rows=10"
>>>> facet =
>>>> "&facet=true&facet.limit=10&facet.field=device&facet.field=ckey&facet.field=version"
>>>> fields = "&fl=id,score,filename,version,device,first2md5,filesize,ckey"
>>>> termvectors = "&tv=true&qt=tvrh&tv.all=true"
>>>> hl = "&hl=true&hl.fl=body&hl.snippets=1&hl.fragsize=400"
>>>> regexv = "(?m)^.*\n.*\n.*$"
>>>> hl_regex = "&hl.regex.pattern=" + CGI::escape(regexv) +
>>>> "&hl.regex.slop=1&hl.fragmenter=regex&hl.regex.maxAnalyzedChars=2147483647&hl.maxAnalyzedChars=2147483647"
>>>> justq = '&q=' + CGI::escape('body:' + fuzzy + p['q'].to_s.gsub(/\\/,
>>>> '').gsub(/([:~!<>="])/,'\\\\\1') + fuzzy + minLogSizeStr)
>>>>
>>>> thequery = '/solr/select?timeAllowed=5000&wt=ruby' + (p['fq'].empty? ? ''
>>>> : ('&fq='+p['fq'].to_s) ) + justq + rowStr + facet + fields + termvectors
>>>> + hl + hl_regex
>>>>
>>>> baseurl = '/cgi-bin/search.rb?q=' + CGI::escape(p['q'].to_s) + '&rows=' +
>>>> p['rows'].to_s + '&minLogSize=' + p['minLogSize'].to_s
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> http://karussell.wordpress.com/
>>>
>>
>
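
For experimenting with different hl.maxAnalyzedChars / hl.regex.maxAnalyzedChars values, the highlighting part of the quoted query could be built from a hash instead of by string concatenation. This is only a rough sketch: the helper name highlight_params and its max_chars argument are hypothetical and not part of the original script; the parameter names and default values are copied from the quoted query.

```ruby
require 'cgi'
require 'uri'

# Hypothetical helper (not in the original script): returns the highlighting
# portion of the Solr query string, with both analyzed-char limits set to
# the same value so they can be tuned in one place.
def highlight_params(max_chars:, pattern: "(?m)^.*\n.*\n.*$")
  params = {
    'hl'                        => 'true',
    'hl.fl'                     => 'body',
    'hl.snippets'               => 1,
    'hl.fragsize'               => 400,
    'hl.fragmenter'             => 'regex',
    'hl.regex.pattern'          => pattern,   # URL-escaped by encode_www_form
    'hl.regex.slop'             => 1,
    'hl.regex.maxAnalyzedChars' => max_chars,
    'hl.maxAnalyzedChars'       => max_chars
  }
  # Leading '&' so the result can be appended to an existing query string.
  '&' + URI.encode_www_form(params)
end

# Example: try a 1,000,000-character limit instead of 2147483647.
puts highlight_params(max_chars: 1_000_000)
```

The result would stand in for hl + hl_regex when assembling thequery. Which limit gives acceptable speed without blank highlights is something to measure on the actual index.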