Hi Peter :-),
Have you already tried a value other than 2147483647 for
hl.maxAnalyzedChars? I also think regular expression highlighting is
more expensive than the default fragmenter.
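To make that easy to try, here is a rough, untested sketch of how the highlighting part of your query string could be built with an adjustable limit. The helper name highlight_params and its keyword arguments are my own invention, and 51200 is (as far as I know) Solr's default for hl.maxAnalyzedChars, so treat it only as a starting point:

```ruby
require 'cgi'

# Hypothetical helper (not part of the original script): builds the
# highlighting part of the query string.
# max_chars caps how many characters of each body are analyzed.
# regex is optional; pass nil to fall back to the default fragmenter.
def highlight_params(max_chars: 51_200, regex: nil)
  params = [
    ['hl', 'true'],
    ['hl.fl', 'body'],
    ['hl.snippets', '1'],
    ['hl.fragsize', '400'],
    ['hl.maxAnalyzedChars', max_chars.to_s]
  ]
  if regex
    # Only add the regex fragmenter settings when a pattern is given
    params += [
      ['hl.fragmenter', 'regex'],
      ['hl.regex.pattern', regex],
      ['hl.regex.slop', '1'],
      ['hl.regex.maxAnalyzedChars', max_chars.to_s]
    ]
  end
  params.map { |key, value| "&#{key}=#{CGI.escape(value)}" }.join
end
```

Calling highlight_params(max_chars: 100_000) with and without your regexv pattern should show whether the limit or the regex fragmenter costs more.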
What does the 'fuzzy' variable mean? If you use it to query via
"~someTerm" instead of "someTerm",
then you should try Solr trunk, which is a lot faster for fuzzy and
other wildcard searches.
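To compare the two cases directly, something like the following might help. It is only a sketch: body_query and its mode argument are names I made up, and I am guessing at what 'fuzzy' contains. Note that Lucene's query syntax puts the tilde after the term (someTerm~):

```ruby
require 'cgi'

# Hypothetical helper (not part of the original script): builds the q
# parameter for the body field in one of three modes.
#   :plain    -> body:term
#   :fuzzy    -> body:term~   (Lucene writes the tilde after the term)
#   :wildcard -> body:term*
def body_query(term, mode = :plain)
  # Backslash-escape characters that the query parser would interpret
  escaped = term.gsub(/([:~!<>="\\])/) { "\\#{$1}" }
  suffix = { plain: '', fuzzy: '~', wildcard: '*' }.fetch(mode)
  '&q=' + CGI.escape("body:#{escaped}#{suffix}")
end
```

Timing the same term once with :plain and once with :fuzzy should tell you whether the fuzzy expansion is what makes the common-term queries slow.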
Regards,
Peter.
> Data set: About 4,000 log files (will eventually grow to millions). Average
> log file is 850k. Largest log file (so far) is about 70MB.
>
> Problem: When I search for common terms, the query time goes from under 2-3
> seconds to about 60 seconds. TermVectors etc are enabled. When I disable
> highlighting, performance improves a lot, but is still slow for some queries
> (7 seconds). Thanks in advance for any ideas!
>
>
> -Peter
>
>
> -------------------------------------------------------------------------------------------------------------------------------------
>
> 4GB RAM server
> % java -Xms2048M -Xmx3072M -jar start.jar
>
> -------------------------------------------------------------------------------------------------------------------------------------
>
> schema.xml changes:
>
> <fieldType name="text_pl" class="solr.TextField">
> <analyzer>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
> generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0"
> splitOnCaseChange="0"/>
> </analyzer>
> </fieldType>
>
> ...
>
> <field name="body" type="text_pl" indexed="true" stored="true"
> multiValued="false" termVectors="true" termPositions="true"
> termOffsets="true" />
> <field name="timestamp" type="date" indexed="true" stored="true"
> default="NOW" multiValued="false"/>
> <field name="version" type="string" indexed="true" stored="true"
> multiValued="false"/>
> <field name="device" type="string" indexed="true" stored="true"
> multiValued="false"/>
> <field name="filename" type="string" indexed="true" stored="true"
> multiValued="false"/>
> <field name="filesize" type="long" indexed="true" stored="true"
> multiValued="false"/>
> <field name="pversion" type="int" indexed="true" stored="true"
> multiValued="false"/>
> <field name="first2md5" type="string" indexed="false" stored="true"
> multiValued="false"/>
> <field name="ckey" type="string" indexed="true" stored="true"
> multiValued="false"/>
>
> ...
>
> <dynamicField name="*" type="ignored" multiValued="true" />
> <defaultSearchField>body</defaultSearchField>
> <solrQueryParser defaultOperator="AND"/>
>
> -------------------------------------------------------------------------------------------------------------------------------------
>
> solrconfig.xml changes:
>
> <maxFieldLength>2147483647</maxFieldLength>
> <ramBufferSizeMB>128</ramBufferSizeMB>
>
> -------------------------------------------------------------------------------------------------------------------------------------
>
> The query:
>
> rowStr = "&rows=10"
> facet =
> "&facet=true&facet.limit=10&facet.field=device&facet.field=ckey&facet.field=version"
> fields = "&fl=id,score,filename,version,device,first2md5,filesize,ckey"
> termvectors = "&tv=true&qt=tvrh&tv.all=true"
> hl = "&hl=true&hl.fl=body&hl.snippets=1&hl.fragsize=400"
> regexv = "(?m)^.*\n.*\n.*$"
> hl_regex = "&hl.regex.pattern=" + CGI::escape(regexv) +
> "&hl.regex.slop=1&hl.fragmenter=regex&hl.regex.maxAnalyzedChars=2147483647&hl.maxAnalyzedChars=2147483647"
> justq = '&q=' + CGI::escape('body:' + fuzzy + p['q'].to_s.gsub(/\\/,
> '').gsub(/([:~!<>="])/,'\\\\\1') + fuzzy + minLogSizeStr)
>
> thequery = '/solr/select?timeAllowed=5000&wt=ruby' + (p['fq'].empty? ? '' :
> ('&fq='+p['fq'].to_s) ) + justq + rowStr + facet + fields + termvectors + hl
> + hl_regex
>
> baseurl = '/cgi-bin/search.rb?q=' + CGI::escape(p['q'].to_s) + '&rows=' +
> p['rows'].to_s + '&minLogSize=' + p['minLogSize'].to_s
>
>
>
--
http://karussell.wordpress.com/