On Jul 30, 2010, at 7:04 PM, Lance Norskog wrote:

> Wait- how much text are you highlighting? You say these logfiles are X
> big- how big are the actual documents you are storing?
I want it to be like Google: I put the entire (sometimes 60MB) doc in a
field, and then just highlight 2-4 lines of it.

Thanks,
Peter

> On Fri, Jul 30, 2010 at 1:16 PM, Peter Karich <peat...@yahoo.de> wrote:
>> Hi Peter :-),
>>
>> did you already try other values for
>>
>> hl.maxAnalyzedChars=2147483647
>>
>> ? Also, regular expression highlighting is more expensive, I think.
>> What does the 'fuzzy' variable mean? If you use it to query via
>> "~someTerm" instead of "someTerm", then you should try the trunk of
>> Solr, which is a lot faster for fuzzy and other wildcard searches.
>>
>> Regards,
>> Peter.
>>
>>> Data set: About 4,000 log files (will eventually grow to millions).
>>> Average log file is 850k. Largest log file (so far) is about 70MB.
>>>
>>> Problem: When I search for common terms, the query time goes from
>>> under 2-3 seconds to about 60 seconds. TermVectors etc. are enabled.
>>> When I disable highlighting, performance improves a lot, but is still
>>> slow for some queries (7 seconds). Thanks in advance for any ideas!
>>>
>>>
>>> -Peter
>>>
>>>
>>> -------------------------------------------------------------------------------------
>>>
>>> 4GB RAM server
>>> % java -Xms2048M -Xmx3072M -jar start.jar
>>>
>>> -------------------------------------------------------------------------------------
>>>
>>> schema.xml changes:
>>>
>>> <fieldType name="text_pl" class="solr.TextField">
>>>   <analyzer>
>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
>>>       generateNumberParts="0" catenateWords="0" catenateNumbers="0"
>>>       catenateAll="0" splitOnCaseChange="0"/>
>>>   </analyzer>
>>> </fieldType>
>>>
>>> ...
>>>
>>> <field name="body" type="text_pl" indexed="true" stored="true"
>>>   multiValued="false" termVectors="true" termPositions="true"
>>>   termOffsets="true" />
>>> <field name="timestamp" type="date" indexed="true" stored="true"
>>>   default="NOW" multiValued="false"/>
>>> <field name="version" type="string" indexed="true" stored="true"
>>>   multiValued="false"/>
>>> <field name="device" type="string" indexed="true" stored="true"
>>>   multiValued="false"/>
>>> <field name="filename" type="string" indexed="true" stored="true"
>>>   multiValued="false"/>
>>> <field name="filesize" type="long" indexed="true" stored="true"
>>>   multiValued="false"/>
>>> <field name="pversion" type="int" indexed="true" stored="true"
>>>   multiValued="false"/>
>>> <field name="first2md5" type="string" indexed="false" stored="true"
>>>   multiValued="false"/>
>>> <field name="ckey" type="string" indexed="true" stored="true"
>>>   multiValued="false"/>
>>>
>>> ...
>>>
>>> <dynamicField name="*" type="ignored" multiValued="true" />
>>> <defaultSearchField>body</defaultSearchField>
>>> <solrQueryParser defaultOperator="AND"/>
>>>
>>> -------------------------------------------------------------------------------------
>>>
>>> solrconfig.xml changes:
>>>
>>> <maxFieldLength>2147483647</maxFieldLength>
>>> <ramBufferSizeMB>128</ramBufferSizeMB>
>>>
>>> -------------------------------------------------------------------------------------
>>>
>>> The query:
>>>
>>> rowStr = "&rows=10"
>>> facet =
>>>   "&facet=true&facet.limit=10&facet.field=device&facet.field=ckey&facet.field=version"
>>> fields = "&fl=id,score,filename,version,device,first2md5,filesize,ckey"
>>> termvectors = "&tv=true&qt=tvrh&tv.all=true"
>>> hl = "&hl=true&hl.fl=body&hl.snippets=1&hl.fragsize=400"
>>> regexv = "(?m)^.*\n.*\n.*$"
>>> hl_regex = "&hl.regex.pattern=" + CGI::escape(regexv) +
>>>   "&hl.regex.slop=1&hl.fragmenter=regex&hl.regex.maxAnalyzedChars=2147483647&hl.maxAnalyzedChars=2147483647"
>>> justq = '&q=' + CGI::escape('body:' + fuzzy + p['q'].to_s.gsub(/\\/,
>>>   '').gsub(/([:~!<>="])/,'\\\\\1') + fuzzy + minLogSizeStr)
>>>
>>> thequery = '/solr/select?timeAllowed=5000&wt=ruby' + (p['fq'].empty? ? '' :
>>>   ('&fq='+p['fq'].to_s) ) + justq + rowStr + facet + fields + termvectors +
>>>   hl + hl_regex
>>>
>>> baseurl = '/cgi-bin/search.rb?q=' + CGI::escape(p['q'].to_s) + '&rows=' +
>>>   p['rows'].to_s + '&minLogSize=' + p['minLogSize'].to_s
>>>
>>
>> --
>> http://karussell.wordpress.com/
>>
>
> --
> Lance Norskog
> goks...@gmail.com
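
For reference, a minimal sketch of Peter Karich's suggestion, assuming the
same Ruby/CGI query-building style as the code above: use the default
fragmenter instead of the regex fragmenter and cap hl.maxAnalyzedChars at a
smaller value, to see how much of the query time comes from analyzing the
full 60-70MB documents. The 1_000_000 cap and the 'someTerm' query are
placeholder test values, not recommendations from the thread.

    require 'cgi'

    # Arbitrary test cap on how many characters the highlighter analyzes
    # per document, instead of 2147483647 (Integer.MAX_VALUE) as above.
    max_analyzed = 1_000_000

    # Same highlighting params as the thread, but with the default (gap)
    # fragmenter and a bounded hl.maxAnalyzedChars.
    hl = "&hl=true&hl.fl=body&hl.snippets=1&hl.fragsize=400" +
         "&hl.maxAnalyzedChars=#{max_analyzed}"

    # 'someTerm' stands in for the escaped user query built from p['q'].
    thequery = '/solr/select?timeAllowed=5000&wt=ruby&rows=10' +
               '&q=' + CGI::escape('body:someTerm') + hl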