I do store term vector: <field name="body" type="text_pl" indexed="true" stored="true" multiValued="false" termVectors="true" termPositions="true" termOffsets="true" />
-Pete

On Jul 30, 2010, at 7:30 AM, Li Li wrote:

> Highlighting time is mainly spent on getting the field which you want
> to highlight and tokenizing this field (if you don't store term vectors).
> You can check what's wrong.
>
> 2010/7/30 Peter Spam <ps...@mac.com>:
>> If I don't do highlighting, it's really fast. Optimize has no effect.
>>
>> -Peter
>>
>> On Jul 29, 2010, at 11:54 AM, dc tech wrote:
>>
>>> Are you storing the entire log file text in SOLR? That's almost 3gb of
>>> text that you are storing in the SOLR. Try to
>>> 1) Is this first time performance or on repeat queries with the same fields?
>>> 2) Optimize the index and test performance again
>>> 3) index without storing the text and see what the performance looks like.
>>>
>>>
>>> On 7/29/10, Peter Spam <ps...@mac.com> wrote:
>>>> Any ideas? I've got 5000 documents with an average size of 850k each, and
>>>> it sometimes takes 2 minutes for a query to come back when highlighting is
>>>> turned on! Help!
>>>>
>>>>
>>>> -Pete
>>>>
>>>> On Jul 21, 2010, at 2:41 PM, Peter Spam wrote:
>>>>
>>>>> From the mailing list archive, Koji wrote:
>>>>>
>>>>>> 1. Provide another field for highlighting and use copyField to copy
>>>>>> plainText to the highlighting field.
>>>>>
>>>>> and Lance wrote:
>>>>> http://www.mail-archive.com/solr-user@lucene.apache.org/msg35548.html
>>>>>
>>>>>> If you want to highlight field X, doing the
>>>>>> termOffsets/termPositions/termVectors will make highlighting that field
>>>>>> faster. You should make a separate field and apply these options to that
>>>>>> field.
>>>>>>
>>>>>> Now: doing a copyfield adds a "value" to a multiValued field. For a text
>>>>>> field, you get a multi-valued text field. You should only copy one value
>>>>>> to the highlighted field, so just copyField the document to your special
>>>>>> field. To enforce this, I would add multiValued="false" to that field,
>>>>>> just to avoid mistakes.
>>>>>>
>>>>>> So, all_text should be indexed without the term* attributes, and should
>>>>>> not be stored. Then your document is stored in a separate field that you
>>>>>> use for highlighting and that has the term* attributes.
>>>>>
>>>>> I've been experimenting with this, and here's what I've tried:
>>>>>
>>>>> <field name="body" type="text_pl" indexed="true" stored="false"
>>>>> multiValued="true" termVectors="true" termPositions="true"
>>>>> termOffsets="true" />
>>>>> <field name="body_all" type="text_pl" indexed="false" stored="true"
>>>>> multiValued="true" />
>>>>> <copyField source="body" dest="body_all"/>
>>>>>
>>>>> ... but it's still very slow (10+ seconds). Why is it better to have two
>>>>> fields (one indexed but not stored, and the other not indexed but stored)
>>>>> rather than just one field that's both indexed and stored?
>>>>>
>>>>>
>>>>> From the Perf wiki page http://wiki.apache.org/solr/SolrPerformanceFactors
>>>>>
>>>>>> If you aren't always using all the stored fields, then enabling lazy
>>>>>> field loading can be a huge boon, especially if compressed fields are
>>>>>> used.
>>>>>
>>>>> What does this mean? How do you load a field lazily?
>>>>>
>>>>> Thanks for your time, guys - this has started to become frustrating, since
>>>>> it works so well, but is very slow!
>>>>>
>>>>>
>>>>> -Pete
>>>>>
>>>>> On Jul 20, 2010, at 5:36 PM, Peter Spam wrote:
>>>>>
>>>>>> Data set: About 4,000 log files (will eventually grow to millions).
>>>>>> Average log file is 850k. Largest log file (so far) is about 70MB.
>>>>>>
>>>>>> Problem: When I search for common terms, the query time goes from under
>>>>>> 2-3 seconds to about 60 seconds. TermVectors etc are enabled. When I
>>>>>> disable highlighting, performance improves a lot, but is still slow for
>>>>>> some queries (7 seconds). Thanks in advance for any ideas!
>>>>>>
>>>>>>
>>>>>> -Peter
>>>>>>
>>>>>>
>>>>>> -------------------------------------------------------------------------------------------------------------------------------------
>>>>>>
>>>>>> 4GB RAM server
>>>>>> % java -Xms2048M -Xmx3072M -jar start.jar
>>>>>>
>>>>>> -------------------------------------------------------------------------------------------------------------------------------------
>>>>>>
>>>>>> schema.xml changes:
>>>>>>
>>>>>> <fieldType name="text_pl" class="solr.TextField">
>>>>>>   <analyzer>
>>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
>>>>>>       generateNumberParts="0" catenateWords="0" catenateNumbers="0"
>>>>>>       catenateAll="0" splitOnCaseChange="0"/>
>>>>>>   </analyzer>
>>>>>> </fieldType>
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> <field name="body" type="text_pl" indexed="true" stored="true"
>>>>>>   multiValued="false" termVectors="true" termPositions="true"
>>>>>>   termOffsets="true" />
>>>>>> <field name="timestamp" type="date" indexed="true" stored="true"
>>>>>>   default="NOW" multiValued="false"/>
>>>>>> <field name="version" type="string" indexed="true" stored="true"
>>>>>>   multiValued="false"/>
>>>>>> <field name="device" type="string" indexed="true" stored="true"
>>>>>>   multiValued="false"/>
>>>>>> <field name="filename" type="string" indexed="true" stored="true"
>>>>>>   multiValued="false"/>
>>>>>> <field name="filesize" type="long" indexed="true" stored="true"
>>>>>>   multiValued="false"/>
>>>>>> <field name="pversion" type="int" indexed="true" stored="true"
>>>>>>   multiValued="false"/>
>>>>>> <field name="first2md5" type="string" indexed="false" stored="true"
>>>>>>   multiValued="false"/>
>>>>>> <field name="ckey" type="string" indexed="true" stored="true"
>>>>>>   multiValued="false"/>
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> <dynamicField name="*" type="ignored" multiValued="true" />
>>>>>> <defaultSearchField>body</defaultSearchField>
>>>>>> <solrQueryParser defaultOperator="AND"/>
>>>>>>
>>>>>> -------------------------------------------------------------------------------------------------------------------------------------
>>>>>>
>>>>>> solrconfig.xml changes:
>>>>>>
>>>>>> <maxFieldLength>2147483647</maxFieldLength>
>>>>>> <ramBufferSizeMB>128</ramBufferSizeMB>
>>>>>>
>>>>>> -------------------------------------------------------------------------------------------------------------------------------------
>>>>>>
>>>>>> The query:
>>>>>>
>>>>>> rowStr = "&rows=10"
>>>>>> facet =
>>>>>> "&facet=true&facet.limit=10&facet.field=device&facet.field=ckey&facet.field=version"
>>>>>> fields = "&fl=id,score,filename,version,device,first2md5,filesize,ckey"
>>>>>> termvectors = "&tv=true&qt=tvrh&tv.all=true"
>>>>>> hl = "&hl=true&hl.fl=body&hl.snippets=1&hl.fragsize=400"
>>>>>> regexv = "(?m)^.*\n.*\n.*$"
>>>>>> hl_regex = "&hl.regex.pattern=" + CGI::escape(regexv) +
>>>>>> "&hl.regex.slop=1&hl.fragmenter=regex&hl.regex.maxAnalyzedChars=2147483647&hl.maxAnalyzedChars=2147483647"
>>>>>> justq = '&q=' + CGI::escape('body:' + fuzzy + p['q'].to_s.gsub(/\\/,
>>>>>> '').gsub(/([:~!<>="])/,'\\\\\1') + fuzzy + minLogSizeStr)
>>>>>>
>>>>>> thequery = '/solr/select?timeAllowed=5000&wt=ruby' + (p['fq'].empty? ? ''
>>>>>> : ('&fq='+p['fq'].to_s) ) + justq + rowStr + facet + fields + termvectors
>>>>>> + hl + hl_regex
>>>>>>
>>>>>> baseurl = '/cgi-bin/search.rb?q=' + CGI::escape(p['q'].to_s) + '&rows=' +
>>>>>> p['rows'].to_s + '&minLogSize=' + p['minLogSize'].to_s
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>> --
>>> Sent from my mobile device
>>
>>
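
For anyone reading this thread later: the highlighting part of the query quoted above could be built from a hash rather than hand-concatenated strings, which makes it easier to vary one parameter at a time while timing queries. This is only a rough sketch. The helper name highlight_params and its keyword arguments are made up for illustration and are not from the thread; the Solr parameter names (hl, hl.fl, hl.snippets, hl.fragsize, hl.maxAnalyzedChars) are the ones already used in the original query. Whether a lower hl.maxAnalyzedChars actually helps with these log files has not been tested here.

```ruby
require 'uri'

# Hypothetical helper (not from the thread): returns the highlighting
# portion of a Solr query string. Defaults mirror the values in the
# original query, including the very large hl.maxAnalyzedChars.
def highlight_params(field: 'body', snippets: 1, fragsize: 400,
                     max_chars: 2_147_483_647)
  URI.encode_www_form(
    'hl'                  => 'true',
    'hl.fl'               => field,
    'hl.snippets'         => snippets,
    'hl.fragsize'         => fragsize,
    'hl.maxAnalyzedChars' => max_chars
  )
end

# Same parameters as the original query, with a smaller analysis cap
# to compare timings against.
puts highlight_params(max_chars: 51_200)
```

The returned string has no leading "&", so it would be appended as '&' + highlight_params(...) when assembling the full query.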