From the mailing list archive, Koji wrote:

> 1. Provide another field for highlighting and use copyField to copy plainText
> to the highlighting field.
and Lance wrote:

http://www.mail-archive.com/[email protected]/msg35548.html

> If you want to highlight field X, doing the
> termOffsets/termPositions/termVectors will make highlighting that field
> faster. You should make a separate field and apply these options to that
> field.
>
> Now: doing a copyfield adds a "value" to a multiValued field. For a text
> field, you get a multi-valued text field. You should only copy one value to
> the highlighted field, so just copyField the document to your special field.
> To enforce this, I would add multiValued="false" to that field, just to avoid
> mistakes.
>
> So, all_text should be indexed without the term* attributes, and should not
> be stored. Then your document stored in a separate field that you use for
> highlighting and has the term* attributes.

I've been experimenting with this, and here's what I've tried:

<field name="body" type="text_pl" indexed="true" stored="false"
       multiValued="true" termVectors="true" termPositions="true"
       termOffsets="true" />
<field name="body_all" type="text_pl" indexed="false" stored="true"
       multiValued="true" />
<copyField source="body" dest="body_all"/>

... but it's still very slow (10+ seconds). Why is it better to have two
fields (one indexed but not stored, and the other not indexed but stored)
rather than just one field that's both indexed and stored?

From the Perf wiki page http://wiki.apache.org/solr/SolrPerformanceFactors

> If you aren't always using all the stored fields, then enabling lazy field
> loading can be a huge boon, especially if compressed fields are used.

What does this mean? How do you load a field lazily?

Thanks for your time, guys - this has started to become frustrating, since it
works so well, but is very slow!

-Pete

On Jul 20, 2010, at 5:36 PM, Peter Spam wrote:

> Data set: About 4,000 log files (will eventually grow to millions). Average
> log file is 850k. Largest log file (so far) is about 70MB.
>
> Problem: When I search for common terms, the query time goes from under 2-3
> seconds to about 60 seconds. TermVectors etc are enabled. When I disable
> highlighting, performance improves a lot, but is still slow for some queries
> (7 seconds). Thanks in advance for any ideas!
>
> -Peter
>
> ------------------------------------------------------------
>
> 4GB RAM server
> % java -Xms2048M -Xmx3072M -jar start.jar
>
> ------------------------------------------------------------
>
> schema.xml changes:
>
> <fieldType name="text_pl" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
>             generateNumberParts="0" catenateWords="0" catenateNumbers="0"
>             catenateAll="0" splitOnCaseChange="0"/>
>   </analyzer>
> </fieldType>
>
> ...
>
> <field name="body" type="text_pl" indexed="true" stored="true"
>        multiValued="false" termVectors="true" termPositions="true"
>        termOffsets="true" />
> <field name="timestamp" type="date" indexed="true" stored="true"
>        default="NOW" multiValued="false"/>
> <field name="version" type="string" indexed="true" stored="true"
>        multiValued="false"/>
> <field name="device" type="string" indexed="true" stored="true"
>        multiValued="false"/>
> <field name="filename" type="string" indexed="true" stored="true"
>        multiValued="false"/>
> <field name="filesize" type="long" indexed="true" stored="true"
>        multiValued="false"/>
> <field name="pversion" type="int" indexed="true" stored="true"
>        multiValued="false"/>
> <field name="first2md5" type="string" indexed="false" stored="true"
>        multiValued="false"/>
> <field name="ckey" type="string" indexed="true" stored="true"
>        multiValued="false"/>
>
> ...
>
> <dynamicField name="*" type="ignored" multiValued="true" />
> <defaultSearchField>body</defaultSearchField>
> <solrQueryParser defaultOperator="AND"/>
>
> ------------------------------------------------------------
>
> solrconfig.xml changes:
>
> <maxFieldLength>2147483647</maxFieldLength>
> <ramBufferSizeMB>128</ramBufferSizeMB>
>
> ------------------------------------------------------------
>
> The query:
>
> rowStr = "&rows=10"
> facet = "&facet=true&facet.limit=10&facet.field=device&facet.field=ckey&facet.field=version"
> fields = "&fl=id,score,filename,version,device,first2md5,filesize,ckey"
> termvectors = "&tv=true&qt=tvrh&tv.all=true"
> hl = "&hl=true&hl.fl=body&hl.snippets=1&hl.fragsize=400"
> regexv = "(?m)^.*\n.*\n.*$"
> hl_regex = "&hl.regex.pattern=" + CGI::escape(regexv) +
>   "&hl.regex.slop=1&hl.fragmenter=regex" +
>   "&hl.regex.maxAnalyzedChars=2147483647&hl.maxAnalyzedChars=2147483647"
> justq = '&q=' + CGI::escape('body:' + fuzzy +
>   p['q'].to_s.gsub(/\\/, '').gsub(/([:~!<>="])/, '\\\\\1') +
>   fuzzy + minLogSizeStr)
>
> thequery = '/solr/select?timeAllowed=5000&wt=ruby' +
>   (p['fq'].empty? ? '' : ('&fq=' + p['fq'].to_s)) +
>   justq + rowStr + facet + fields + termvectors + hl + hl_regex
>
> baseurl = '/cgi-bin/search.rb?q=' + CGI::escape(p['q'].to_s) +
>   '&rows=' + p['rows'].to_s + '&minLogSize=' + p['minLogSize'].to_s
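The two-field layout Koji and Lance describe in the thread can be sketched as a small Ruby script that prints the corresponding schema.xml fragment. This is an illustration under assumptions, not a tested configuration: the field names all_text and body_hl are hypothetical, and text_pl is the fieldType from the quoted schema. One detail the thread leaves implicit is that Lucene only records term vectors for indexed fields, so the highlighting field is marked indexed as well as stored.

```ruby
# Sketch only: prints a schema.xml fragment for the two-field layout discussed
# in the thread. "all_text" and "body_hl" are hypothetical field names;
# "text_pl" is the fieldType from the quoted schema.

# Render one <field/> element from a name and an ordered attribute hash.
def field_xml(name, attrs)
  attr_str = attrs.map { |key, value| %(#{key}="#{value}") }.join(' ')
  %(<field name="#{name}" #{attr_str} />)
end

# Search field: indexed for matching, not stored, no term* attributes.
search_field = field_xml('all_text',
                         type: 'text_pl', indexed: true, stored: false)

# Highlighting field: stored so the highlighter has the original text,
# indexed because Lucene only keeps term vectors for indexed fields,
# single-valued as Lance suggests, and carrying the term* attributes.
highlight_field = field_xml('body_hl',
                            type: 'text_pl', indexed: true, stored: true,
                            multiValued: false, termVectors: true,
                            termPositions: true, termOffsets: true)

# Documents are sent with all_text; copyField duplicates the raw input value.
copy_field = '<copyField source="all_text" dest="body_hl"/>'

puts search_field, highlight_field, copy_field
```

With this layout, queries would target all_text and the highlighting request would name body_hl in hl.fl. As far as I know, that only highlights terms matched on a different field while hl.requireFieldMatch is left at its default (false). Whether it beats a single indexed, stored field with term vectors is something to measure on this data set.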
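The hl.regex.pattern in the quoted query code, "(?m)^.*\n.*\n.*$", asks the regex fragmenter for fragments of about three consecutive log lines. According to the Solr highlighting parameters, the fragmenter may still stray from hl.fragsize by the hl.regex.slop factor. The Ruby sketch below only lists the raw matches of that pattern on a made-up seven-line log and does not reproduce the fragmenter itself. The flag semantics differ between the two languages. Java's (?m) is MULTILINE. In Ruby, ^ and $ always match at line boundaries, and Ruby's m flag means "dot matches newline", so the equivalent Ruby regex drops the flag.

```ruby
# Sketch: list the raw matches of the quoted hl.regex.pattern on sample text.
# Java's "(?m)" (MULTILINE) corresponds to Ruby's default behaviour: ^ and $
# match at line boundaries and "." does not match "\n". Ruby's own /m flag
# would let "." cross newlines, so it is deliberately not used here.
THREE_LINES = /^.*\n.*\n.*$/

# Made-up log text; a real log file would be far larger.
log = (1..7).map { |n| "line #{n}" }.join("\n")

# scan returns non-overlapping matches; the trailing "line 7" is not followed
# by two more newlines, so it is left unmatched.
chunks = log.scan(THREE_LINES)
chunks.each_with_index { |chunk, i| puts "match #{i}: #{chunk.inspect}" }
```

This prints two matches, lines 1 to 3 and lines 4 to 6. Line 7 is left over.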
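The hand-concatenated query string in the quoted code can also be built from a list of parameter pairs, so every value is URL-escaped in one place. This is a hypothetical refactoring sketch, not the original script. The method names escape_query_text and build_select_path are made up. The fuzzy and minLogSizeStr pieces of the original q are left out because their definitions are not shown. Unlike the original, it also escapes the fq value, which assumes p['fq'] arrives unescaped.

```ruby
require 'uri'

# Hypothetical helper mirroring the quoted escaping: drop backslashes, then
# backslash-escape the characters the original code treats as special.
def escape_query_text(text)
  text.to_s.gsub('\\', '').gsub(/[:~!<>="]/) { |ch| "\\#{ch}" }
end

# Hypothetical builder for the same /solr/select request as the quoted code.
# fuzzy and minLogSizeStr from the original are omitted (not shown in the mail).
def build_select_path(user_query, fq: nil, rows: 10)
  params = [
    ['timeAllowed', 5000],
    ['wt', 'ruby']
  ]
  params << ['fq', fq] unless fq.nil? || fq.empty?
  params += [
    ['q', "body:#{escape_query_text(user_query)}"],
    ['rows', rows],
    # Faceting, as in the original "facet" string.
    ['facet', 'true'], ['facet.limit', 10],
    ['facet.field', 'device'], ['facet.field', 'ckey'], ['facet.field', 'version'],
    ['fl', 'id,score,filename,version,device,first2md5,filesize,ckey'],
    # TermVectorComponent request, as in the original "termvectors" string.
    ['tv', 'true'], ['qt', 'tvrh'], ['tv.all', 'true'],
    # Highlighting with the regex fragmenter, as in "hl" and "hl_regex".
    ['hl', 'true'], ['hl.fl', 'body'], ['hl.snippets', 1], ['hl.fragsize', 400],
    ['hl.regex.pattern', "(?m)^.*\n.*\n.*$"], ['hl.regex.slop', 1],
    ['hl.fragmenter', 'regex'],
    ['hl.regex.maxAnalyzedChars', 2_147_483_647],
    ['hl.maxAnalyzedChars', 2_147_483_647]
  ]
  # encode_www_form escapes every key and value (spaces become "+").
  '/solr/select?' + URI.encode_www_form(params.map { |k, v| [k, v.to_s] })
end
```

For example, build_select_path('connection reset', fq: 'device:foo') returns a path that begins /solr/select?timeAllowed=5000&wt=ruby&fq=device%3Afoo&q=body%3Aconnection+reset and carries the same facet, term vector and highlighting parameters as the original string.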
