--- On Tue, 2/22/11, Jon Drukman <j...@cluttered.com> wrote:

> From: Jon Drukman <j...@cluttered.com>
> Subject: Sorting - bad performance
> To: solr-user@lucene.apache.org
> Date: Tuesday, February 22, 2011, 9:44 PM
>
> The performance factors wiki says:
> "If you do a lot of field based sorting, it is advantageous to add explicitly
> warming queries to the "newSearcher" and "firstSearcher" event listeners in your
> solrconfig which sort on those fields, so the FieldCache is populated prior to
> any queries being executed by your users."
>
> I've got an index with 24+ million docs of forum posts from users. I want to be
> able to get a given user's posts sorted by date. It's taking 20 seconds right
> now. What would I put in the newSearch/firstSearcher to make that quicker? Is
> there any other general approach I can use to speed up sorting?
>
> The schema looks like
>
> <fields>
>   <field name="type_id" type="string" indexed="true" stored="true" required="true" />
>   <field name="subhead" type="text" indexed="true" stored="true"/>
>   <field name="post_date" type="date" indexed="true" stored="true" />
>   <field name="author" type="cistring" indexed="true" stored="true" />
>   <field name="parent_author" type="cistring" indexed="true" stored="true" />
> </fields>
>
> cistring is a case-insensitive string type i created:
>
> <fieldType name="cistring" class="solr.StrField" sortMissingLast="true" omitNorms="true">
>   <analyzer type="index">
>     <tokenizer class="solr.LowerCaseTokenizerFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.LowerCaseTokenizerFactory"/>
>   </analyzer>
> </fieldType>
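This is not directly related to sorting performance, but fixing it will reduce the number of unique terms: if you define a type with class="solr.StrField", the analyzer definition is ignored, even though analysis.jsp displays results as if it were applied. To activate the tokenizer and any filters, you need to use class="solr.TextField".

As for your author fields, depending on your domain you may want to use KeywordTokenizerFactory instead of LowerCaseTokenizerFactory. LowerCaseTokenizerFactory splits the value at every non-letter character, so an author name containing digits, spaces, or punctuation is broken into several tokens. KeywordTokenizerFactory keeps the whole value as a single token.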
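For illustration, here is a rough sketch of what the cistring type could look like with those two changes applied. It assumes you want each author value matched and sorted as one whole, lowercased string. Since KeywordTokenizerFactory does not lowercase on its own, the sketch adds a LowerCaseFilterFactory, which is my addition and not part of your original config. Treat it as a starting point, not a tested definition for your index:

```xml
<!-- Sketch only: case-insensitive, single-token string type.
     Assumption: author values should be treated as one whole string. -->
<fieldType name="cistring" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <!-- An analyzer without a type attribute is used at both index and query time -->
  <analyzer>
    <!-- Emits the entire field value as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lowercases that token so matching is case-insensitive (added assumption) -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

After changing the field type you would need to reindex, since the terms already in the index were written without any analysis.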