Hi,

Profiling pointed me directly to the method I already suspected:
ExtendedDismaxQParser.parse(). I added manual timers around parts of the method and
made sure the timers add up to the QueryComponent prepare time. Right after
Solr starts, one small part takes almost 100ms on a fast machine with lots of
memory; fortunately this happens only once. KStemmer, the loading of
KStemData and the ThaiWordFilter's init take the bulk of it.

      ExtendedSolrQueryParser up =
        new ExtendedSolrQueryParser(this, IMPOSSIBLE_FIELD_NAME);
      up.addAlias(IMPOSSIBLE_FIELD_NAME,
                  tiebreaker, queryFields);
      addAliasesFromRequest(up, tiebreaker);
      up.setPhraseSlop(qslop);     // slop for explicit user phrase queries
      up.setAllowLeadingWildcard(true);
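
A cost that is paid only once after startup fits the common pattern of data built in static state on first use. The following is a minimal sketch of that pattern. It assumes that KStemData-like tables are loaded this way, which is an assumption and not something taken from the Lucene source. StemDictionary, Holder and the generated dictionary are hypothetical. The JVM runs the holder's static initializer exactly once, on first access, so only the first stem() call pays for load().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of the "pay once" pattern: an expensive table is built
 * in a static initializer, so it is loaded once per JVM, on first use.
 * This is NOT Lucene's KStemmer code; the dictionary contents are made up.
 */
final class StemDictionary {
    /** Counts how many times the expensive load actually ran. */
    static final AtomicInteger LOADS = new AtomicInteger();

    private StemDictionary() {}

    private static final class Holder {
        // Class initialization is lazy and thread-safe per the JLS,
        // so load() runs exactly once, when DICT is first accessed.
        static final Map<String, String> DICT = load();
    }

    private static Map<String, String> load() {
        LOADS.incrementAndGet();
        Map<String, String> dict = new HashMap<>();
        // Stand-in for reading a large stemming table from disk.
        for (int i = 0; i < 100_000; i++) {
            dict.put("word" + i, "stem" + i);
        }
        return dict;
    }

    /** Returns the stem if known, otherwise the word unchanged. */
    static String stem(String word) {
        return Holder.DICT.getOrDefault(word, word);
    }

    public static void main(String[] args) {
        // The first call includes the one-time load; later calls do not.
        for (int i = 0; i < 3; i++) {
            long t0 = System.nanoTime();
            stem("word42");
            System.out.printf("call %d: %.3f ms%n", i, (System.nanoTime() - t0) / 1e6);
        }
        System.out.println("loads = " + LOADS.get());
    }
}
```

The holder class keeps the load out of StemDictionary's own initialization, so merely touching the outer class (e.g. reading LOADS) does not trigger it.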

After Solr has been running for a while, two parts still take a lot of time:
parsing the query

      if (parsedUserQuery == null) {
        sb = new StringBuilder();
        for (Clause clause : clauses) {

        ....

        if (parsedUserQuery instanceof BooleanQuery) {
          BooleanQuery t = new BooleanQuery();
          SolrPluginUtils.flattenBooleanQuery(t, (BooleanQuery)parsedUserQuery);
          SolrPluginUtils.setMinShouldMatch(t, minShouldMatch);
          parsedUserQuery = t;
        }
      }

and handling the phrase fields (pf, pf2, pf3):

      if (allPhraseFields.size() > 0) {
        // full phrase and shingles
        for (FieldParams phraseField: allPhraseFields) {
          Map<String,Float> pf = new HashMap<String,Float>(1);
          pf.put(phraseField.getField(),phraseField.getBoost());
          addShingledPhraseQueries(query, normalClauses, pf,
              phraseField.getWordGrams(), tiebreaker, phraseField.getSlop());
        }
      }
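
To get a feel for why this loop gets expensive with many fields, here is a rough model of how many phrase queries it produces per request. It is an assumption-laden sketch, not Solr's addShingledPhraseQueries(). It assumes wordGrams 0 means one phrase over all terms, that wordGrams n means every run of n consecutive terms, and that no phrase is built when the query has fewer than n terms. PhraseShingles and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Rough, hypothetical model of the pf/pf2/pf3 loop: counts the phrase
 * queries built per request. Not Solr code; the wordGrams semantics
 * below are assumptions.
 */
final class PhraseShingles {

    /** Word shingles for one phrase field (0 = full phrase, n = n-grams). */
    static List<String> shingles(List<String> terms, int wordGrams) {
        List<String> out = new ArrayList<>();
        int n = (wordGrams <= 0) ? terms.size() : wordGrams;
        if (n == 0 || n > terms.size()) {
            return out; // too few terms for this shingle size
        }
        for (int i = 0; i + n <= terms.size(); i++) {
            out.add(String.join(" ", terms.subList(i, i + n)));
        }
        return out;
    }

    /** Total phrase queries for one request, summed over all phrase fields. */
    static int countPhraseQueries(List<String> terms, int[] wordGramsPerField) {
        int total = 0;
        for (int wg : wordGramsPerField) {
            total += shingles(terms, wg).size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("reduce", "query", "component", "prepare", "time");
        System.out.println(shingles(terms, 2));
        // Hypothetical setup: 100 fields each in pf (0), pf2 (2) and pf3 (3).
        int[] fields = new int[300];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = (i < 100) ? 0 : (i < 200 ? 2 : 3);
        }
        System.out.println(countPhraseQueries(terms, fields)); // 800 under this model
    }
}
```

With five query terms and 100 fields each in pf, pf2 and pf3, the model gives 100 * (1 + 4 + 3) = 800 phrase queries per request. The count grows linearly with both the number of phrase fields and the query length.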

The problem becomes significant with a lot of fields: the prepare time is
usually higher than the process times of the query, highlight and facet components combined.
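
The original question was how to avoid doing the same prepare work over and over. One option is to cache the parse result. The following is a minimal sketch. It assumes that a custom QParserPlugin could look up results by a key built from every parameter that affects parsing. ParsedQueryCache, key() and the parser function are hypothetical, and nothing like this is part of edismax as far as I know.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical LRU cache of parse results, keyed by the parameters that
 * influence parsing. V stands for the parsed query type; cached values
 * must be treated as read-only because they are shared between requests.
 */
final class ParsedQueryCache<V> {
    private final Map<String, V> lru;
    private final Function<String, V> parser; // hypothetical: parses from the key
    private int misses;

    ParsedQueryCache(final int maxEntries, Function<String, V> parser) {
        this.parser = parser;
        // accessOrder = true makes iteration order least-recently-used first,
        // and removeEldestEntry evicts that entry once the limit is exceeded.
        this.lru = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached result for key, parsing and caching it on a miss. */
    synchronized V get(String key) {
        V v = lru.get(key);
        if (v == null) {
            misses++;
            v = parser.apply(key);
            lru.put(key, v);
        }
        return v;
    }

    synchronized int misses() { return misses; }

    synchronized int size() { return lru.size(); }

    /** Builds a key from the parameters that affect parsing (illustrative subset). */
    static String key(String q, String qf, String pf, String mm) {
        return String.join("\0", q, qf, pf, mm);
    }
}
```

A real key would have to include every parsing-relevant parameter (qf, pf, pf2, pf3, ps, mm, tie, boosts and so on). Lucene queries of this era are mutable (e.g. setBoost), so a cached instance would need to be copied or left untouched by later steps. The cache only pays off when identical queries actually repeat.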


 
-----Original message-----
> From:Mikhail Khludnev <mkhlud...@griddynamics.com>
> Sent: Mon 19-Nov-2012 12:52
> To: solr-user@lucene.apache.org
> Subject: Re: Reduce QueryComponent prepare time
> 
> Markus,
> 
> It's hard to suggest anything until you provide a profiler snapshot showing
> what prepare spends its time on. As far as I know, prepare parses the
> queries; e.g. we have really heavy query parsers, but I don't think that's
> really common.
> 
> 
> On Mon, Nov 19, 2012 at 3:08 PM, Markus Jelsma
> <markus.jel...@openindex.io> wrote:
> 
> > I'd also like to know which parts of the entire query make up the
> > prepare time, and whether it would matter significantly if we extended the
> > edismax plugin and hardcoded the parameters we pass into (reusable) objects.
> >
> > Thanks,
> > Markus
> >
> > -----Original message-----
> > > From:Markus Jelsma <markus.jel...@openindex.io>
> > > Sent: Fri 16-Nov-2012 15:57
> > > To: solr-user@lucene.apache.org
> > > Subject: Reduce QueryComponent prepare time
> > >
> > > Hi,
> > >
> > > We're seeing high prepare times for the QueryComponent, obviously due to
> > > the vast number of fields and queries. It's common to have a prepare time of
> > > 70-80ms while the process times drop significantly due to warmed searchers,
> > > OS cache etc. The prepare time is a recurring issue and I hope there
> > > are people here who can share some thoughts or hints.
> > >
> > > We're using a recent checkout on a 10-node test cluster with SSDs
> > > (although this is not an IO issue) and edismax on about a hundred different
> > > fields; this includes phrase searches over most of those fields and
> > > SpanFirst queries on about 25 fields. We'd like to see how we can avoid
> > > doing the same prepare procedure over and over again ;)
> > >
> > > Thanks,
> > > Markus
> > >
> >
> 
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
> 
> <http://www.griddynamics.com>
>  <mkhlud...@griddynamics.com>
> 
