Markus,

It sounds like you're facing the fairly uncommon challenge of optimizing
the eDisMax code for your particular use case. I can't help with that code
itself, but I can share some experience: we have mind-blowing queries too -
they span many fields and enumerate many phrase shingles. We see a
similarly counter-intuitive hot spot - query parsing takes more time than
searching and faceting. In our case, though, the main CPU consumer is
dictionary lookup, i.e. term substitution and transformation. We built our
own query parser on top of something like the flexible query parser
framework:
http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/core/package-summary.html
Representing the core query structure as a skeleton of DOM-like nodes and
then transforming those nodes into concrete Query instances *might* be more
performant than the current eDismax (and *might not* be, for your case).
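
For illustration only, here is a rough, hypothetical sketch of that idea
(all class, method and field names below are made up for this mail, and it
is nowhere near what our parser actually does): the node skeleton is built
once per user query, and expanding it over many query fields is then plain
object construction, using Lucene 4.x query classes.

  import java.util.ArrayList;
  import java.util.List;

  import org.apache.lucene.index.Term;
  import org.apache.lucene.queryparser.flexible.core.nodes.BooleanQueryNode;
  import org.apache.lucene.queryparser.flexible.core.nodes.FieldQueryNode;
  import org.apache.lucene.queryparser.flexible.core.nodes.QueryNode;
  import org.apache.lucene.search.BooleanClause.Occur;
  import org.apache.lucene.search.BooleanQuery;
  import org.apache.lucene.search.DisjunctionMaxQuery;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.TermQuery;

  /** Hypothetical illustration: build a field-agnostic node skeleton once,
   *  then expand it over the (many) query fields without re-parsing. */
  public class SkeletonQueryBuilder {

    private final List<String> queryFields;
    private final float tieBreaker;

    public SkeletonQueryBuilder(List<String> queryFields, float tieBreaker) {
      this.queryFields = queryFields;
      this.tieBreaker = tieBreaker;
    }

    /** Parse the user input into a DOM-like node skeleton. The "parsing"
     *  here is a trivial whitespace split; a real parser would emit richer
     *  node types and could do dictionary lookups at this stage, once. */
    public QueryNode toSkeleton(String userQuery) {
      List<QueryNode> clauses = new ArrayList<QueryNode>();
      for (String term : userQuery.trim().split("\\s+")) {
        // The field is left empty in the skeleton; it is filled in later.
        clauses.add(new FieldQueryNode("", term, 0, term.length()));
      }
      return new BooleanQueryNode(clauses);
    }

    /** Transform the skeleton into concrete queries (Lucene 4.x API):
     *  each term node becomes a DisjunctionMaxQuery over all query fields. */
    public Query toQuery(QueryNode node) {
      if (node instanceof FieldQueryNode) {
        FieldQueryNode term = (FieldQueryNode) node;
        DisjunctionMaxQuery dmq = new DisjunctionMaxQuery(tieBreaker);
        for (String field : queryFields) {
          dmq.add(new TermQuery(new Term(field, term.getTextAsString())));
        }
        return dmq;
      }
      BooleanQuery bq = new BooleanQuery();
      for (QueryNode child : node.getChildren()) {
        bq.add(toQuery(child), Occur.SHOULD);
      }
      return bq;
    }
  }

The point is that any expensive work - in our case the dictionary lookups -
can be done once on the skeleton instead of once per field; whether that
buys you anything for eDismax-style pf/pf2/pf3 expansion you would have to
measure.
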
Nothing more useful from me.

Bye.


On Tue, Nov 20, 2012 at 7:01 PM, Markus Jelsma
<markus.jel...@openindex.io>wrote:

> Hi,
>
> Profiling pointed me directly to the method I already suspected:
> ExtendedDismaxQParser.parse(). I added manual timers to parts of the method
> and made sure the timers add up to the QueryComponent prepare time. After
> starting Solr there's one small part taking almost 100ms on a fast machine
> with lots of memory; fortunately this happens only once. The loading of
> KStemmer's KStemData and the ThaiWordFilter's init take the bulk of it.
>
>       ExtendedSolrQueryParser up =
>         new ExtendedSolrQueryParser(this, IMPOSSIBLE_FIELD_NAME);
>       up.addAlias(IMPOSSIBLE_FIELD_NAME,
>                 tiebreaker, queryFields);
>       addAliasesFromRequest(up, tiebreaker);
>       up.setPhraseSlop(qslop);     // slop for explicit user phrase queries
>       up.setAllowLeadingWildcard(true);
>
> After it's been running for some time, two parts continue to take a lot of
> time: parsing the query
>
>       if (parsedUserQuery == null) {
>         sb = new StringBuilder();
>         for (Clause clause : clauses) {
>
>         ....
>
>         if (parsedUserQuery instanceof BooleanQuery) {
>           BooleanQuery t = new BooleanQuery();
>           SolrPluginUtils.flattenBooleanQuery(t,
> (BooleanQuery)parsedUserQuery);
>           SolrPluginUtils.setMinShouldMatch(t, minShouldMatch);
>           parsedUserQuery = t;
>         }
>       }
>
> and handling the phrase fields (pf, pf2, pf3):
>
>       if (allPhraseFields.size() > 0) {
>         // full phrase and shingles
>         for (FieldParams phraseField: allPhraseFields) {
>           Map<String,Float> pf = new HashMap<String,Float>(1);
>           pf.put(phraseField.getField(),phraseField.getBoost());
>           addShingledPhraseQueries(query, normalClauses, pf,
>           phraseField.getWordGrams(),tiebreaker, phraseField.getSlop());
>         }
>       }
>
> The problem is significant when there are a lot of fields; the prepare
> time is usually higher than the process times of query, highlight and
> facet combined.
>
>
>
> -----Original message-----
> > From:Mikhail Khludnev <mkhlud...@griddynamics.com>
> > Sent: Mon 19-Nov-2012 12:52
> > To: solr-user@lucene.apache.org
> > Subject: Re: Reduce QueryComponent prepare time
> >
> > Markus,
> >
> > It's hard to suggest anything until you provide a profiler snapshot that
> > shows what the prepare phase spends its time on. As far as I know,
> > prepare mainly parses queries; we, for example, have really heavy query
> > parsers, but I don't think that's a common case.
> >
> >
> > On Mon, Nov 19, 2012 at 3:08 PM, Markus Jelsma
> > <markus.jel...@openindex.io>wrote:
> >
> > > I'd also like to know which parts of the entire query constitute the
> > > prepare time, and whether it would matter significantly if we extended
> > > the edismax plugin and hardcoded the parameters we pass into (reusable)
> > > objects.
> > >
> > > Thanks,
> > > Markus
> > >
> > > -----Original message-----
> > > > From:Markus Jelsma <markus.jel...@openindex.io>
> > > > Sent: Fri 16-Nov-2012 15:57
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Reduce QueryComponent prepare time
> > > >
> > > > Hi,
> > > >
> > > > We're seeing high prepare times for the QueryComponent, obviously due
> > > > to the vast number of fields and queries. It's common to have a
> > > > prepare time of 70-80ms while the process times drop significantly
> > > > thanks to warmed searchers, OS cache etc. The prepare time is a
> > > > recurring issue, and I hope there are people here who can share some
> > > > thoughts or hints.
> > > >
> > > > We're using a recent checkout on a 10-node test cluster with SSDs
> > > > (although this is not an IO issue) and edismax on about a hundred
> > > > different fields; this includes phrase searches over most of those
> > > > fields and SpanFirst queries on about 25 fields. We'd like to see how
> > > > we can avoid doing the same prepare procedure over and over again ;)
> > > >
> > > > Thanks,
> > > > Markus
> > > >
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > <http://www.griddynamics.com>
> >  <mkhlud...@griddynamics.com>
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
