Markus,

It seems you've run into the fairly uncommon challenge of optimizing the eDisMax code itself for your particular use case. I can't help with that code directly, but I can share some experience: we have mind-blowing queries too - they span many fields and enumerate many phrase shingles. We see a similarly counterintuitive hot spot - query parsing takes longer than searching and faceting - although in our case the main CPU consumer is dictionary lookups, i.e. term substitutions and transformations. We built our own query parser on something like
http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/core/package-summary.html
Representing the core query structure as a DOM-like skeleton of nodes and only then transforming it into concrete Query instances *might be more performant* than the current eDismax (and *might not be* for you). Nothing more useful from me beyond the rough sketch below.
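Just to show the shape of the framework behind that link, here is a minimal, illustrative sketch using the stock StandardQueryParser from Lucene 4.x. The class name, field names and query string are invented for the example - this is not our actual parser:

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
  import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.util.Version;

  public class FlexibleParserSketch {
    public static void main(String[] args) throws QueryNodeException {
      // The flexible framework works in three phases: a syntax parser builds
      // a QueryNode tree, a processor pipeline rewrites that tree, and only
      // the final builder stage produces concrete Query instances.
      StandardQueryParser parser =
          new StandardQueryParser(new StandardAnalyzer(Version.LUCENE_40));

      // In a custom parser, term substitutions and other transformations
      // would be plugged in as extra QueryNodeProcessors before the builder
      // stage runs, so the node tree can be rewritten without re-parsing.
      Query q = parser.parse("title:(solr edismax)", "text");
      System.out.println(q);
    }
  }

Whether that node-tree approach actually beats eDismax for your hundred-plus fields is something only your profiler can answer.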
Bye.

On Tue, Nov 20, 2012 at 7:01 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:

> Hi,
>
> Profiling pointed me directly to the method I already suspected:
> ExtendedDismaxQParser.parse(). I added manual timers in parts of the method
> and made sure the timers add up to the QueryComponent prepare time. After
> starting Solr there's one small part taking almost 100ms on a fast machine
> with lots of memory; fortunately this happens only once. KStemmer and the
> loading of the KStemData and the ThaiWordFilter's init take the bulk of it.
>
>   ExtendedSolrQueryParser up =
>     new ExtendedSolrQueryParser(this, IMPOSSIBLE_FIELD_NAME);
>   up.addAlias(IMPOSSIBLE_FIELD_NAME, tiebreaker, queryFields);
>   addAliasesFromRequest(up, tiebreaker);
>   up.setPhraseSlop(qslop);  // slop for explicit user phrase queries
>   up.setAllowLeadingWildcard(true);
>
> After it's been running for some time, two parts continue to take a lot of
> time: parsing the query
>
>   if (parsedUserQuery == null) {
>     sb = new StringBuilder();
>     for (Clause clause : clauses) {
>
>     ....
>
>     if (parsedUserQuery instanceof BooleanQuery) {
>       BooleanQuery t = new BooleanQuery();
>       SolrPluginUtils.flattenBooleanQuery(t, (BooleanQuery) parsedUserQuery);
>       SolrPluginUtils.setMinShouldMatch(t, minShouldMatch);
>       parsedUserQuery = t;
>     }
>   }
>
> and handling the phrase fields (pf, pf2, pf3):
>
>   if (allPhraseFields.size() > 0) {
>     // full phrase and shingles
>     for (FieldParams phraseField : allPhraseFields) {
>       Map<String,Float> pf = new HashMap<String,Float>(1);
>       pf.put(phraseField.getField(), phraseField.getBoost());
>       addShingledPhraseQueries(query, normalClauses, pf,
>           phraseField.getWordGrams(), tiebreaker, phraseField.getSlop());
>     }
>   }
>
> The problem is significant when there are a lot of fields: the prepare time
> is usually higher than the process times of query, highlight and facet
> combined.
>
>
> -----Original message-----
> > From: Mikhail Khludnev <mkhlud...@griddynamics.com>
> > Sent: Mon 19-Nov-2012 12:52
> > To: solr-user@lucene.apache.org
> > Subject: Re: Reduce QueryComponent prepare time
> >
> > Markus,
> >
> > It's hard to suggest anything until you provide a profiler snapshot which
> > says what the prepare time is spent on. As far as I know, in prepare it
> > parses queries; e.g. we have really heavy query parsers, but I don't think
> > that's really common.
> >
> >
> > On Mon, Nov 19, 2012 at 3:08 PM, Markus Jelsma
> > <markus.jel...@openindex.io> wrote:
> >
> > > I'd also like to know which parts of the entire query constitute the
> > > prepare time and if it would matter significantly if we extend the
> > > edismax plugin and hardcode the parameters we pass into (reusable)
> > > objects.
> > >
> > > Thanks,
> > > Markus
> > >
> > > -----Original message-----
> > > > From: Markus Jelsma <markus.jel...@openindex.io>
> > > > Sent: Fri 16-Nov-2012 15:57
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Reduce QueryComponent prepare time
> > > >
> > > > Hi,
> > > >
> > > > We're seeing high prepare times for the QueryComponent, obviously due
> > > > to the vast number of fields and queries. It's common to have a prepare
> > > > time of 70-80ms while the process times drop significantly thanks to
> > > > warmed searchers, OS cache etc. The prepare time is a recurring issue
> > > > and I'd hope there are people here who can share some thoughts or hints.
> > > >
> > > > We're using a recent checkout on a 10-node test cluster with SSDs
> > > > (although this is not an IO issue) and edismax on about a hundred
> > > > different fields; this includes phrase searches over most of those
> > > > fields and SpanFirst queries on about 25 fields. We'd like to see how
> > > > we can avoid doing the same prepare procedure over and over again ;)
> > > >
> > > > Thanks,
> > > > Markus
> > >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > <http://www.griddynamics.com>
> > <mkhlud...@griddynamics.com>
>

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>