Hi Mikhail,

Thanks for sharing your experiences. I'll look into the flexible query parser.

Markus

-----Original message-----
> From:Mikhail Khludnev <mkhlud...@griddynamics.com>
> Sent: Tue 20-Nov-2012 19:53
> To: solr-user@lucene.apache.org
> Subject: Re: Reduce QueryComponent prepare time
>
> Markus,
>
> It seems you face the challenge of optimizing complex eDisMax code for
> your particular use case, which is not so common. I cannot help with this
> coding, but I can share some experience: we have mind-blowing queries too -
> they span many fields and enumerate many phrase shingles. We have a similar
> counterintuitive hot spot - query parsing takes more time than searching and
> faceting. In our case, though, dictionary lookups, i.e. term substitutions
> and transformations, are the main CPU consumers. We built our own query
> parser with something like
> http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/core/package-summary.html.
> This approach, in which you represent the core query structure as a DOM-like
> skeleton of nodes and then transform them into particular query instances,
> *might be more performant* (and *might not be* for you) than the current
> eDisMax. Nothing more useful from me.
>
> Bye.
>
>
> On Tue, Nov 20, 2012 at 7:01 PM, Markus Jelsma
> <markus.jel...@openindex.io> wrote:
>
> > Hi,
> >
> > Profiling pointed me directly to the method I already suspected:
> > ExtendedDismaxQParser.parse(). I added manual timers in parts of the method
> > and made sure the timers add up to the QueryComponent prepare time. After
> > starting Solr there's one small part taking almost 100ms on a fast machine
> > with lots of memory; fortunately this happens only once. KStemmer and the
> > loading of the KStemData and the ThaiWordFilter's init take the bulk of it.
> >
> > ExtendedSolrQueryParser up =
> >   new ExtendedSolrQueryParser(this, IMPOSSIBLE_FIELD_NAME);
> > up.addAlias(IMPOSSIBLE_FIELD_NAME,
> >   tiebreaker, queryFields);
> > addAliasesFromRequest(up, tiebreaker);
> > up.setPhraseSlop(qslop); // slop for explicit user phrase queries
> > up.setAllowLeadingWildcard(true);
> >
> > After it's been running for some time, two parts continue to take a lot of
> > time: parsing the query
> >
> > if (parsedUserQuery == null) {
> >   sb = new StringBuilder();
> >   for (Clause clause : clauses) {
> >
> > ....
> >
> >   if (parsedUserQuery instanceof BooleanQuery) {
> >     BooleanQuery t = new BooleanQuery();
> >     SolrPluginUtils.flattenBooleanQuery(t, (BooleanQuery)parsedUserQuery);
> >     SolrPluginUtils.setMinShouldMatch(t, minShouldMatch);
> >     parsedUserQuery = t;
> >   }
> > }
> >
> > and handling the phrase fields (pf, pf2, pf3):
> >
> > if (allPhraseFields.size() > 0) {
> >   // full phrase and shingles
> >   for (FieldParams phraseField: allPhraseFields) {
> >     Map<String,Float> pf = new HashMap<String,Float>(1);
> >     pf.put(phraseField.getField(),phraseField.getBoost());
> >     addShingledPhraseQueries(query, normalClauses, pf,
> >       phraseField.getWordGrams(),tiebreaker, phraseField.getSlop());
> >   }
> > }
> >
> > The problem is significant when there are a lot of fields: the prepare time
> > is usually higher than the process times of query, highlight and facet
> > combined.
> >
> >
> >
> > -----Original message-----
> > > From:Mikhail Khludnev <mkhlud...@griddynamics.com>
> > > Sent: Mon 19-Nov-2012 12:52
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Reduce QueryComponent prepare time
> > >
> > > Markus,
> > >
> > > It's hard to suggest anything until you provide a profiler snapshot that
> > > shows what prepare spends its time on. As far as I know, prepare parses
> > > the queries; e.g. we have really heavy query parsers, but I don't think
> > > that's really common.
> > >
> > >
> > > On Mon, Nov 19, 2012 at 3:08 PM, Markus Jelsma
> > > <markus.jel...@openindex.io> wrote:
> > >
> > > > I'd also like to know which parts of the entire query constitute the
> > > > prepare time and if it would matter significantly if we extend the
> > > > edismax plugin and hardcode the parameters we pass into (reusable)
> > > > objects.
> > > >
> > > > Thanks,
> > > > Markus
> > > >
> > > > -----Original message-----
> > > > > From:Markus Jelsma <markus.jel...@openindex.io>
> > > > > Sent: Fri 16-Nov-2012 15:57
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Reduce QueryComponent prepare time
> > > > >
> > > > > Hi,
> > > > >
> > > > > We're seeing high prepare times for the QueryComponent, obviously due
> > > > > to the vast number of fields and queries. It's common to have a prepare
> > > > > time of 70-80ms while the process times drop significantly due to warmed
> > > > > searchers, OS cache etc. The prepare time is a recurring issue and I
> > > > > hope there are people here who can share some thoughts or hints.
> > > > >
> > > > > We're using a recent checkout on a 10-node test cluster with SSDs
> > > > > (although this is not an IO issue) and edismax on about a hundred
> > > > > different fields; this includes phrase searches over most of those
> > > > > fields and SpanFirst queries on about 25 fields. We'd like to see how
> > > > > we can avoid doing the same prepare procedure over and over again ;)
> > > > >
> > > > > Thanks,
> > > > > Markus
> > > > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > > Principal Engineer,
> > > Grid Dynamics
> > >
> > > <http://www.griddynamics.com>
> > > <mkhlud...@griddynamics.com>
> > >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <mkhlud...@griddynamics.com>
>
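
The manual timers mentioned in the thread (added around parts of ExtendedDismaxQParser.parse() and checked against the QueryComponent prepare time) could look roughly like the sketch below. This is only an illustration, not the code used in the thread: SectionTimer and the section names are hypothetical and not part of Solr, and the only assumption is plain Java 8+ with System.nanoTime().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Solr): accumulates wall-clock time per named section. */
final class SectionTimer {
    // LinkedHashMap keeps sections in the order they were first timed.
    private final Map<String, Long> nanosBySection = new LinkedHashMap<>();

    /** Runs the work, adds its elapsed time to the named section, and returns its result. */
    <T> T time(String section, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            // Repeated calls with the same section name are summed.
            nanosBySection.merge(section, System.nanoTime() - start, Long::sum);
        }
    }

    /** Nanoseconds recorded for one section, or 0 if it was never timed. */
    long nanos(String section) {
        return nanosBySection.getOrDefault(section, 0L);
    }

    /** Sum over all sections; this is the figure to compare with the reported prepare time. */
    long totalNanos() {
        long total = 0L;
        for (long n : nanosBySection.values()) {
            total += n;
        }
        return total;
    }

    /** One line per section in first-use order, in milliseconds. */
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : nanosBySection.entrySet()) {
            sb.append(e.getKey()).append(": ")
              .append(e.getValue() / 1_000_000.0).append(" ms\n");
        }
        return sb.toString();
    }
}
```

Wrapping each suspected block, for example timer.time("phraseFields", () -> ...), and comparing totalNanos() with the prepare time reported by Solr shows whether the timed sections account for all of it or whether some cost is still unmeasured.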