Re: Edismax mm and efficiency

Peter Keegan Wed, 10 Sep 2014 09:48:58 -0700

Sure. I created SOLR-6502. The tricky part was handling the behavior in a
sharded index. When the index is sharded, the response from each shard
contains a parameter that indicates whether its search results came from
the conjunction of all keywords (mm=100%) or from the disjunction (mm=1).
If the shard responses are a mix of both types, only the results from the
conjunction are returned. This is necessary in order to get the same
results independent of the number of shards.
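
To make the merge rule concrete, here is a minimal sketch in Python. It is
not the SOLR-6502 code. ShardResponse, query_shard, merge_shards and
make_shard are hypothetical names, the shards are toy in-memory word sets,
and scoring/ranking of the merged list is left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# A "shard" here is just a callable: search(query, mm) -> list of doc ids.
Search = Callable[[str, str], List[str]]


@dataclass
class ShardResponse:
    docs: List[str]
    strict: bool  # True if docs came from mm=100%, False if from the mm=1 retry


def make_shard(docs: Dict[str, Set[str]]) -> Search:
    """Toy stand-in for a shard: each doc is a set of lowercase words."""
    def search(query: str, mm: str) -> List[str]:
        terms = set(query.lower().split())
        if mm == "100%":
            # Conjunction: every query term must be present.
            return sorted(d for d, words in docs.items() if terms <= words)
        # Disjunction (mm=1): at least one query term must be present.
        return sorted(d for d, words in docs.items() if terms & words)
    return search


def query_shard(search: Search, query: str) -> ShardResponse:
    """Try the strict query first; fall back to the loose one on zero hits."""
    docs = search(query, "100%")
    if docs:
        return ShardResponse(docs, strict=True)
    return ShardResponse(search(query, "1"), strict=False)


def merge_shards(responses: List[ShardResponse]) -> List[str]:
    """If any shard produced conjunction hits, keep only those shards' docs."""
    strict = [r for r in responses if r.strict]
    chosen = strict if strict else responses
    merged: List[str] = []
    for r in chosen:
        merged.extend(r.docs)
    return merged
```

With this rule, a two-shard index and a single index holding the same
documents return the same set of hits, which is the property described
above.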


Peter

On Wed, Sep 10, 2014 at 11:07 AM, Walter Underwood <wun...@wunderwood.org>
wrote:

> We do that strict/loose query sequence, but on the client side with two
> requests. Would you consider contributing the QueryComponent?
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/
>
>
> On Sep 10, 2014, at 3:47 AM, Peter Keegan <peterlkee...@gmail.com> wrote:
>
> > I implemented a custom QueryComponent that issues the edismax query with
> > mm=100%, and if no results are found, it reissues the query with mm=1.
> > This doubled our query throughput (compared to mm=1 always), as we do
> > some expensive RankQuery processing. For your very long student queries,
> > mm=100% would obviously be too high, so you'd have to experiment.
> >
> > On Fri, Sep 5, 2014 at 1:34 PM, Walter Underwood <wun...@wunderwood.org>
> > wrote:
> >
> >> Great!
> >>
> >> We have some very long queries, where students paste entire homework
> >> problems. One of them was 1051 words. Many of them are over 100 words.
> >> This could help.
> >>
> >> In the Jira discussion, I saw some comments about handling the most
> >> sparse lists first. We did something like that in the Infoseek Ultra
> >> engine about twenty years ago. Short termlists (documents matching a
> >> term) were processed first, which kept the in-memory lists of matching
> >> docs small. It also allowed early short-circuiting for no-hits queries.
> >>
> >> What would be a high mm value, 75%?
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/
> >>
> >>
> >> On Sep 4, 2014, at 11:52 PM, Mikhail Khludnev
> >> <mkhlud...@griddynamics.com> wrote:
> >>
> >>> indeed https://issues.apache.org/jira/browse/LUCENE-4571
> >>> my feeling is that it gives a significant gain for high mm values.
> >>>
> >>>
> >>>
> >>> On Fri, Sep 5, 2014 at 3:01 AM, Walter Underwood
> >>> <wun...@wunderwood.org> wrote:
> >>>
> >>>> Are there any speed advantages to using “mm”? I can imagine pruning
> >>>> the set of matching documents early, which could help, but is that
> >>>> (or something else) done?
> >>>>
> >>>> wunder
> >>>> Walter Underwood
> >>>> wun...@wunderwood.org
> >>>> http://observer.wunderwood.org/
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Sincerely yours
> >>> Mikhail Khludnev
> >>> Principal Engineer,
> >>> Grid Dynamics
> >>>
> >>> <http://www.griddynamics.com>
> >>> <mkhlud...@griddynamics.com>
> >>
> >>
>
>
