Re: Evaluate function only on subset of documents

Costi Muraru Tue, 24 Jun 2014 15:58:59 -0700

Hi Chris,

Thanks for your patience; I now have a better picture of how things work.
However, I don't believe the two queries (the one with the post filter
and the one without it) are equivalent.

Suppose out of the whole document set:
XXX returns documents 1,2,3.
AAA returns documents 6,7,8.
{!frange}customfunction returns documents 7,8.

Running this query:
XXX OR AAA AND {!frange ...}
Matched documents are:
(1,2,3) OR (6,7,8) AND (7,8) = (1,2,3) OR (7,8) = 1,2,3,7,8

With the post filter:
q=XXX OR AAA & fq={!frange cost=150 cache=false ...}
Matched documents are:
(1,2,3) OR (6,7,8) = (1,2,3,6,7,8) with post filter (7,8) = (7,8)
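
To make the difference concrete, here is a small Java sketch that evaluates both readings as plain set operations on the example document ids. For the first reading it assumes the conventional precedence used above (AND binding tighter than OR); whether Solr's query parser actually groups the clauses that way is a separate question. The class, field and method names are illustrative only.

```java
import java.util.Set;
import java.util.TreeSet;

class PostFilterComparison {
    // Document sets from the example above (doc ids only, no scoring).
    static final Set<Integer> XXX = Set.of(1, 2, 3);
    static final Set<Integer> AAA = Set.of(6, 7, 8);
    static final Set<Integer> FRANGE = Set.of(7, 8); // docs the custom function accepts

    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    static Set<Integer> intersect(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Reading 1: AND binds tighter than OR, i.e. XXX OR (AAA AND frange).
    static Set<Integer> nestedReading() {
        return union(XXX, intersect(AAA, FRANGE));
    }

    // Reading 2: q=XXX OR AAA, then the frange post filter applied to every hit of q.
    static Set<Integer> postFilterReading() {
        return intersect(union(XXX, AAA), FRANGE);
    }

    public static void main(String[] args) {
        System.out.println("nested=" + nestedReading() + " postFiltered=" + postFilterReading());
    }
}
```

Running `main` prints `nested=[1, 2, 3, 7, 8] postFiltered=[7, 8]`, the two results worked out by hand above.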

I was hoping that the evaluation would short-circuit, like this:
Document set: 1,2,3,4,5,6,7,8

Document id 1:
Does it match XXX? Yes. Document matches query. Skip the second clause (AAA
AND {!frange ...}) and evaluate next doc.
Document id 2:
Does it match XXX? Yes. Document matches query. Skip second clause and
evaluate next doc.
Document id 3:
Does it match XXX? Yes. Document matches query. Skip second clause and
evaluate next doc.

Document id 4:
Does it match XXX? No.
Does it match AAA? No. Document does not match query. Skip frange and
evaluate next doc.

Document id 5:
Does it match XXX? No.
Does it match AAA? No. Document does not match query. Skip frange and
evaluate next doc.

Document id 6:
Does it match XXX? No.
Does it match AAA? Yes.
Does it match frange? No. Document does not match query. [Only here would the
custom function be evaluated for the first time.]

Document id 7:
Does it match XXX? No.
Does it match AAA? Yes.
Does it match frange? Yes.  Document matches query.

Document id 8:
Does it match XXX? No.
Does it match AAA? Yes.
Does it match frange? Yes.  Document matches query.

Returned documents: 1,2,3,7,8.
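
The walk-through above is ordinary short-circuit evaluation applied one document at a time. A minimal Java sketch of that hoped-for model, using the same example document sets and a hypothetical stand-in for the custom function (it simply accepts docs 7 and 8), records which documents the function is actually called on. It models the behavior hoped for here, not how Lucene's BooleanQuery actually iterates; the names `ShortCircuitSketch` and `run` are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;

class ShortCircuitSketch {
    static final Set<Integer> XXX = Set.of(1, 2, 3);
    static final Set<Integer> AAA = Set.of(6, 7, 8);

    /** Walks docs 1..8 one at a time; appends every doc the custom function is called on. */
    static List<Integer> run(List<Integer> evaluatedOn) {
        // Hypothetical stand-in for the expensive custom function: accepts docs 7 and 8 only.
        IntPredicate customFunction = doc -> {
            evaluatedOn.add(doc);
            return doc == 7 || doc == 8;
        };
        List<Integer> matches = new ArrayList<>();
        for (int doc = 1; doc <= 8; doc++) {
            // Java's || and && short-circuit like the walk-through:
            // check XXX first, then AAA, and call the function only when AAA matched.
            if (XXX.contains(doc) || (AAA.contains(doc) && customFunction.test(doc))) {
                matches.add(doc);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Integer> evaluatedOn = new ArrayList<>();
        List<Integer> matches = run(evaluatedOn);
        System.out.println("matches=" + matches + " functionEvaluatedOn=" + evaluatedOn);
    }
}
```

Running `main` prints `matches=[1, 2, 3, 7, 8] functionEvaluatedOn=[6, 7, 8]`: the function is only ever called on the three documents that AAA matches.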

So with this logic, the custom function would be evaluated only on documents
6,7,8, rather than scanning the whole set for the smallest matching doc index,
as you described in your last email.

I hope I'm not rambling. :-)
Does it make sense?

Costi

On Tue, Jun 24, 2014 at 7:26 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

>
> : Let's take this query sample:
> : XXX OR AAA AND {!frange ...}
> :
> : For my use case:
> : AAA returns a subset of 100k documents.
> : frange returns 5k documents, all part of these 100k documents.
> :
> : Therefore, frange skips the most documents. From what you are saying,
> : frange is going to be applied on all documents (since it skips the most
> : documents) and AAA is going to be applied on the subset. This is kind of
> : what I've originally noticed. My goal is to have this in reverse order,
>
> That's not exactly it ... there's no way for the query to know in advance
> how many documents it matches -- what BooleanQuery asks each clause is
> "looking at the index, tell me the (internal) lucene docid of the first doc
> you match."  it then looks at the lowest matching docid of each clause, and
> the "Occur" property of the clause (MUST, MUST_NOT, SHOULD) to be able to
> tell if/when it can say things like "clause AAA is mandatory but the
> lowest id it matches is doc# 8675 -- so it doesn't matter that clause XXX's
> lowest match is doc# 10 or that clause {!frange}'s lowest match is doc#
> 100"
>
> it can then ask XXX and {!frange} to both "skip" ahead, and find the lowest
> docid they each match that is no less than 8675, etc...
>
> from the perspective of {!frange} in particular, this means that on the
> first call it will evaluate itself against docid #0, #1, #2, etc... until
> it finds a match.  and on the second call it will evaluate itself against
> docid #8675, 8676, etc... until it finds a match...
>
> : since frange is much more expensive than AAA.
> : I was hoping to do so by specifying the cost, saying that "Hey, frange has
>
> There is no support for specifying cost on individual clauses inside a
> BooleanQuery.
>
> But i really want to reiterate that even with the example you posted
> above you *still* don't need to nest your {!frange} inside a boolean
> query -- what you have is this:
>
>         XXX OR AAA AND {!frange ...}
>
> in which the {!frange ...} clause is completely mandatory -- so my
> previous point #2 still applies...
>
> : > 2) based on the example you give, what you're trying to do here doesn't
> : > really depend on using "SHOULD" (ie: OR) type logic against the frange:
> : > the only disjunction you have is in a sub-query of a top level
> : > conjunction (ie: all required) ... the frange itself is still mandatory.
> : >
> : > so you could still use it as a non-cached postfilter just like in your
> : > previous example:
>
>   q=XXX OR AAA & fq={!frange cost=150 cache=false ...}
>
>
> -Hoss
> http://www.lucidworks.com/
>
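
Hoss's description of how BooleanQuery drives its mandatory clauses by docid (each clause reports its lowest matching docid, then the others are asked to skip ahead to the largest candidate) can be sketched as a small leapfrog loop. This is a simplified, self-contained model, not Lucene's actual code: `Clause`, `PostingsClause`, `FunctionClause` and `conjunction` are hypothetical names loosely patterned on the docid/advance idea, and real Lucene may order or lead with clauses differently. As in Hoss's example, AAA and {!frange} are both treated as mandatory, and the document ids come from the example at the top of this mail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class LeapfrogSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical clause iterator: current docid plus "skip to the first match >= target". */
    interface Clause {
        int docID();              // -1 before the first call
        int advance(int target);  // first matching docid >= target, or NO_MORE_DOCS
    }

    /** Cheap clause backed by a sorted array of matching docids (like an index lookup). */
    static final class PostingsClause implements Clause {
        private final int[] docs;
        private int idx = -1;

        PostingsClause(int... docs) { this.docs = docs; }

        public int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        public int advance(int target) {
            if (idx < 0) idx = 0;
            while (idx < docs.length && docs[idx] < target) idx++;
            return docID();
        }
    }

    /** Expensive clause: must evaluate the function on every docid it scans. */
    static final class FunctionClause implements Clause {
        private final IntPredicate function;
        private final int maxDoc;
        private int doc = -1;
        int evaluations = 0;

        FunctionClause(IntPredicate function, int maxDoc) {
            this.function = function;
            this.maxDoc = maxDoc;
        }

        public int docID() { return doc; }

        public int advance(int target) {
            for (int d = target; d < maxDoc; d++) {
                evaluations++;
                if (function.test(d)) return doc = d;
            }
            return doc = NO_MORE_DOCS;
        }
    }

    /** Returns the docids matched by every required clause, skipping ahead as described. */
    static List<Integer> conjunction(List<Clause> required) {
        List<Integer> hits = new ArrayList<>();
        int target = 0;
        while (true) {
            int max = -1;
            // Ask each clause for its lowest match at or after the current target.
            for (Clause c : required) {
                int d = c.docID() < target ? c.advance(target) : c.docID();
                if (d == NO_MORE_DOCS) return hits; // a required clause is exhausted
                max = Math.max(max, d);
            }
            boolean allAgree = true;
            for (Clause c : required) {
                if (c.docID() != max) { allAgree = false; break; }
            }
            if (allAgree) {
                hits.add(max);
                target = max + 1;
            } else {
                target = max; // no required clause can match below the largest candidate
            }
        }
    }

    public static void main(String[] args) {
        PostingsClause aaa = new PostingsClause(6, 7, 8);
        // Hypothetical stand-in for {!frange}: accepts docs 7 and 8 in an index of docids 0..8.
        FunctionClause frange = new FunctionClause(d -> d == 7 || d == 8, 9);
        List<Integer> hits = conjunction(List.of(aaa, frange));
        System.out.println("matches=" + hits + " functionEvaluations=" + frange.evaluations);
    }
}
```

With this model, `main` prints `matches=[7, 8] functionEvaluations=9`: on its first call the function scans docids 0 through 7 before finding a match (the "#0, #1, #2, etc..." behavior described above), and it is then evaluated once more on doc 8. Once AAA is exhausted the loop stops without asking the function to scan further.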
