I did not follow the whole story, but a "post-query-value-filter" does exist
in Solr. Have you tried searching for pretty much that expression, and maybe
for something about cost-based filters?
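
For illustration, here is a minimal sketch of what such a request could look
like. It assumes the time-range part is expressed with the frange query parser
and marked cache=false with a cost of 100 or more, which as far as I know is
what makes Solr run it as a post filter over the docs that already matched q.
The helper name and the example values are made up, and whether this actually
beats a plain fq on your data is something to measure.

```python
from urllib.parse import urlencode


def build_post_filter_params(no_value, time_start, time_end, cost=200):
    """Build Solr request parameters with the time range as a post filter.

    Hypothetical helper: only the field names come from this thread.
    """
    # frange checks the field value of each candidate document. cache=false
    # together with cost >= 100 asks Solr to evaluate it after q and after
    # the cheaper filter queries (assumption: the field supports this).
    post_filter = (
        "{!frange l=%d u=%d cache=false cost=%d}timestamp_dlng_doc_ind_sto"
        % (time_start, time_end, cost)
    )
    return [
        ("q", "no_dlng_doc_ind_sto:(%s)" % no_value),
        ("fq", post_filter),
    ]


# Made-up example values: one "no" value and a one-day window in epoch millis.
params = build_post_filter_params(12345, 1400000000000, 1400086400000)
query_string = urlencode(params)
```

The resulting query_string can be appended to the /select URL of the
collection.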

Regards,
    Alex
On 26/05/2014 6:49 pm, "Per Steffensen" <st...@designware.dk> wrote:

> Do not know if this is a special-case. I guess an AND-query where one side
> hits 500-1000 and the other side hits billions is a special-case. But this
> way of carrying out the query might also be an optimization in less uneven
> cases.
> It does not require that the "lots of hits" part of the query is a
> range query, and it does not necessarily require that the field used in
> this part has DocValues (you can go fetch the values from the "slow"
> store). But I guess the case has to be very uneven for this approach to
> be faster on a non-DocValues field.
>
> I think this can be generalized. I think of it as something similar to
> being able to "hint" relational databases not to use a specific index. I
> do not know that much about Solr/Lucene query syntax, but I believe
> "filter-queries" (fq) are queries that will be AND'ed onto the real
> query (q), and in order not to have to change the query syntax too much
> (adding hints or something), I guess a first step for a feature doing what
> I am doing here could be to introduce something similar to
> "filter-queries": queries that will be carried out on the result of
> (q + fqs), but by looking at the values of the documents in that result
> instead of intersecting with doc-sets found from the index. Let's call
> them "post-query-value-filter"s (yes, we can definitely come up with a
> better/shorter name).
>
> 1) q=no_dlng_doc_ind_sto:(<NO>) AND timestamp_dlng_doc_ind_sto:([<TIME_START>
> TO <TIME_END>])
> 2) q=no_dlng_doc_ind_sto:(<NO>),fq=timestamp_dlng_doc_ind_sto:([<TIME_START>
> TO <TIME_END>])
> 3) q=no_dlng_doc_ind_sto:(<NO>),post-query-value-filter=
> timestamp_dlng_doc_ind_sto:([<TIME_START> TO <TIME_END>])
>
> 1) and 2) both use the index on both no_dlng_doc_ind_sto and
> timestamp_dlng_doc_ind_sto. 3) uses only the index on no_dlng_doc_ind_sto
> and does the time-interval filtering by fetching the values (using
> DocValues if possible) of timestamp_dlng_doc_ind_sto for each of the docs
> found through the no_dlng_doc_ind_sto index, to see if the doc should
> really be included.
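
To make the difference between 1)/2) and 3) concrete, here is a toy in-memory
sketch. It is not how Lucene implements either path: the function names and
the data are made up, the "index" is just a sorted list of (value, doc) pairs,
and the per-doc values stand in for DocValues.

```python
def intersect_with_index(no_postings, timestamp_index, start, end):
    """1) and 2): collect every doc in the time range, then intersect."""
    # Touches every doc whose timestamp is in range (potentially billions).
    range_docs = {doc for ts, doc in timestamp_index if start <= ts <= end}
    return sorted(set(no_postings) & range_docs)


def post_filter_by_value(no_postings, timestamp_values, start, end):
    """3): look up the timestamp only for docs that matched the other clause."""
    # Touches only the few hundred candidates from the selective clause.
    return [doc for doc in no_postings if start <= timestamp_values[doc] <= end]


# Toy data: 1000 docs, where doc i has timestamp i * 10.
timestamp_values = [doc * 10 for doc in range(1000)]
timestamp_index = sorted((ts, doc) for doc, ts in enumerate(timestamp_values))

# Docs matching the selective clause, in doc-id order.
no_postings = [3, 250, 251, 900]

via_index = intersect_with_index(no_postings, timestamp_index, 2500, 9000)
via_values = post_filter_by_value(no_postings, timestamp_values, 2500, 9000)
```

Both calls return the same docs. The difference is the work done: the first
one scales with the number of docs inside the time range, the second one with
the number of docs matching the selective clause.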
>
> There are some things that I did not tell initially, such as actually
> wanting to do a facet search. Well, here is the full story:
> http://solrlucene.blogspot.dk/2014/05/performance-of-and-queries-with-uneven.html
>
> Regards, Per Steffensen
>
> On 23/05/14 17:37, Toke Eskildsen wrote:
>
>> Per Steffensen [st...@designware.dk] wrote:
>>
>>> * It IS more efficient to just use the index for the
>>> "no_dlng_doc_ind_sto"-part of the request to get doc-ids that match that
>>> part and then fetch timestamp-doc-values for those doc-ids to filter out
>>> the docs that do not match the "timestamp_dlng_doc_ind_sto"-part of
>>> the query.
>>>
>> Thank you for the follow-up. It sounds rather special-case though, with
>> the requirement of DocValues for the range field. Do you think this can be
>> generalized?
>>
>> - Toke Eskildsen
>>
>>
>
