I do not know if this is a special case. I guess an AND query where one
side hits 500-1000 documents and the other side hits billions is a
special case. But this way of carrying out the query might also be an
optimization in less uneven cases.
It does not require that the "lots of hits" part of the query is a
range query, and it does not necessarily require that the field used in
this part has DocValues (you can go fetch the values from the "slow"
store, i.e. stored fields). But I guess the case has to be very uneven
for this approach to be faster on a non-DocValues field.
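To make the two ways of executing the AND query concrete, here is a toy sketch in plain Java. It uses no Lucene or Solr APIs, and the class and method names are made up for illustration. Doc ids are ints, a sorted int[] plays the role of a postings list, and a long[] indexed by doc id stands in for a DocValues column. Both plans return the same docs. The difference is that the value-lookup plan only touches the small side's hits, while the intersection plan first has to materialize the range side's hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

// Toy model, not Lucene code: doc ids are ints, a "postings list" is a sorted
// int[], and a long[] indexed by doc id stands in for a DocValues column.
public class UnevenAnd {

    // Plan A: resolve BOTH sides through the index, then intersect the two
    // sorted doc-id lists with a merge-style walk.
    static int[] intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    // Toy stand-in for resolving the range side through the index: it
    // materializes every matching doc id, which is the expensive part when
    // the range matches billions of docs. Bounds are inclusive, as in [A TO B].
    static int[] rangeHits(long[] timestampByDoc, long start, long end) {
        return IntStream.range(0, timestampByDoc.length)
                .filter(doc -> timestampByDoc[doc] >= start && timestampByDoc[doc] <= end)
                .toArray();
    }

    // Plan B: take only the small hit list from the index and look up each
    // doc's timestamp value; one value fetch per hit on the small side.
    static int[] postFilter(int[] smallSide, long[] timestampByDoc, long start, long end) {
        return Arrays.stream(smallSide)
                .filter(doc -> timestampByDoc[doc] >= start && timestampByDoc[doc] <= end)
                .toArray();
    }

    public static void main(String[] args) {
        long[] ts = {5, 12, 7, 30, 15, 9, 22, 11}; // timestamp of doc 0..7
        int[] noHits = {1, 3, 4, 6};              // docs matching the selective side
        System.out.println(Arrays.toString(intersect(noHits, rangeHits(ts, 10, 25)))); // [1, 4, 6]
        System.out.println(Arrays.toString(postFilter(noHits, ts, 10, 25)));           // [1, 4, 6]
    }
}
```

In a real index the range side is not resolved by a linear scan, but its result can still contain billions of doc ids, whereas plan B performs only as many value fetches as the small side has hits. With stored fields instead of DocValues each fetch is much more expensive, which is why the case then has to be very uneven.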
I think this can be generalized. I think of it as something similar to
being able to "hint" a relational database not to use a specific index.
I do not know that much about Solr/Lucene query syntax, but I believe
"filter queries" (fq) are basically queries that get AND'ed onto the
real query (q). In order not to have to change the query syntax too
much (adding hints or something), I guess a first step towards a feature
doing what I am doing here could be to introduce something similar to
filter queries: queries that are carried out on the result of (q + fqs),
but by looking at the values of the documents in that result instead
of intersecting with doc sets found through the index. Let's call such
a query a "post-query-value-filter" (yes, we can definitely come up with
a better/shorter name).
1) q=no_dlng_doc_ind_sto:(<NO>) AND timestamp_dlng_doc_ind_sto:([<TIME_START> TO <TIME_END>])
2) q=no_dlng_doc_ind_sto:(<NO>)&fq=timestamp_dlng_doc_ind_sto:([<TIME_START> TO <TIME_END>])
3) q=no_dlng_doc_ind_sto:(<NO>)&post-query-value-filter=timestamp_dlng_doc_ind_sto:([<TIME_START> TO <TIME_END>])
1) and 2) both use the index on both no_dlng_doc_ind_sto and
timestamp_dlng_doc_ind_sto. 3) uses only the index on
no_dlng_doc_ind_sto and does the time-interval filtering by fetching the
timestamp_dlng_doc_ind_sto value (using DocValues if possible) for each
of the docs found through the no_dlng_doc_ind_sto index, to check
whether that doc should really be included.
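A minimal sketch of how such post-query-value-filters could be applied once q and the fqs have produced their result. This is hypothetical: there is no such Solr parameter, and ValueFilter and apply are names made up for the sketch. Per-field values are modelled as a Map from field name to a long[] indexed by doc id (a stand-in for DocValues or stored-field lookups), and the example mirrors 3) with made-up doc ids and timestamps. It needs Java 16+ for the record.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.LongPredicate;

// Hypothetical sketch of "post-query-value-filters"; not an existing Solr
// parameter or API. Values live in a Map from field name to a long[] indexed
// by doc id, standing in for DocValues (or stored-field lookups).
public class PostQueryValueFilter {

    // One hypothetical filter: a field plus a test on that field's value.
    record ValueFilter(String field, LongPredicate predicate) {}

    // Keeps the docs from the q + fq result whose values pass every filter.
    // No index is consulted for the filtered fields: each doc's value is read
    // directly, so the work is at most (result size) x (number of filters).
    static int[] apply(int[] qAndFqResult, Map<String, long[]> valuesByField,
                       List<ValueFilter> filters) {
        return Arrays.stream(qAndFqResult)
                .filter(doc -> filters.stream()
                        .allMatch(f -> f.predicate().test(valuesByField.get(f.field())[doc])))
                .toArray();
    }

    public static void main(String[] args) {
        Map<String, long[]> values = Map.of(
                "timestamp_dlng_doc_ind_sto", new long[] {5, 12, 7, 30, 15, 9, 22, 11});
        // Equivalent of timestamp_dlng_doc_ind_sto:[10 TO 25], inclusive bounds.
        List<ValueFilter> filters = List.of(
                new ValueFilter("timestamp_dlng_doc_ind_sto", v -> v >= 10 && v <= 25));
        int[] qResult = {1, 3, 4, 6}; // pretend these docs matched q=no_dlng_doc_ind_sto:(<NO>)
        System.out.println(Arrays.toString(apply(qResult, values, filters))); // [1, 4, 6]
    }
}
```

Because each filter is just a predicate over per-document values, nothing in this shape is specific to range queries or to a single field.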
There are some things I did not mention initially, e.g. that I actually
want to do a facet search. Well, here is the full story:
http://solrlucene.blogspot.dk/2014/05/performance-of-and-queries-with-uneven.html
Regards, Per Steffensen
On 23/05/14 17:37, Toke Eskildsen wrote:
Per Steffensen [st...@designware.dk] wrote:
* It IS more efficient to just use the index for the
"no_dlng_doc_ind_sto"-part of the request to get doc-ids that match that
part and then fetch timestamp-doc-values for those doc-ids to filter out
the docs that do not match the "timestamp_dlng_doc_ind_sto"-part of
the query.
Thank you for the follow-up. It sounds rather special-case though, with
the requirement of DocValues for the range field. Do you think this can
be generalized?
- Toke Eskildsen