I do not know if this is a special case. I guess an AND-query where one side hits 500-1000 docs and the other side hits billions is a special case, but carrying out the query this way might also be an optimization in less uneven cases. It does not require that the "lots of hits" part of the query is a range-query, and it does not strictly require that the field used in that part has DocValues (you can fetch the values from the "slow" store instead). But I guess the case has to be very uneven for this approach to be faster on a non-DocValues field.

I think this can be generalized. I think of it as something similar to being able to "hint" a relational database not to use a specific index. I do not know that much about Solr/Lucene query-syntax, but I believe "filter-queries" (fq) are queries that get AND'ed onto the real query (q). In order not to have to change the query-syntax too much (adding hints or something), I guess a first step for a feature doing what I am doing here could be to introduce something similar to "filter-queries": queries that are carried out on the result of (q + fqs), but by looking at the values of the documents in that result instead of intersecting with doc-sets found from the index. Let's call them "post-query-value-filter"s (yes, we can definitely come up with a better/shorter name).

1) q=no_dlng_doc_ind_sto:(<NO>) AND timestamp_dlng_doc_ind_sto:([<TIME_START> TO <TIME_END>])
2) q=no_dlng_doc_ind_sto:(<NO>),fq=timestamp_dlng_doc_ind_sto:([<TIME_START> TO <TIME_END>])
3) q=no_dlng_doc_ind_sto:(<NO>),post-query-value-filter=timestamp_dlng_doc_ind_sto:([<TIME_START> TO <TIME_END>])

1) and 2) both use the index on no_dlng_doc_ind_sto as well as the index on timestamp_dlng_doc_ind_sto. 3) uses only the index on no_dlng_doc_ind_sto and does the time-interval filtering by fetching the timestamp_dlng_doc_ind_sto value (using DocValues if possible) for each of the docs found through the no_dlng_doc_ind_sto index, to see if that doc should really be included.
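
The difference between 1)/2) and 3) can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: it does not use any Solr/Lucene API, and the data and the names no_index, ts_index, ts_doc_values, and_query_via_indexes and post_query_value_filter are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical toy data: doc id -> (no, timestamp).
docs = {
    0: (7, 100),
    1: (7, 250),
    2: (3, 120),
    3: (7, 900),
    4: (5, 130),
}

# Toy "index" on no: term -> sorted list of matching doc ids.
no_index = {}
for doc_id, (no, _) in sorted(docs.items()):
    no_index.setdefault(no, []).append(doc_id)

# Toy "index" on timestamp: (timestamp, doc id) pairs sorted by timestamp.
ts_index = sorted((ts, doc_id) for doc_id, (_, ts) in docs.items())

# Toy stand-in for DocValues: per-document lookup of the timestamp.
ts_doc_values = {doc_id: ts for doc_id, (_, ts) in docs.items()}


def and_query_via_indexes(no, start, end):
    """Like 1) and 2): build the doc-set for the range, then intersect.

    The work is proportional to the number of docs in the time range,
    which is the "billions" side in the uneven case.
    """
    lo = bisect_left(ts_index, (start, -1))
    hi = bisect_right(ts_index, (end, float("inf")))
    in_range = {doc_id for _, doc_id in ts_index[lo:hi]}
    return [d for d in no_index.get(no, []) if d in in_range]


def post_query_value_filter(no, start, end):
    """Like 3): use only the index on no, then check each hit's value.

    The work is proportional to the number of hits on no,
    which is the "500-1000" side in the uneven case.
    """
    return [d for d in no_index.get(no, [])
            if start <= ts_doc_values[d] <= end]


print(and_query_via_indexes(7, 100, 300))
print(post_query_value_filter(7, 100, 300))
```

Both functions return the same docs; they differ only in how much data they have to touch, which is why the second one should win when the range side matches far more docs than the no side.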

There are some things I did not mention initially, such as actually wanting to do a facet search. Here is the full story: http://solrlucene.blogspot.dk/2014/05/performance-of-and-queries-with-uneven.html

Regards, Per Steffensen

On 23/05/14 17:37, Toke Eskildsen wrote:
Per Steffensen [st...@designware.dk] wrote:
* It IS more efficient to just use the index for the
"no_dlng_doc_ind_sto"-part of the request to get doc-ids that match that
part and then fetch timestamp-doc-values for those doc-ids to filter out
the docs that does not match the "timestamp_dlng_doc_ind_sto"-part of
the query.
Thank you for the follow up. It sounds rather special-case though, with 
requirement of DocValues for the range-field. Do you think this can be 
generalized?

- Toke Eskildsen

