I did not follow the whole story, but "post-query-value-filter" does exist in Solr. Have you tried searching for pretty much that expression, and maybe something about cost-based filters?
Regards,
Alex

On 26/05/2014 6:49 pm, "Per Steffensen" <st...@designware.dk> wrote:

> Do not know if this is a special-case. I guess an AND-query where one side
> hits 500-1000 and the other side hits billions is a special-case. But this
> way of carrying out the query might also be an optimization in less uneven
> cases.
> It does not require that the "lots of hits"-part of the query is a
> range-query, and it does not necessarily require that the field used in
> this part is DocValue (you can go fetch the values from the "slow" store).
> But I guess it has to be a very uneven case if this approach should be
> faster on a non-DocValue field.
>
> I think this can be generalized. I think of it as something similar to
> being able to "hint" relational databases not to use a specific index. I
> do not know that much about Solr/Lucene query-syntax, but I believe
> "filter-queries" (fq) are kinda queries that will be AND'ed onto the real
> query (q), and in order not to have to change the query-syntax too much
> (adding hints or something), I guess a first step for a feature doing what
> I am doing here could be to introduce something similar to
> "filter-queries": queries that will be carried out on the result of
> (q + fqs), but looking at the values of the documents in that result
> instead of intersecting with doc-sets found from the index. Let's call
> them "post-query-value-filter"s (yes, we can definitely come up with a
> better/shorter name)
>
> 1) q=no_dlng_doc_ind_sto:(<NO>) AND timestamp_dlng_doc_ind_sto:([<TIME_START> TO <TIME_END>])
> 2) q=no_dlng_doc_ind_sto:(<NO>),fq=timestamp_dlng_doc_ind_sto:([<TIME_START> TO <TIME_END>])
> 3) q=no_dlng_doc_ind_sto:(<NO>),post-query-value-filter=timestamp_dlng_doc_ind_sto:([<TIME_START> TO <TIME_END>])
>
> 1) and 2) both use the index on both no_dlng_doc_ind_sto and
> timestamp_dlng_doc_ind_sto. 3) uses only the index on no_dlng_doc_ind_sto
> and does the time-interval filter part by fetching values (using DocValue
> if possible) for timestamp_dlng_doc_ind_sto for each of the docs found
> through the no_dlng_doc_ind_sto-index to see if this doc should really be
> included.
>
> There are some things that I did not initially mention, such as actually
> wanting to do a facet search etc. Well, here is the full story:
> http://solrlucene.blogspot.dk/2014/05/performance-of-and-queries-with-uneven.html
>
> Regards, Per Steffensen
>
> On 23/05/14 17:37, Toke Eskildsen wrote:
>
>> Per Steffensen [st...@designware.dk] wrote:
>>
>>> * It IS more efficient to just use the index for the
>>> "no_dlng_doc_ind_sto"-part of the request to get doc-ids that match that
>>> part and then fetch timestamp-doc-values for those doc-ids to filter out
>>> the docs that do not match the "timestamp_dlng_doc_ind_sto"-part of
>>> the query.
>>>
>> Thank you for the follow up. It sounds rather special-case though, with
>> requirement of DocValues for the range-field. Do you think this can be
>> generalized?
>>
>> - Toke Eskildsen
>>
>>
>
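
The difference between variants 1)/2) and variant 3) in the quoted message can be illustrated with a small, self-contained Python sketch. This is only a toy model and not Solr or Lucene code: the helper names (build_toy_index, intersect_doc_sets, post_query_value_filter) and the in-memory structures are assumptions made up for illustration. The inverted index on the "no" field is modelled as a dict, the index on the timestamp field as a sorted list, and the DocValues-like column as a plain list indexed by doc id.

```python
from bisect import bisect_left, bisect_right


def build_toy_index(docs):
    """docs is a list of (no, timestamp); the list position is the doc id."""
    no_index = {}  # term -> ascending doc ids (toy inverted index)
    for doc_id, (no, _) in enumerate(docs):
        no_index.setdefault(no, []).append(doc_id)
    # Toy "index" on the timestamp field: (timestamp, doc id), sorted.
    ts_index = sorted((ts, doc_id) for doc_id, (_, ts) in enumerate(docs))
    # Toy DocValues-like column: doc id -> timestamp.
    ts_values = [ts for _, ts in docs]
    return no_index, ts_index, ts_values


def intersect_doc_sets(no_index, ts_index, no, t_start, t_end):
    """Variants 1) and 2): build one doc-set per index, then intersect."""
    no_docs = set(no_index.get(no, []))
    lo = bisect_left(ts_index, (t_start, -1))
    hi = bisect_right(ts_index, (t_end, float("inf")))
    # This set can be very large when the time range matches many docs.
    ts_docs = {doc_id for _, doc_id in ts_index[lo:hi]}
    return sorted(no_docs & ts_docs)


def post_query_value_filter(no_index, ts_values, no, t_start, t_end):
    """Variant 3): use only the index on `no`, then check each hit's value."""
    return [d for d in no_index.get(no, []) if t_start <= ts_values[d] <= t_end]


if __name__ == "__main__":
    docs = [(7, 100), (7, 250), (3, 120), (7, 180), (3, 300), (9, 150)]
    no_index, ts_index, ts_values = build_toy_index(docs)
    print(intersect_doc_sets(no_index, ts_index, 7, 100, 200))
    print(post_query_value_filter(no_index, ts_values, 7, 100, 200))
```

Both functions return the same doc ids. The first one materializes every doc in the time range before intersecting, while the second only touches the docs already matched on "no", which is the point of the uneven case described in the quoted message.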