Thanks Shawn, Alessandro for your feedback. Sorry if I took this for granted; I was just trying to understand whether there could be a performance gain when *real* queries happen (against other fields too). So with a smaller collection there is a smaller document space, a smaller query in terms of number of filters, and less calculation, so I'll "possibly" notice a difference. Well, now it seems obvious in some way.
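For what it's worth, here is a minimal sketch of how I might measure the two options myself. Everything specific in it is an assumption of mine and not something from this thread: the Solr base URL, the collection names collection_a and collection_b, and the example query title:solr. It only uses the standard /select handler with the q and fq parameters, and time_query is defined but not called, so nothing here touches a running Solr.

```python
import time
import urllib.parse
import urllib.request

# Assumptions (not from the thread): base URL, collection names, example query.
SOLR_BASE = "http://localhost:8983/solr"


def select_url(collection, query, filters=()):
    """Build a /select URL; each entry in `filters` becomes one fq parameter."""
    params = [("q", query), ("rows", "0"), ("wt", "json")]
    params += [("fq", f) for f in filters]
    return f"{SOLR_BASE}/{collection}/select?{urllib.parse.urlencode(params)}"


def time_query(url, repeats=20):
    """Return the average wall-clock seconds per request over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        with urllib.request.urlopen(url) as response:
            response.read()
    return (time.perf_counter() - start) / repeats


# Option A: the full collection, restricted with a (cacheable) filter query.
url_a = select_url("collection_a", "title:solr", filters=["active:true"])
# Option B: the smaller collection holding only active documents, no filter.
url_b = select_url("collection_b", "title:solr")

print(url_a)
print(url_b)
```

Calling time_query(url_a) and time_query(url_b) against a real instance, after a few warm-up requests so the caches are populated, should give a rough comparison. The QTime value in the response header is probably a better number to compare than wall-clock time, since it leaves out network overhead.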
Thanks again and best regards,
Vincenzo

On Fri, Nov 4, 2016 at 4:20 PM, Alessandro Benedetti <abenede...@apache.org> wrote:

> Seconding Shawn, if your queries will always target the active documents,
> at a high level this is what is going to happen:
>
> A) You need to run your query + a filter query that will retrieve only
> active documents.
> The filter query results will be cached.
> Solr will query over the entire document space, and then merge the query
> results with the filtered documents.
>
> B) You run your query over the entire (smaller) document space.
>
> So option B will be faster, possibly not massively, but we do fewer
> calculations.
>
> Cheers
>
> On Fri, Nov 4, 2016 at 2:45 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>
> > On 11/4/2016 8:22 AM, Vincenzo D'Amore wrote:
> > > Given 2 collections, A and B:
> > >
> > > - Collection A has 5 M documents with an attribute active: true/false.
> > > - Collection B has only 2.5 M documents, but all the documents have
> > >   attribute active:true
> > > - in either case, A or B, I can only search upon documents that have
> > >   active:true
> > >
> > > Which one performs faster?
> >
> > This is not backed by knowledge of how the code internals operate, just
> > things I've pieced together from my own experience and other things said
> > on the list in response to past questions.
> >
> > Assuming you have the available memory to effectively cache both
> > indexes, five million documents is chump change to Solr. If you don't
> > have that memory, it might present a performance issue.
> >
> > Because query performance is largely dependent on the number of terms
> > that Solr must look through, and the active field probably has at most
> > three (true, false, and field not present), that part of your query will
> > generally be very fast with ANY number of documents.
> >
> > If you search for all documents and filter on the active field, the
> > difference between the two will probably be so small a human being would
> > never notice it, but it probably would be a difference that you'd be
> > able to measure.
> >
> > Where you *might* notice a difference is when you do a "real" query
> > against other fields in the index, and filter on the active field.
> > That's when the document count will usually track with the term count.
> > The smaller collection may be noticeably faster for this kind of query.
> >
> > Thanks,
> > Shawn
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England

--
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251