Thanks Shawn and Alessandro for your feedback.

Sorry if I took this for granted; I was just trying to understand whether
there could be a performance gain when *real* queries happen (against other
fields too).
So with a smaller collection there is a smaller document space, a smaller
query in terms of number of filters, and less calculation, so I'll
"possibly" notice a difference.
Well, now it seems obvious in some way.
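
To make the comparison concrete, here is a minimal Python sketch of the two
requests. It assumes a local Solr at localhost:8983, collections named
collection_a and collection_b, and a text field called title; all three are
made-up names for illustration. Only the q and fq request parameters are
standard Solr.

```python
from urllib.parse import urlencode

# Hypothetical base URL, used only for illustration.
SOLR_BASE = "http://localhost:8983/solr"


def select_url(collection, query, active_only):
    """Build a /select URL, adding fq=active:true when a filter is needed."""
    params = [("q", query)]
    if active_only:
        # fq is Solr's filter query parameter; its results can be cached.
        params.append(("fq", "active:true"))
    return f"{SOLR_BASE}/{collection}/select?{urlencode(params)}"


# Option A: 5 M documents, active ones selected with a filter query.
url_a = select_url("collection_a", "title:solr", active_only=True)
# Option B: 2.5 M documents, all active, so no filter is needed.
url_b = select_url("collection_b", "title:solr", active_only=False)

print(url_a)
print(url_b)
```

The only difference between the two requests is the extra fq on collection A;
how much that costs in practice would still have to be measured.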

Thanks again and best regards,
Vincenzo


On Fri, Nov 4, 2016 at 4:20 PM, Alessandro Benedetti <abenede...@apache.org>
wrote:

> Seconding Shawn: if your queries always target the active documents, then
> at a high level this is what is going to happen:
>
> A) You need to run your query + a filter query that will retrieve only
> active documents.
> The filter query results will be cached.
> Solr will query over the entire document space, and then merge the query
> results with the filtered documents.
>
> B) You run your query over the entire (smaller) document space.
>
> So option B will be faster, possibly not massively, but we do fewer
> calculations.
>
> Cheers
>
> On Fri, Nov 4, 2016 at 2:45 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>
> > On 11/4/2016 8:22 AM, Vincenzo D'Amore wrote:
> > > Given 2 collections A and B:
> > >
> > > - Collection A has 5 M documents with an attribute active: true/false.
> > > - Collection B has only 2.5 M documents, but all the documents have
> > > attribute active:true
> > > - in either case, A or B, I can only search documents that have
> > > active:true
> > >
> > > Which one performs faster?
> >
> > This is not backed by knowledge of how the code internals operate, just
> > things I've pieced together from my own experience and other things said
> > on the list in response to past questions.
> >
> > Assuming you have the available memory to effectively cache both
> > indexes, five million documents is chump change to Solr.  If you don't
> > have that memory, it might present a performance issue.
> >
> > Because query performance is largely dependent on the number of terms
> > that Solr must look through, and the active field probably has at most
> > three (true, false, and field not present), that part of your query will
> > generally be very fast with ANY number of documents.
> >
> > If you search for all documents and filter on the active field, the
> > difference between the two will probably be so small a human being would
> > never notice it, but it probably would be a difference that you'd be
> > able to measure.
> >
> > Where you *might* notice a difference is when you do a "real" query
> > against other fields in the index, and filter on the active field.
> > That's when the document count will usually track with the term count.
> > The smaller collection may be noticeably faster for this kind of query.
> >
> > Thanks,
> > Shawn
> >
> >
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251
