The 250K is an approximation: (total number of docs)/8 bytes, since
there's one bit per document. For your 2,000,000 docs that's 250,000
bytes. Really, all a filter is is a bit vector where each bit
represents whether the doc with that ID should be included in the
results or not. Technically, it's (largest doc ID + 1)/8, which may be
bigger than the number of docs if you've deleted/added documents and
haven't yet optimized. So the first byte represents docs 0-7 (Lucene
doc IDs start at 0), the second byte docs 8-15, etc.
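To make the arithmetic concrete, here is a minimal Python sketch of a one-bit-per-document filter. The class name `DocBitSet` is hypothetical. Lucene's own bit sets (e.g. `OpenBitSet`) pack the bits into `long` words rather than a byte array, but the size works out to roughly maxDoc/8 bytes either way.

```python
class DocBitSet:
    """Hypothetical one-bit-per-document filter, sized from maxDoc.

    Doc IDs are 0-based, so doc d lives in byte d // 8, bit d % 8.
    """

    def __init__(self, max_doc: int) -> None:
        self.max_doc = max_doc
        # Round up so every doc ID in [0, max_doc) gets a bit.
        self.bits = bytearray((max_doc + 7) // 8)

    def set(self, doc: int) -> None:
        # Mark doc as "include in results".
        self.bits[doc >> 3] |= 1 << (doc & 7)

    def get(self, doc: int) -> bool:
        # True if doc passes the filter.
        return bool(self.bits[doc >> 3] & (1 << (doc & 7)))

    def size_in_bytes(self) -> int:
        return len(self.bits)


f = DocBitSet(2_000_000)
f.set(0)
f.set(1_999_999)
print(f.size_in_bytes())                      # 250000
print(f.get(0), f.get(1), f.get(1_999_999))   # True False True
```

Deleted-but-not-yet-merged docs still occupy IDs below maxDoc, which is why the bit set can be larger than the live document count until you optimize.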

See the Lucene website. Here's a place to start as far as scoring
is concerned:
http://lucene.apache.org/java/3_0_1/scoring.html

And, of course, there's Lucene in Action (the second edition is
available from Manning, at least as an e-book). But I admit that
making the connection from the qf parameter to the underlying
Lucene structure is part of the "tribal knowledge" series. At
least, I can't point you to a document offhand.


Best
Erick

On Sat, Mar 6, 2010 at 4:36 PM, MitchK <mitc...@web.de> wrote:

>
> Erick,
>
> your response was really helpful - the problem is solved for next
> time.
>
> However, I have two questions:
> How do you know that the bit vector has a maximum size of 250K?
> Did I overlook something (because I have an index of 2,000,000
> documents)?
>
> Are there any theoretical documents out there that explain how Solr's
> IndexSearcher works?
> I think this would be really helpful for future questions.
>
> Kind regards
> - Mitch
>
>
> Erick Erickson wrote:
> >
> > The last thing I'd do is partition my index into two, unless and
> > until I really *knew* I had speed problems. The added complexity
> > isn't worth it and your index isn't huge, so search speed can
> > probably be addressed without that complexity.
> >
> > Filter queries are probably your first choice here. Memory isn't an
> > issue because they're implemented (as I understand) as a bit
> > vector. That is, each one (and you only have two) will be 250K
> > plus a slight overhead. Utterly insignificant.
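A minimal sketch of the idea, assuming a filter is simply a cached bit mask over doc IDs: the fq decides which docs come back, while the main query's scores pass through untouched. The `check_field` data, the scores, and the helper names are all hypothetical; Solr's real filterCache holds DocSet objects, which are not necessarily plain bit sets for small result sets.

```python
# Hypothetical data: doc ID -> value of the boolean "check" field.
check_field = {0: True, 1: False, 2: True, 3: True, 4: False}


def build_filter(predicate) -> int:
    """Build a bit mask with bit d set when doc d passes the predicate."""
    mask = 0
    for doc, value in check_field.items():
        if predicate(value):
            mask |= 1 << doc
    return mask


# Built once and reused by later queries, like a cached fq.
fq_true = build_filter(lambda v: v)
fq_false = build_filter(lambda v: not v)

# Hypothetical scores produced by the main query q (doc ID -> score).
q_scores = {0: 1.2, 1: 3.4, 3: 0.7, 4: 2.0}


def apply_filter(scores, mask):
    """Keep only docs whose bit is set; scores are left unchanged."""
    return {doc: s for doc, s in scores.items() if (mask >> doc) & 1}


print(apply_filter(q_scores, fq_true))   # {0: 1.2, 3: 0.7}
print(apply_filter(q_scores, fq_false))  # {1: 3.4, 4: 2.0}
```

Because the mask only removes docs, the check field never enters the score, which is the behavior Mitch was after.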
> >
> > With a single index, you can easily experiment with the difference
> > in speed between q and fq. You're right that if you just tack an
> > AND onto the q clause, the true/false clause will contribute to the
> > score, but I think it'll contribute the same amount to every
> > matching doc, effectively doing nothing to the ranking. There is
> > something of an efficiency argument here, but maybe not
> > enough to notice.
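The ranking argument can be checked with a toy example, under the simplifying assumption that the extra AND clause adds the same constant `c` to every matching doc's score. Real Lucene scoring (coord factor, query normalization) is not guaranteed to be purely additive like this, so treat it as an illustration of the reasoning rather than of Lucene's formula. All values here are hypothetical.

```python
def rank(scores):
    """Return doc IDs by descending score (ties broken by doc ID)."""
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


base = {7: 2.5, 3: 1.1, 9: 0.4}   # hypothetical scores without the clause
c = 0.8                           # assumed constant added by "AND check:true"
boosted = {doc: s + c for doc, s in base.items()}

print(rank(base))     # [7, 3, 9]
print(rank(boosted))  # [7, 3, 9]
```

Shifting every score by the same amount preserves the order, so only the absolute score values change.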
> >
> > Faceting is generally used more for answering questions like
> > "given I've searched on query <Q>, how many of my answers
> > are in groups A, B and C" than for drilling down to things like
> > "show me the ones in group C". That, while related to your
> > problem, isn't what it sounds like you're after.
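The distinction can be sketched in a few lines of Python: faceting counts results per group, while drilling down keeps only one group (which, in Solr terms, is what an fq such as `fq=group:C` would do). The result list and the field name `group` are hypothetical.

```python
from collections import Counter

# Hypothetical result set for query <Q>: (doc ID, group) pairs.
results = [(1, "A"), (4, "C"), (5, "B"), (8, "C"), (12, "A"), (15, "C")]

# Faceting: "how many of my answers are in groups A, B and C?"
facet_counts = Counter(group for _, group in results)
print(dict(facet_counts))  # {'A': 2, 'C': 3, 'B': 1}

# Drilling down: "show me the ones in group C".
group_c = [doc for doc, group in results if group == "C"]
print(group_c)  # [4, 8, 15]
```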
> >
> > When measuring speed, remember that the first few queries
> > aren't representative.
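One way to follow that advice is to time a batch of queries and discard the first few runs, since caches and JVM warm-up typically make early queries slower. A minimal sketch: `run_query` is a hypothetical callable standing in for whatever actually sends the request to Solr, and the warm-up count of 5 is an arbitrary assumption.

```python
import statistics
import time


def time_queries(run_query, queries, warmup=5):
    """Time each query, ignoring the first `warmup` runs.

    Returns the median elapsed time in seconds, or None if every
    run fell inside the warm-up window.
    """
    timings = []
    for i, q in enumerate(queries):
        start = time.perf_counter()
        run_query(q)
        elapsed = time.perf_counter() - start
        if i >= warmup:
            timings.append(elapsed)
    return statistics.median(timings) if timings else None


def fake_query(q):
    # Hypothetical stand-in for a real Solr request.
    return sum(range(10_000))


median = time_queries(fake_query, ["check:true"] * 20, warmup=5)
print(f"median after warm-up: {median:.6f} s")
```

Using the median rather than the mean keeps a single slow outlier from dominating the comparison between q and fq.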
> >
> > HTH
> > Erick
> > On Sat, Mar 6, 2010 at 12:32 PM, MitchK <mitc...@web.de> wrote:
> >
> >>
> >> Yes, that's possible.
> >>
> >> However, I thought that the normal q param forces Solr to look up every
> >> check field to see whether it is true or false.
> >> So I am looking for something like a tree that divides the index into two
> >> pieces: true and false.
> >> Then Solr would not need to look up the check field anymore, because it
> >> follows the right node of the tree, and accordingly the IndexSearcher would
> >> be more efficient. I emphasize that this is what I think; I don't really know.
> >> Another point is that I have read that the q param scores every field,
> >> and I don't want the check field to contribute to the score.
> >>
> >> Hopefully I have explained my problem correctly.
> >> If there are questions, please ask.
> >>
> >> - Mitch
> >>
> >> --
> >> View this message in context:
> >> http://old.nabble.com/Filter-Query-or-Main-Query-or-facetting--tp27804169p27805798.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/Filter-Query-or-Main-Query-or-facetting--tp27804169p27807323.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
