The 250K is an approximation: (total number of docs)/8 bytes, as in one bit per document. With your 2,000,000 documents, that's 2,000,000/8 = 250,000 bytes. Really, all a filter is is a bit vector where each bit records whether the doc ID it stands for should be included in the results or not. Technically, the size is (largest doc ID)/8, where the largest doc ID may be bigger than the number of docs if you've deleted/added documents and haven't yet optimized. So the first byte represents docs 0-7, the second byte docs 8-15, etc. (Lucene doc IDs start at 0).
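To make the arithmetic concrete, here is a minimal Python sketch of a one-bit-per-document filter. It is only an illustration of the idea described above, not Lucene's or Solr's actual implementation. The names `filter_size_bytes` and `BitVectorFilter` are made up for this example, and the sketch assumes doc IDs are numbered from 0.

```python
def filter_size_bytes(max_doc: int) -> int:
    """Bytes needed to hold one bit for each doc ID in 0 .. max_doc - 1."""
    # Round up so a partially used final byte is still allocated.
    return (max_doc + 7) // 8


class BitVectorFilter:
    """Hypothetical filter: bit i is set when doc ID i passes the filter."""

    def __init__(self, max_doc: int) -> None:
        self.max_doc = max_doc
        self.bits = bytearray(filter_size_bytes(max_doc))

    def include(self, doc_id: int) -> None:
        # Byte doc_id // 8 holds the bit; doc_id % 8 picks the bit within it.
        self.bits[doc_id // 8] |= 1 << (doc_id % 8)

    def matches(self, doc_id: int) -> bool:
        return bool(self.bits[doc_id // 8] & (1 << (doc_id % 8)))


# An index whose largest doc ID is just under 2,000,000 needs 250,000 bytes.
flt = BitVectorFilter(2_000_000)
flt.include(9)  # doc 9 lives in the second byte (index 1)
print(len(flt.bits), flt.matches(9), flt.matches(8))
```

The size depends only on the largest doc ID, not on how many documents pass the filter, which is why each cached filter costs about the same amount of memory.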
See the Lucene website. Here's a place to start as far as scoring is concerned:
http://lucene.apache.org/java/3_0_1/scoring.html

And, of course, there's Lucene In Action (the second edition is available from Manning, as an e-book at least).

But I admit that making the connection from the qf parameter to the underlying Lucene structure is part of the "tribal knowledge" series. At least I can't point you to a document offhand.

Best
Erick

On Sat, Mar 6, 2010 at 4:36 PM, MitchK <mitc...@web.de> wrote:

> Erick,
>
> your response was really helpful - the problem is solved for the next
> time.
>
> However, there are two questions:
> How do you know that the bit-vector has a maximum size of 250k?
> Did I overlook something (because I have got an index of 2.000.000
> documents)?
>
> Are there any theoretical documents out there that explain how Solr's
> IndexSearcher works?
> I think this would be really helpful for future questions.
>
> Kind regards
> - Mitch
>
> Erick Erickson wrote:
> >
> > The last thing I'd do is partition my index into two, unless and
> > until I really *knew* I had speed problems. The added complexity
> > isn't worth it and your index isn't huge, so search speed can
> > probably be addressed without that complexity.
> >
> > Filter queries are probably your first choice here. Memory isn't an
> > issue because they're implemented (as I understand) as a bit
> > vector. That is, each one (and you only have two) will be 250K
> > plus a slight overhead. Utterly insignificant.
> >
> > You can easily experiment with the difference in speed between
> > q and fq using a single index. You're right that if you just tack
> > on an AND to the q clause, the true/false will contribute to the
> > score, but I think they'll all contribute the same amount,
> > effectively doing nothing to the ranking. There is something of
> > an efficiency argument here, but maybe not enough to notice.
> >
> > Faceting is generally used more for answering questions like
> > "given I've searched on query <Q>, how many of my answers
> > are in groups A, B and C" than for drilling down to things like
> > "show me the ones in group C", which, while related to your
> > problem, isn't what it sounds like you're after.
> >
> > When measuring speed, remember that the first few queries
> > aren't representative.
> >
> > HTH
> > Erick
> >
> > On Sat, Mar 6, 2010 at 12:32 PM, MitchK <mitc...@web.de> wrote:
> >
> >> Yes, that's possible.
> >>
> >> However, I thought that the normal q param forces Solr to look up
> >> the check-field of every document to see whether it is true or false.
> >> So I am looking for something like a tree that divides the index into
> >> two pieces - true and false.
> >> Then Solr would not need to look up the check-field anymore, because
> >> it follows the right node of the tree, and accordingly the
> >> IndexSearcher would be more efficient - I emphasize that I think so,
> >> I don't really know.
> >> Another point is that I have read that the q param scores every
> >> field, and I don't want the scoring to depend partly on the
> >> check-field.
> >>
> >> Hopefully I have explained my problem correctly.
> >> If there are questions, please ask.
> >>
> >> - Mitch
> >>
> >> --
> >> View this message in context:
> >> http://old.nabble.com/Filter-Query-or-Main-Query-or-facetting--tp27804169p27805798.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> View this message in context:
> http://old.nabble.com/Filter-Query-or-Main-Query-or-facetting--tp27804169p27807323.html
> Sent from the Solr - User mailing list archive at Nabble.com.