Hello:
Term Vectors could be much faster than computing intersections with the
FilterCache.
Exception: when the size of the DocSet is close to (more than 50% of) the
total count of documents in the index.
When it works (100 times faster than the current approach; very specific scenario):
- use stored Term Vectors;
- 10,000,000 documents in the index;
- 5-10 terms per document;
- 200,000 unique terms for a tokenized field.
Obviously, calculating the sizes of 200,000 intersections with the FilterCache
is slower than traversing 10 - 20,000 documents for smaller DocSets
and counting the frequencies of terms.
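
To illustrate the difference, here is a minimal sketch in Python (not Solr
code). The toy index and the names term_vectors, postings,
facet_by_intersection and facet_by_term_vectors are all made up for this
example; a plain dict stands in for stored Term Vectors and a dict of sets
stands in for the cached per-term filters.

```python
from collections import Counter

# Hypothetical toy index: doc id -> terms of the tokenized field.
# This stands in for stored term vectors.
term_vectors = {
    0: ["solr", "lucene"],
    1: ["solr", "facet"],
    2: ["lucene", "index"],
    3: ["facet", "solr", "cache"],
}

# Inverted view: term -> set of doc ids.
# This stands in for one cached filter per unique term.
postings = {}
for doc_id, terms in term_vectors.items():
    for term in terms:
        postings.setdefault(term, set()).add(doc_id)


def facet_by_intersection(doc_set):
    """One intersection per unique term: work grows with the number of terms."""
    counts = {}
    for term, docs in postings.items():
        n = len(docs & doc_set)
        if n:
            counts[term] = n
    return counts


def facet_by_term_vectors(doc_set):
    """One pass over the matching documents: work grows with the DocSet size."""
    counts = Counter()
    for doc_id in doc_set:
        # set() so a term repeated inside one document is counted once
        counts.update(set(term_vectors[doc_id]))
    return dict(counts)


doc_set = {0, 3}
print(facet_by_intersection(doc_set))
print(facet_by_term_vectors(doc_set))
```

Both functions return the same counts. With the numbers above, the first one
would loop over 200,000 terms, while the second would only touch the
documents in the DocSet, each with 5-10 terms.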
There are some related TODOs in the Solr source.
--
Thanks,
Fuad Efendi
416-993-2060(cell)
Tokenizer Inc.
==============
http://www.linkedin.com/in/liferay
http://www.tokenizer.org