Hello:

Term Vectors could be much faster than intersections with the FilterCache.
Exception: when the size of the DocSet is close to the total number of documents in the index (more than 50%).

When it works (100 times faster than the current approach; a very specific scenario):
- use stored Term Vectors;
- 10,000,000 documents in the index;
- 5-10 terms per document;
- 200,000 unique terms for a tokenized field.

Obviously, calculating the sizes of 200,000 intersections with the FilterCache is slower than traversing 10 - 20,000 documents for smaller DocSets and counting the frequencies of terms.
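
To illustrate the difference, here is a minimal sketch in plain Python, not Solr code. The names term_vectors, filter_cache, facet_by_intersection and facet_by_term_vectors are hypothetical, and the toy data stands in for stored term vectors and cached per-term filters. It only shows that the first approach does work proportional to the number of unique terms, while the second does work proportional to the size of the DocSet.

```python
from collections import Counter

# Toy index: doc id -> terms of one tokenized field
# (a stand-in for stored term vectors; hypothetical data).
term_vectors = {
    0: ["solr", "facet"],
    1: ["solr", "cache"],
    2: ["lucene", "facet"],
    3: ["solr", "lucene", "facet"],
    4: ["cache"],
}

# Inverted view: term -> set of doc ids
# (a stand-in for one cached filter per unique term).
filter_cache = {}
for doc_id, terms in term_vectors.items():
    for term in terms:
        filter_cache.setdefault(term, set()).add(doc_id)


def facet_by_intersection(doc_set):
    """One intersection per unique term in the field."""
    counts = Counter()
    for term, docs_with_term in filter_cache.items():
        size = len(doc_set & docs_with_term)
        if size:
            counts[term] = size
    return counts


def facet_by_term_vectors(doc_set):
    """One pass over the documents in the DocSet only."""
    counts = Counter()
    for doc_id in doc_set:
        # set() so that each document counts once per term,
        # matching the intersection sizes above.
        counts.update(set(term_vectors[doc_id]))
    return counts


doc_set = {0, 3}
# Both strategies must produce the same facet counts.
assert facet_by_intersection(doc_set) == facet_by_term_vectors(doc_set)
print(facet_by_term_vectors(doc_set))
```

With 200,000 unique terms the first function loops 200,000 times regardless of how small the DocSet is; the second loops once per matching document and touches only the 5-10 terms each one holds.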


There are some related TODOs in the Solr source.


--
Thanks,

Fuad Efendi
416-993-2060(cell)
Tokenizer Inc.
==============
http://www.linkedin.com/in/liferay
http://www.tokenizer.org




