SimpleFacets: Performance Boost for Tokenized Fields

Fuad Efendi Mon, 18 Aug 2008 08:11:38 -0700

Hello:


Term vectors could be much faster than computing intersections with the FilterCache.

Exception: when the size of the DocSet is close to the total count of documents in the index (more than 50%).


When it works (100 times faster than the current approach; very specific scenario):
- use stored Term Vectors;
- 10,000,000 documents in the index;
- 5-10 terms per document;
- 200,000 unique terms for a tokenized field.

Obviously, calculating the sizes of 200,000 intersections with the FilterCache is slower than traversing 10 - 20,000 documents for smaller DocSets and counting the frequencies of terms.
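
To make the comparison concrete, here is a minimal sketch in Python. It is not Solr code: plain sets and dicts stand in for cached filters and stored term vectors, and the names facet_by_intersection, facet_by_term_vectors, doc_to_terms and term_to_docs are hypothetical. The first function does one intersection per unique term; the second visits only the documents in the DocSet and tallies their terms.

```python
from collections import Counter


def facet_by_intersection(doc_set, term_to_docs):
    """Count facets with one set intersection per unique term.

    Work grows with the number of unique terms in the field,
    no matter how small doc_set is.
    """
    counts = {}
    for term, docs in term_to_docs.items():
        n = len(doc_set & docs)
        if n:
            counts[term] = n
    return counts


def facet_by_term_vectors(doc_set, doc_to_terms):
    """Count facets by walking the matching documents and tallying their terms.

    Work grows with len(doc_set) times the number of terms per document.
    """
    counts = Counter()
    for doc_id in doc_set:
        counts.update(doc_to_terms[doc_id])
    return dict(counts)


# Toy index: 4 documents, each with the set of terms of one tokenized field
# (a stand-in for stored term vectors).
doc_to_terms = {
    0: {"solr", "facet"},
    1: {"solr", "cache"},
    2: {"lucene", "facet"},
    3: {"lucene", "cache"},
}

# Inverted view: term -> set of documents (a stand-in for cached filters).
term_to_docs = {}
for doc_id, terms in doc_to_terms.items():
    for term in terms:
        term_to_docs.setdefault(term, set()).add(doc_id)

doc_set = {0, 2}  # documents matching the current query

by_intersection = facet_by_intersection(doc_set, term_to_docs)
by_term_vectors = facet_by_term_vectors(doc_set, doc_to_terms)
```

Both functions return the same counts. In the scenario above, the first would perform 200,000 intersections, while the second would touch only the 10 - 20,000 documents in the DocSet, each with 5-10 terms.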



There are some related TODOs in the Solr source.


--
Thanks,

Fuad Efendi
416-993-2060(cell)
Tokenizer Inc.
==============
http://www.linkedin.com/in/liferay
http://www.tokenizer.org
