Hello,

I think the documentation and example files for Solr 4.x need to be
updated.  If someone can confirm the behavior described below, I'll be
happy to fix the example, and perhaps someone with edit rights could fix
the reference guide.

Due to dirty OCR and over 400 languages we have over 2 billion unique
terms in our index.  In Solr 3.6 we set termIndexInterval to 1024 (8
times the default of 128) to reduce the size of the in-memory index.
Previously we used termIndexDivisor for a similar purpose.
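
To give a feel for the numbers, here is a rough sketch of the arithmetic.
It assumes the pre-4.0 scheme, where every termIndexInterval-th term is
kept in the in-memory terms index and a reader-side divisor subsamples
those further.  The function name is made up for illustration, and the
2 billion figure is our "over 2 billion" rounded down, so treat the
results as ballpark only.

```python
def in_memory_index_terms(unique_terms, term_index_interval, divisor=1):
    """Rough count of terms held in RAM under the pre-4.0 terms index scheme.

    Hypothetical helper for illustration only; not a Lucene or Solr API.
    """
    # Every term_index_interval-th term is written to the terms index;
    # a reader-side divisor then loads only every divisor-th of those.
    effective_interval = term_index_interval * divisor
    # Ceiling division, so a partial final interval still counts as one entry.
    return -(-unique_terms // effective_interval)


UNIQUE_TERMS = 2_000_000_000  # assumption: "over 2 billion", rounded down

default_count = in_memory_index_terms(UNIQUE_TERMS, 128)   # default interval
tuned_count = in_memory_index_terms(UNIQUE_TERMS, 1024)    # our Solr 3.6 setting

print(default_count, tuned_count)
```

Under those assumptions the 1024 setting keeps roughly one eighth as many
terms in memory as the default of 128.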

We suspect that in Solr 4.10 (and probably previous Solr 4.x versions)
termIndexInterval and termIndexDivisor do not apply to the default
codec and are probably unnecessary (since the default terms index now
uses a much more efficient representation).

According to the JavaDocs for IndexWriterConfig, the Lucene-level
implementations of these settings do not apply to the default
PostingsFormat implementation.
http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/index/IndexWriterConfig.html#setReaderTermsIndexDivisor%28int%29

Despite this statement in the Lucene JavaDocs, in the
example/solrconfig.xml there is the following:

<!-- Expert: Controls how often Lucene loads terms into memory
     Default is 128 and is likely good for most everyone.
  -->
<!-- <termIndexInterval>128</termIndexInterval> -->

Page 365 of the 4.10 reference manual also has an example showing
termIndexInterval.

Can someone please confirm that these two parameter settings,
termIndexInterval and termIndexDivisor, do not apply to the default
PostingsFormat for Solr 4.10?

Tom
