Hello all,

We are experimenting with the ShingleFilter on a very large document set
(1 million full-text books). Because the ShingleFilter indexes every word pair
as a token, the number of unique terms increases tremendously. In our
experiments so far the tii and tis files are getting very large, and the tii
file will eventually be too large to fit into memory. The tii file holds only
every Nth term from the tis file, where N is the TermIndexInterval, so setting
the interval to a value larger than the default of 128 should reduce the size
of the tii file. Is it possible to set this through Solr configuration, or do
we need to modify the code somewhere and call
IndexWriter.setTermIndexInterval?
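
To make the expected effect concrete, here is a rough back-of-the-envelope
sketch of how the number of terms held in the tii file scales with the
interval. It is only an estimate: the class name, the method name, and the
count of one billion unique terms are hypothetical placeholders, not
measurements from our index, and it ignores per-entry size.

```java
public class TermIndexEstimate {

    /**
     * Rough number of entries in the term index (tii), assuming one entry
     * is kept for every `interval` terms in the term dictionary (tis).
     * Rounds up so a partial final block still counts as one entry.
     */
    static long indexedTerms(long uniqueTerms, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return (uniqueTerms + interval - 1) / interval;
    }

    public static void main(String[] args) {
        // Hypothetical unique-term count, for illustration only.
        long uniqueTerms = 1_000_000_000L;
        for (int interval : new int[] {128, 256, 512, 1024}) {
            System.out.println("interval=" + interval
                    + " -> ~" + indexedTerms(uniqueTerms, interval)
                    + " terms in tii");
        }
    }
}
```

Under these assumptions, going from 128 to 1024 cuts the number of in-memory
index entries by about a factor of eight, at the cost of scanning more terms
in the tis file per lookup.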


Tom

Tom Burton-West
Digital Library Production Services
University of Michigan Library

 
