Dan A. Dickey wrote:
I just came across the maxFieldLength setting for the mainIndex
in solrconfig.xml and have a question or two about it.
The default value is 10000.

I'm extracting text from PDF documents and
storing it in a text field.  Is the length of this text field limited
to 10000 characters?  Many pdf documents are megabytes in size.
Does this mean that only the first 10000 characters are getting indexed?

Is there a good way to index the whole document, or do I just simply
need to increase the size of maxFieldLength?  What performance
ramifications would something like this have?

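maxFieldLength is counted in tokens, not chars: only the first 10000 tokens of the field are indexed, and anything beyond that is silently dropped. So you should be pretty safe unless your documents contain a lot of text.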
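
To make the tokens-versus-chars distinction concrete, here is a rough Python sketch. It is not Solr or Lucene code: whitespace splitting stands in for whatever analyzer your field really uses, and the function name truncate_to_max_tokens is made up for illustration.

```python
# Illustration only: a whitespace split is an assumed stand-in for the
# field's real analyzer, and truncate_to_max_tokens is a hypothetical helper.
def truncate_to_max_tokens(text, max_field_length=10000):
    """Return the tokens that would survive a token-count limit."""
    tokens = text.split()
    # The limit applies to the number of tokens, not the number of characters.
    return tokens[:max_field_length]


# A document of 25000 five-character "words" (125000 characters in total).
doc = "word " * 25000
indexed = truncate_to_max_tokens(doc)

print(len(doc), "characters,", len(doc.split()), "tokens,", len(indexed), "indexed")
```

With the default limit the first 10000 tokens are kept no matter how many characters they span, so a field can be far longer than 10000 characters and still be indexed in full.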

You can of course set this value to whatever you want, including Integer.MAX_VALUE. This has performance consequences - terms found at large positions will increase the length of posting lists, which leads to increased memory/CPU consumption during decoding and traversing of the lists. Also, the overall increased number of positions will have an impact on the index size.
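
As a sketch of what that change might look like, assuming the setting sits inside the mainIndex section of solrconfig.xml as described in the question (your file may also set it elsewhere, for example under indexDefaults, so check which one applies to your version):

```xml
<mainIndex>
  <!-- other mainIndex settings left unchanged -->
  <!-- 2147483647 is Integer.MAX_VALUE, i.e. effectively no limit -->
  <maxFieldLength>2147483647</maxFieldLength>
</mainIndex>
```

Documents indexed before the change stay truncated, so they need to be reindexed to pick up the new limit.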


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
