> Sent: Friday, August 15, 2008 12:22:30 PM
> Subject: Re: Index size vs. number of documents
>
By "Index size almost never grows linearly with the number of
documents" are you saying it increases more slowly than the number of
documents, i.e. sub-linearly, or more rapidly?
With dirty OCR, the number of unique terms keeps increasing because of
the garbage "words".
-Phil
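A toy simulation of Phil's OCR point, offered only as an illustration. It assumes a made-up model (clean tokens drawn uniformly from a fixed 2,000-word vocabulary, plus a hypothetical `ocr_error_rate` chance that a token is replaced by a random 8-letter garbage string). It says nothing about real OCR error distributions; it only shows why the unique-term count plateaus for clean text but keeps climbing when garbage tokens are mixed in.

```python
import random
import string


def vocabulary_growth(num_docs, tokens_per_doc, vocab_size, ocr_error_rate, seed=0):
    """Return the cumulative number of unique terms after each document.

    Hypothetical toy model: clean tokens come from a fixed vocabulary of
    `vocab_size` words; with probability `ocr_error_rate` a token is instead
    a random garbage string, standing in for dirty OCR output.
    """
    rng = random.Random(seed)
    clean_vocab = [f"word{i}" for i in range(vocab_size)]
    seen = set()
    growth = []
    for _ in range(num_docs):
        for _ in range(tokens_per_doc):
            if rng.random() < ocr_error_rate:
                # Garbage "word": 8 random lowercase letters, almost never repeated.
                token = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
            else:
                token = rng.choice(clean_vocab)
            seen.add(token)
        growth.append(len(seen))
    return growth


clean = vocabulary_growth(200, 500, 2000, 0.0)   # capped at 2,000 unique terms
dirty = vocabulary_growth(200, 500, 2000, 0.05)  # keeps growing with every document
print(clean[-1], dirty[-1])
```

In the clean run the vocabulary saturates early, so new documents add postings but few new terms; in the dirty run roughly every garbage token is a new term, so the term count grows roughly in proportion to the number of documents.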
Chris Hostetter wrote:
: > I'm surprised, as you are, by the non-linearity. Out of curiosity, what is
Unless the data in "stored" fields is significantly greater than in "indexed"
fields, the index size almost never grows linearly with the number of
documents; it's the number of unique terms that tends to primarily
in
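One common way to model the sub-linear growth Chris describes is Heaps' law, V = k * N^beta, where N is the total number of tokens and V the number of distinct terms. This is an empirical model swapped in for illustration, not something stated in the thread. The constants `k` and `beta` below are assumed values in the range often quoted for clean English text, and `TOKENS_PER_DOC` is hypothetical. The sketch shows that doubling the corpus multiplies the vocabulary by about 2^beta rather than by 2.

```python
def heaps_vocabulary(total_tokens, k=44.0, beta=0.49):
    """Estimate the number of distinct terms with Heaps' law, V = k * N**beta.

    k and beta are assumed, corpus-dependent constants; the defaults are
    illustrative values for clean English text, not measurements of any index.
    """
    return k * total_tokens ** beta


TOKENS_PER_DOC = 500  # hypothetical average document length

for num_docs in (10_000, 20_000, 40_000):
    vocab = heaps_vocabulary(num_docs * TOKENS_PER_DOC)
    print(f"{num_docs:>6} docs -> ~{vocab:,.0f} unique terms")

# Doubling the token count multiplies the estimated vocabulary by 2**beta, not by 2.
growth_factor = heaps_vocabulary(2_000_000) / heaps_vocabulary(1_000_000)
print(f"growth factor when doubling: {growth_factor:.3f}")
```

Under this model, a corpus full of OCR garbage would tend to push the effective exponent toward 1, which would make the vocabulary (and any size component driven by it) grow closer to linearly.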
Erick Erickson wrote:
I'm surprised, as you are, by the non-linearity. Out of curiosity, what is
your MaxFieldLength? By default only the first 10,000 tokens are added
to a field per document. If you haven't set this higher, that could account
for it.
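The field-length limit Erick mentions was configured in Lucene of that era through `IndexWriter.setMaxFieldLength`. The following is a plain-Python sketch of its effect, not Lucene code: each field keeps only its first 10,000 tokens. The helper `indexed_tokens` and the document lengths are hypothetical. Once documents exceed the limit, the number of tokens actually indexed stops growing with document length, which by itself can make index size look non-linear.

```python
DEFAULT_MAX_FIELD_LENGTH = 10_000  # default per-field token limit mentioned above


def indexed_tokens(tokens, max_field_length=DEFAULT_MAX_FIELD_LENGTH):
    """Keep only the first `max_field_length` tokens of one field (hypothetical helper)."""
    return tokens[:max_field_length]


# Hypothetical documents whose single text field has these token counts.
doc_lengths = [4_000, 10_000, 25_000, 80_000]
docs = [[f"tok{i}" for i in range(n)] for n in doc_lengths]

kept = [len(indexed_tokens(d)) for d in docs]
print(kept)
print(sum(kept), "of", sum(doc_lengths), "tokens indexed")
```

Raising the limit to a very large number, as the reply that follows describes, makes `indexed_tokens` return the whole field, so truncation can no longer explain the effect.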
We set it to a very large number so we in
I'm surprised, as you are, by the non-linearity. Out of curiosity, what is
your MaxFieldLength? By default only the first 10,000 tokens are added
to a field per document. If you haven't set this higher, that could account
for it.
As far as I know, optimization shouldn't really affect the index size