On 24/01/2011 13:10, Em wrote:
Hi Damien,

ahm, the formula I wrote was no definitive guide, just some numbers I
combined to visualize the amount of data - perhaps not even a complete
formula.

Well, if you can use your taxonomy as indexed-only (indexed but not
stored), you do not double the used disk space when you index two
identical documents.
So five documents, or 4 million, with the same taxonomy use the same disk space as one?

Lucene - and also Solr - work with an inverted index: this means
every indexed term is mapped to the documents that contain it.
So your index size will depend on the number of unique taxonomy terms and
the pointers from these terms to the documents. That's it. Usually the
disk space used by an index is much smaller than the size of the original
data.
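
To make the point above concrete, here is a minimal sketch of an inverted
index in Python. It is not Lucene's actual implementation; the names
build_inverted_index and index_stats and the sample taxonomy terms are
made up for illustration. It only shows that indexing the same taxonomy
again adds no new unique terms, while the number of term-to-document
pointers (postings) still grows by one per term per document.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it.

    `docs` is a list of term lists; a document's id is its position in the list.
    (Hypothetical helper, not part of Lucene or Solr.)
    """
    index = defaultdict(set)
    for doc_id, terms in enumerate(docs):
        for term in terms:
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def index_stats(index):
    """Return (number of unique terms, total number of postings)."""
    unique_terms = len(index)
    postings = sum(len(ids) for ids in index.values())
    return unique_terms, postings


# Assumed sample taxonomy, shared by every document.
taxonomy = ["animals", "mammals", "cats"]

one_doc = build_inverted_index([taxonomy])
five_docs = build_inverted_index([taxonomy] * 5)

print(index_stats(one_doc))    # (3, 3)
print(index_stats(five_docs))  # (3, 15)
```

So five documents do not cost exactly the same as one: the term dictionary
stays at three entries, but each term now points to five documents. In a
real Lucene index those pointers are stored as compressed document-id
lists, so the extra cost per document is typically small.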

I hope what I tried to explain was easy to understand.
Thanks, it's very helpful!

Where can I find more explanation of the internal structure of the Lucene index?

Damien
