On 24/01/2011 13:10, Em wrote:
Hi Damien,
ahm, the formula I wrote was not a definitive guide, just some numbers I
combined to visualize the amount of data - perhaps not even a complete
formula.
Well, if you can keep your taxonomy field indexed-only (not stored),
indexing two identical documents does not double the disk space used.
So, five documents or 4 million with the same taxonomy use the same disk
space as one?
Lucene - and therefore Solr - works with an inverted index: this means
every indexed term is mapped to the list of documents that contain it.
So your index size will depend on the number of unique taxonomy terms and
on the pointers from these terms to the documents. That's it. Usually the
disk space used by an index is much smaller than the size of the original
data.
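To make this a bit more concrete, here is a rough sketch in plain Python. It
is not Lucene's real on-disk format, only the idea behind it; the function
name build_inverted_index and the three sample documents are made up for
illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique term to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical documents; 1 and 2 carry exactly the same taxonomy terms.
docs = {
    1: ["animals", "mammals", "cats"],
    2: ["animals", "mammals", "cats"],
    3: ["animals", "birds"],
}

index = build_inverted_index(docs)

# Each term text is kept once, however many documents use it.
unique_terms = len(index)
# One pointer per (term, document) pair; this part grows with the documents.
pointers = sum(len(ids) for ids in index.values())

print(index)
print(unique_terms, pointers)
```

Document 2 adds no new term to the dictionary, only one more pointer per term
it contains. So more documents with the same taxonomy do cost some space, but
far less than storing the terms again for each document.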
I hope what I tried to explain was easy to understand.
Thanks, it's very helpful!
Where can I find more explanation of the internal structure of the Lucene
indexer?
Damien