On 3-Feb-08, at 1:34 PM, Stu Hood wrote:
I just finished watching this talk about a column-store RDBMS,
which has a long section on column compression. Specifically, it
talks about the gains from compressing similar data together, and
how lazily decompressing data only when it must be processed is
great for memory/CPU cache usage.
http://youtube.com/watch?v=yrLd-3lnZ58
While interesting, it's not relevant to Lucene's stored field
storage. On the other hand, it did get me thinking about stored
field compression and lazy field loading.
Can anyone give me some pointers about compressThreshold values
that would be worth experimenting with? Our stored fields are often
between 20 and 300 characters, and we're willing to spend more time
indexing if it will make searching less I/O-bound.
Field compression saves space and converts the field into a
binary field, which is lazy-loaded more efficiently than a string
field. As for the threshold, I use 200 on a multi-kilobyte field,
but that doesn't mean it isn't effective on smaller fields.
Experimenting on small indices and then calculating the average
stored bytes per doc is usually fruitful.
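For reference, roughly what that looks like in config (just a sketch
assuming Solr 1.x-era syntax; the field name and type here are made
up): compression is switched on per field in schema.xml, and lazy
loading is enabled index-wide in solrconfig.xml.

  <!-- schema.xml: compress the stored value once it exceeds 200 characters -->
  <field name="body" type="text" indexed="true" stored="true"
         compressed="true" compressThreshold="200"/>

  <!-- solrconfig.xml, inside the <query> section: load stored fields on demand -->
  <enableLazyFieldLoading>true</enableLazyFieldLoading>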
Of course, the best way to improve performance in this regard is to
store the less-frequently-used fields in a parallel Solr index. That
only works if the largest fields are the rarely-used ones, though
(e.g., the full document contents, retrieved only to build a summary).
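To make that layout concrete, here's a sketch with hypothetical field
names: the main index keeps the small fields you return on every hit,
and a second index shares the same unique key so the big field is only
fetched when you actually build the summary.

  <!-- main index schema.xml: small, frequently returned fields -->
  <field name="id"    type="string" indexed="true" stored="true"/>
  <field name="title" type="text"   indexed="true" stored="true"/>

  <!-- parallel index schema.xml: same id, plus the large, rarely fetched field -->
  <field name="id"       type="string" indexed="true" stored="true"/>
  <field name="contents" type="text"   indexed="false" stored="true"
         compressed="true" compressThreshold="200"/>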
-Mike