Dear Solr users,

We migrated our solution from Solr 4.0 to Solr 4.3 and noticed a degradation in search performance. We compared the two versions and found that most of the time in Solr 4.3 is spent decompressing the stored (retrievable) fields. The block compression of documents is a great feature for us because it reduces the size of our index, but we don’t have enough resources (I mean CPUs) to safely migrate to the new version.

To reduce the cost of decompression we tried a simple patch in the BinaryResponseWriter: during the first phase of a distributed search, the response writer fetches documents from the index reader only to extract the doc ids of the top N results. Our patch uses the field cache to get the doc ids during this first phase, and thus replaces the full decompression of a 16k block (just to read a single document) with a simple lookup in an array (the field cache or doc values). Thanks to this patch we are now able to handle the same number of QPS as before (with Solr 4.0). Of course the document cache could help as well, but not as much as one would have thought (mainly because we have a lot of deep-paging queries).
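To make the idea concrete, here is a rough sketch of the two lookup paths (simplified, not our actual patch; the field name "id" standing in for the schema’s uniqueKey, and the exact Lucene 4.3 FieldCache calls, are illustrative):

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.SlowCompositeReaderWrapper;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.util.BytesRef;

public class UniqueKeyLookup {

    // Stored-field path: reading even a single field forces Lucene 4.3
    // to decompress the whole block that contains the document.
    static String idFromStoredFields(IndexSearcher searcher, int docId) throws IOException {
        Document doc = searcher.doc(docId);
        return doc.get("id"); // "id" stands in for the schema's uniqueKey field
    }

    // Field-cache path: once the cache is warmed, each call is a plain
    // per-docid lookup with no decompression involved.
    static String idFromFieldCache(IndexReader reader, int docId) throws IOException {
        AtomicReader atomic = SlowCompositeReaderWrapper.wrap(reader);
        BinaryDocValues ids = FieldCache.DEFAULT.getTerms(atomic, "id");
        BytesRef ref = new BytesRef();
        ids.get(docId, ref); // Lucene 4.x API: fills the BytesRef in place
        return ref.utf8ToString();
    }
}

The first getTerms call un-inverts the field (or reads its doc values) and caches it per reader, so that cost is paid once per commit rather than on every query.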
I am sure the idea we implemented is not new, but I haven’t seen any Jira issue about it. Should we create one? (I mean, does it have a chance to be included in a future release of Solr, or is anybody already working on this?)

Cheers,
Jim