Dear Solr users,

We migrated our solution from Solr 4.0 to Solr 4.3 and noticed a degradation in search performance. We compared the two versions and found that most of the time in Solr 4.3 is spent decompressing the stored (retrievable) fields. The block compression of documents is a great feature for us because it reduces the size of our index, but we don’t have enough resources (I mean CPUs) to safely migrate to the new version.

To reduce the cost of decompression we tried a simple patch in the BinaryResponseWriter: during the first phase of a distributed search, the response writer fetches documents from the index reader only to extract the doc ids of the top N results. Our patch uses the field cache to get the doc ids during this first phase, and thus replaces the full decompression of a 16k block (just to read a single document) with a simple lookup in an array (the field cache or doc values). Thanks to this patch we are now able to handle the same number of QPS as before (with Solr 4.0). Of course the document cache could help as well, but not as much as one would have thought (mainly because we have a lot of deep-paging queries).
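To make the idea concrete, here is a rough sketch of the two lookup paths (simplified, not our actual patch; the field name "id" standing in for the schema’s uniqueKey, and the exact Lucene 4.3 FieldCache calls, are illustrative):

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.SlowCompositeReaderWrapper;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.util.BytesRef;

public class UniqueKeyLookup {

    // Stored-field path: reading even a single field forces Lucene 4.3
    // to decompress the whole block that contains the document.
    static String idFromStoredFields(IndexSearcher searcher, int docId) throws IOException {
        Document doc = searcher.doc(docId);
        return doc.get("id"); // "id" stands in for the schema's uniqueKey field
    }

    // Field-cache path: once the cache is warmed, each call is a plain
    // per-docid lookup with no decompression involved.
    static String idFromFieldCache(IndexReader reader, int docId) throws IOException {
        AtomicReader atomic = SlowCompositeReaderWrapper.wrap(reader);
        BinaryDocValues ids = FieldCache.DEFAULT.getTerms(atomic, "id");
        BytesRef ref = new BytesRef();
        ids.get(docId, ref); // Lucene 4.x API: fills the BytesRef in place
        return ref.utf8ToString();
    }
}

The first getTerms call un-inverts the field (or reads its doc values) and caches it per reader, so that cost is paid once per commit rather than on every query.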
I am sure the idea we implemented is not new, but I haven’t seen any Jira issue about it. Should we create one? (I mean, does it have a chance to be included in a future release of Solr, or is anybody already working on this?)

Cheers,
Jim