To speed up BinaryResponseWriter.write, we extended this writer class to perform the docid-to-id transformation through docValues (in memory), with no need to read the id from stored fields or to lazy-load fields, which also has a cost. That should improve the read rate, since docValues are read sequentially, and should avoid disk I/O. This docValues implementation is used during both query stages (as mentioned above) if you request only ids, or only once, during the distributed search stage, if you request stored fields other than id.
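To make the idea concrete, here is a rough, dependency-free sketch of the lookup logic. It is not our actual patch and does not use the Lucene/Solr API: the class DocValuesIdSketch, the IdColumn type and both methods are hypothetical names, and a plain array stands in for the docValues column.

```java
import java.util.ArrayList;
import java.util.List;

public class DocValuesIdSketch {

    /** Hypothetical stand-in for a docValues column: one id per internal docid. */
    public static final class IdColumn {
        private final String[] idsByDocid;

        public IdColumn(String[] idsByDocid) {
            this.idsByDocid = idsByDocid;
        }

        /** Array lookup in memory; no stored-field read, no lazy field loading. */
        public String idFor(int docid) {
            return idsByDocid[docid];
        }
    }

    /** Translates internal docids to external ids using only the column. */
    public static List<String> resolveIds(IdColumn column, int[] docids) {
        List<String> ids = new ArrayList<>(docids.length);
        for (int docid : docids) {
            ids.add(column.idFor(docid));
        }
        return ids;
    }

    /**
     * Stored fields are only needed when the request asks for a field other
     * than the id field. If only ids are requested, both query stages can be
     * served from the column alone.
     */
    public static boolean needsStoredFields(List<String> requestedFields, String idField) {
        for (String field : requestedFields) {
            if (!field.equals(idField)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        IdColumn column = new IdColumn(new String[] {"doc-a", "doc-b", "doc-c"});
        System.out.println(resolveIds(column, new int[] {2, 0}));
        System.out.println(needsStoredFields(List.of("id"), "id"));
    }
}
```

In the real writer the column would be the per-segment docValues of the id field rather than an array, so the sketch only shows where the stored-field access is skipped.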
We have just started performance testing it. I would love to hear any opinions or performance test results for this implementation.
Manu