If you can spare the load of a long request, I'd do an unsorted query
for everything, non-paged. I'd dump that into a line-per-row format
and use something like Apache Hive to do the analysis.
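
For illustration, here is a minimal sketch of the "line-per-row" dump, assuming the query results are already in memory as a list of dicts. The function name `dump_rows`, the field list, and the choice of tab-separated output are all assumptions made for this example and are not part of the suggestion above.

```python
import io

def dump_rows(docs, fields, out):
    """Write one tab-separated line per document and return the row count.

    `docs` is any iterable of dicts; `fields` fixes the column order.
    Both are hypothetical stand-ins for whatever the query returns.
    """
    count = 0
    for doc in docs:
        cells = []
        for name in fields:
            value = doc.get(name, "")  # missing field becomes an empty cell
            if isinstance(value, (list, tuple)):
                # Flatten multi-valued fields into a single cell.
                value = ",".join(str(v) for v in value)
            # Tabs and newlines inside a value would break the row layout.
            cells.append(str(value).replace("\t", " ").replace("\n", " "))
        out.write("\t".join(cells) + "\n")
        count += 1
    return count

if __name__ == "__main__":
    buf = io.StringIO()
    dump_rows([{"id": "1", "title": "first"}, {"id": "2"}], ["id", "title"], buf)
    print(buf.getvalue(), end="")
```

A file written this way has one document per line with a fixed column order, which is the kind of input a tab-delimited table in a tool like Hive can be defined over.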
Michael Della Bitta
Appinions
I do a brute-force regression test where I read all the documents from
shard 1 and compare them to documents in shard 2. I had to have all the
fields stored to do that, but in my case that doesn't change the size of
the index much.
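
A rough sketch of that kind of brute-force comparison, assuming the documents from each shard have already been fetched as lists of dicts with all fields stored. The function name `compare_shards`, the `id` key field, and the `ignore` parameter are hypothetical names chosen for this example, not something described in the post.

```python
def compare_shards(docs_a, docs_b, key="id", ignore=()):
    """Compare two collections of documents by their unique key.

    Returns three sorted lists of keys: documents only in the first
    collection, documents only in the second, and documents present in
    both whose stored fields differ. Fields named in `ignore` are
    skipped (for example, a per-copy version stamp).
    """
    def index(docs):
        # Map each key to its document, minus the ignored fields.
        return {
            d[key]: {k: v for k, v in d.items() if k not in ignore}
            for d in docs
        }

    a, b = index(docs_a), index(docs_b)
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    changed = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return only_a, only_b, changed

if __name__ == "__main__":
    shard1 = [{"id": "1", "t": "x"}, {"id": "2", "t": "y"}]
    shard2 = [{"id": "2", "t": "changed"}, {"id": "3", "t": "z"}]
    print(compare_shards(shard1, shard2))
```

This version holds both collections in memory, so it only suits indexes small enough for that; a larger index would need the comparison done a batch at a time.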
So, in other words, I do a search for a page's worth of documents