Did the amount of data finally exceed your per-machine RAM capacity?
Is it the same 20% each time you read, or do your periodic reads
eventually work through the entire dataset?
If you are essentially table-scanning your data set and its size
exceeds available RAM, then a degradation like that is to be expected.
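To make that concrete, here is a rough back-of-envelope check in Python using the figures from the post below (45 GB over two nodes, ~20% read per pass); the per-node RAM figure is an assumed placeholder you would replace with your actual hardware.

```python
# Rough working-set vs. RAM check; dataset size, node count and scan fraction
# come from the post below, the RAM figure is an assumed placeholder.
dataset_gb = 45.0        # total dataset size
nodes = 2                # two-node cluster
scan_fraction = 0.20     # roughly 20% of rows read per pass
ram_per_node_gb = 8.0    # ASSUMPTION: replace with your real per-node RAM

data_per_node_gb = dataset_gb / nodes
scanned_per_node_gb = data_per_node_gb * scan_fraction

print(f"data per node:    {data_per_node_gb:.1f} GB")
print(f"scanned per pass: {scanned_per_node_gb:.1f} GB per node")
if data_per_node_gb > ram_per_node_gb:
    print("dataset no longer fits in RAM; scans will keep going to disk")
else:
    print("dataset still fits in RAM; look for another cause")
```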
I have a two-node cluster hosting a 45 GB dataset. I periodically have to
read a high fraction (20% or so) of my 'rows', grabbing a few thousand at a
time and then processing them.
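For reference, the access pattern is essentially the loop below (a minimal sketch; the client object and its get_range call are hypothetical stand-ins, since the post does not name the API being used).

```python
# Minimal sketch of the batched read-and-process loop described above.
# `client.get_range` is a hypothetical range-query call, not a real API.
BATCH_SIZE = 5000  # "a few thousand" rows per request

def scan_rows(client, start_key, end_key, process):
    """Page through a key range in fixed-size batches, processing each batch."""
    last_key = start_key
    while True:
        # fetch the next batch of rows strictly after last_key (assumed semantics)
        rows = client.get_range(start_after=last_key, end=end_key, limit=BATCH_SIZE)
        if not rows:
            break
        process(rows)            # application-side processing step
        last_key = rows[-1].key  # resume point for the next batch
        if len(rows) < BATCH_SIZE:
            break                # short page means the range is exhausted
```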
This used to result in about 300-500 reads a second, which seemed quite
good. Recently that number has plummeted.