On 7/25/2013 4:45 PM, Tom Burton-West wrote:
Thanks for your help. I found a workaround for this use case, which is to
avoid using a shards query and just ask each shard for a dump of the
unique ids, i.e., run a *:* query and ask for 1 million rows at a time.
This should be a non-scoring query, so I would think it doesn't have to
do any ranking or sorting. What I am now seeing is that QTimes have gone
up from about 5 seconds per request to nearly a minute as the start
parameter gets higher. I don't know whether this is actually because of the
start parameter or whether something is happening with memory use and/or
caching that is just causing things to take longer. I'm at around 35 million
out of 119 million documents for this shard, and queries have gone from
taking 5 seconds to taking almost a minute.
INFO: [core] webapp=/dev-1 path=/select
params={fl=vol_id&indent=on&start=36000000&q=*:*&rows=1000000}
hits=119220943 status=0 QTime=52952
Sounds like your servers are handling deep paging far better than I
would have guessed. I've seen people talk about exponential query time
growth from deep paging after only a few pages. Your times are going
up, but the increase is *relatively* slow, and you've made it 36 pages in.
Getting the information as you're doing it now will be slow, but
probably reliable. Moving to non-distributed requests against the
individual shards was a good idea.
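In case it's useful, here's roughly what I picture that per-shard loop
looking like in SolrJ 4.x -- a quick, untested sketch, not something I'd
call the "right" way. The host/port and core URL are placeholders, the
vol_id field and the million-row page size come straight from your log
line, and distrib=false is just my own addition to keep the request from
fanning back out if the core happens to know about other shards:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class DumpShardIds {
  public static void main(String[] args) throws SolrServerException {
    // Point directly at the individual shard core (placeholder URL),
    // not at a core that has a shards parameter configured.
    HttpSolrServer shard =
        new HttpSolrServer("http://localhost:8983/dev-1/core");

    final int pageSize = 1000000;          // rows per request, as in the log line
    long numFound = Long.MAX_VALUE;        // replaced by the real count below

    for (long start = 0; start < numFound; start += pageSize) {
      SolrQuery q = new SolrQuery("*:*");  // match everything, no scoring needed
      q.setFields("vol_id");               // only fetch the unique id field
      q.setStart((int) start);
      q.setRows(pageSize);
      q.set("distrib", "false");           // my assumption: force a local-only query

      QueryResponse rsp = shard.query(q);
      SolrDocumentList page = rsp.getResults();
      numFound = page.getNumFound();       // 119220943 in your case

      for (SolrDocument doc : page) {
        System.out.println(doc.getFieldValue("vol_id"));
      }
    }
  }
}

Whether you script it this way or just hit /select with curl, the shape
is the same: keep rows fixed and walk the start parameter until you pass
numFound.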
From my own testing: by bumping the max heap on my dev server from 7GB
to 9GB, I was able to get a million-row result (distributed) in only
four minutes, whereas it had reached 45 minutes before with no end in
sight. It was having huge GC pauses from extremely frequent full GCs.
That problem persisted after the heap increase, but it wasn't as bad,
and I was also dealing with the fact that the OS disk cache on my dev
server is way too small.
Thanks,
Shawn