Re: Need Debug Direction on Performance Problem

Michael Sokolov Sun, 18 Jan 2015 08:36:02 -0800

You can also implement your own cursor easily enough if you have a unique sort key (not the relevance score). Say you can sort by id: you select batch 1 (50k docs, say) and record the last (maximum) id in the batch. For the next batch, limit it to id > last_id and get the first 50k docs (don't use start= for paging). This scales much better when scanning a large result set; each batch takes roughly constant time across the whole set instead of taking longer as you page deeper.
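A minimal sketch of this scheme in Python, assuming the ids are already sorted ascending (the order a query with sort=id asc would return). The sorted list and the `search_after` function are hypothetical stand-ins for the index and a search call. Against Solr, the id > last_id restriction would typically be a range filter such as fq=id:{last_id TO *] combined with sort=id asc, but check the range syntax for your version and field type.

```python
from bisect import bisect_right
from typing import Iterator, List, Optional


def search_after(sorted_ids: List[int], last_id: Optional[int], rows: int) -> List[int]:
    """Hypothetical search call: the first `rows` ids strictly greater than last_id.

    On a real index this is the "id > last_id" filter plus a sort on id;
    here a binary search over an already-sorted list plays that role.
    """
    start = 0 if last_id is None else bisect_right(sorted_ids, last_id)
    return sorted_ids[start:start + rows]


def export_all(sorted_ids: List[int], rows: int) -> Iterator[List[int]]:
    """Yield every id in batches, restarting each batch after the last id seen."""
    last_id: Optional[int] = None
    while True:
        batch = search_after(sorted_ids, last_id, rows)
        if not batch:
            break  # nothing left above last_id
        yield batch
        last_id = batch[-1]  # maximum id in this batch, since ids are sorted


print(list(export_all(list(range(1, 11)), rows=4)))
# [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```

The filter and the sort must agree on ordering: with a string id field the order is lexicographic, so last_id has to be compared as a string as well.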


-Mike


On 1/18/2015 7:45 AM, Naresh Yadav wrote:

Hi Toke,

Thanks for sharing the Solr internals behind my problem. I will definitely try
cursors too, but the only problem is that my current
Solr version is 4.6.1, which I believe does not support cursors. Do I have
any other option for this problem?

Also, as you suggested, I will try to avoid regional units in my posts.

Thanks
Naresh

On Sun, Jan 18, 2015 at 4:19 PM, Toke Eskildsen <t...@statsbiblioteket.dk>
wrote:

Naresh Yadav [nyadav....@gmail.com] wrote:

In both setups, we are reading in batches of 50k, and each batch takes:
Setup1: approx 7 seconds; completing all batches for the total of 10 lakh
(1 million) results takes 1 to 2 minutes.
Setup2: approx 2-3 minutes; completing all batches for the total of 10 lakh
(1 million) results takes 114 minutes.

Deep paging across shards without cursors means that for each request, the
full result set up to that point must be requested from each shard. The
deeper your page, the longer it takes for each request. If you only
extracted 500K results instead of the 1M in setup 2, it would likely take a
lot less than 114/2 minutes.
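The growth can be made concrete with a rough cost model. The sketch below is a deliberate simplification, not a description of Solr's internals: it counts, per shard, how many top entries must be collected for each request (start + rows with start= paging, only rows with a cursor), assumes every shard holds enough hits to fill each request, and ignores the merge and document-fetch phases. The batch size and totals are the ones from this thread.

```python
import math


def deep_paging_cost(total: int, rows: int) -> int:
    """Sum of the top-N sizes each shard is asked for when paging with start=."""
    # Page p (0-based) uses start = p * rows, so each shard must collect and
    # return its best start + rows entries for the coordinator to merge.
    return sum(start + rows for start in range(0, total, rows))


def cursor_cost(total: int, rows: int) -> int:
    """Same measure when every request only needs `rows` entries per shard."""
    return math.ceil(total / rows) * rows


print(deep_paging_cost(1_000_000, 50_000))  # 10500000
print(cursor_cost(1_000_000, 50_000))       # 1000000
print(deep_paging_cost(500_000, 50_000))    # 2750000
```

Under this model, exporting 500K instead of 1M cuts the work to roughly a quarter (2.75M versus 10.5M entries), which fits the expectation of well under 114/2 minutes, while a cursor keeps every request the same size.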

Since you are exporting the full result set, you should be using a cursor:
https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
This should make your extraction linear in the number of documents and,
hopefully, a lot faster than your current setup.
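For Solr versions that support cursors, an export loop might look like the sketch below. It follows the documented protocol: start with cursorMark=*, sort on fields that include the uniqueKey (assumed here to be `id`), do not pass start, and send back each response's nextCursorMark until it stops changing. The `fetch` callable is a hypothetical placeholder for an HTTP request to the /select handler that returns parsed JSON; `fake_fetch` only simulates Solr, and its numeric cursor marks are an illustration (real marks are opaque strings).

```python
from typing import Callable, Dict, Iterator, List


def cursor_export(fetch: Callable[[Dict[str, str]], dict], query: str,
                  rows: int, sort: str = "id asc") -> Iterator[List[dict]]:
    """Yield batches of docs by following Solr-style cursor marks."""
    cursor = "*"  # documented starting value for a new cursor
    while True:
        # No "start" parameter: the cursor mark replaces offset paging.
        params = {"q": query, "rows": str(rows), "sort": sort, "cursorMark": cursor}
        rsp = fetch(params)
        docs = rsp["response"]["docs"]
        if docs:
            yield docs
        next_cursor = rsp["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged mark means no more results
            break
        cursor = next_cursor


def fake_fetch(params: Dict[str, str]) -> dict:
    """Hypothetical stand-in for Solr: an index holding ids 1..7."""
    ids = list(range(1, 8))
    rows = int(params["rows"])
    last = 0 if params["cursorMark"] == "*" else int(params["cursorMark"])
    page = [i for i in ids if i > last][:rows]
    next_mark = str(page[-1]) if page else params["cursorMark"]
    return {"response": {"docs": [{"id": i} for i in page]},
            "nextCursorMark": next_mark}


for batch in cursor_export(fake_fetch, "*:*", rows=3):
    print([doc["id"] for doc in batch])
# [1, 2, 3]
# [4, 5, 6]
# [7]
```

Note the final round trip: the last non-empty page still returns a new mark, so one extra request comes back empty with an unchanged mark before the loop ends.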

Also, please refrain from using regional units such as "lakh" in an
international forum. It forces some readers (me, for example) to do a
search just to be sure of what you are talking about.

- Toke Eskildsen
