Michael, I tried your idea of implementing my own cursor in Solr 4.6.1 itself,
but somehow that test case was taking a huge amount of time.
Then I tried the cursor approach by upgrading Solr to 4.10.3 and got
better results: for Setup 2 the time is now down from
114 minutes to 18 minutes, but that is still quite far from Setup 1's 2 minutes.
Actually, the first 50-thousand-document batch itself takes about a minute, so I
may need to look at other things, since pagination seems to be working better now.

Thanks for the valuable suggestions.

On Mon, Jan 19, 2015 at 11:20 AM, Naresh Yadav <nyadav....@gmail.com> wrote:

> Toke, I won't be able to use TermsComponent as I have complex filter
> criteria on the other fields.
>
> Michael, I understood your idea of paging without using start=. I will
> prototype it, as it is also possible in my use case, and post the results
> I get with this approach here.
>
>
> On Sun, Jan 18, 2015 at 10:05 PM, Michael Sokolov <
> msoko...@safaribooksonline.com> wrote:
>
>> You can also implement your own cursor easily enough if you have a unique
>> sort key (not relevance score). Say you can sort by id: select batch 1
>> (50k docs, say) and record the last (maximum) id in the batch. For the
>> next batch, limit the query to id > last_id and take the first 50k docs
>> (don't use start= for paging). This scales much better when scanning a
>> large result set; you get roughly constant time per batch across the
>> whole set instead of having it increase as you page deeper.
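>>
>> A rough SolrJ 4.x sketch of that pattern, just to illustrate -- the core
>> URL, query, batch size, class name, and the assumption that the uniqueKey
>> field is a string called "id" are placeholders, and error handling is
>> left out:
>>
>>     import org.apache.solr.client.solrj.SolrQuery;
>>     import org.apache.solr.client.solrj.SolrServerException;
>>     import org.apache.solr.client.solrj.impl.HttpSolrServer;
>>     import org.apache.solr.client.solrj.util.ClientUtils;
>>     import org.apache.solr.common.SolrDocument;
>>     import org.apache.solr.common.SolrDocumentList;
>>
>>     public class IdRangeExport {
>>         public static void main(String[] args) throws SolrServerException {
>>             // Manual "cursor": sort on a unique key and filter past the
>>             // last seen id instead of paging with start=.
>>             HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
>>             String lastId = null;
>>             while (true) {
>>                 SolrQuery q = new SolrQuery("*:*");          // your real query/filters here
>>                 q.setRows(50000);                            // batch size
>>                 q.setSort(SolrQuery.SortClause.asc("id"));   // unique sort key, not score
>>                 if (lastId != null) {
>>                     // exclusive lower bound: only ids greater than the last one processed
>>                     q.addFilterQuery("id:{" + ClientUtils.escapeQueryChars(lastId) + " TO *]");
>>                 }
>>                 SolrDocumentList docs = server.query(q).getResults();
>>                 if (docs.isEmpty()) break;                   // no more batches
>>                 for (SolrDocument doc : docs) {
>>                     // ... process doc ...
>>                 }
>>                 lastId = (String) docs.get(docs.size() - 1).getFieldValue("id");
>>             }
>>             server.shutdown();
>>         }
>>     }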
>>
>> -Mike
>>
>>
>> On 1/18/2015 7:45 AM, Naresh Yadav wrote:
>>
>>> Hi Toke,
>>>
>>> Thanks for explaining the Solr internals behind my problem. I will
>>> definitely try the cursor as well, but the only problem is that my current
>>> Solr version is 4.6.1, which I believe does not have cursor support. Do I
>>> have any other options for this problem?
>>>
>>> Also, as per your suggestion, I will try to avoid regional units in my posts.
>>>
>>> Thanks
>>> Naresh
>>>
>>> On Sun, Jan 18, 2015 at 4:19 PM, Toke Eskildsen <t...@statsbiblioteket.dk>
>>> wrote:
>>>
>>>  Naresh Yadav [nyadav....@gmail.com] wrote:
>>>>
>>>>> In both setups, we are reading in batches of 50k, and each batch takes:
>>>>> Setup 1: approx. 7 seconds; completing all batches of the total 10 lakh
>>>>> (1 million) results takes 1 to 2 minutes.
>>>>> Setup 2: approx. 2-3 minutes; completing all batches of the total 10 lakh
>>>>> (1 million) results takes 114 minutes.
>>>>
>>>> Deep paging across shards without cursors means that for each request,
>>>> the full result set up to that point must be requested from each shard.
>>>> The deeper your page, the longer it takes for each request. If you only
>>>> extracted 500K results instead of the 1M in setup 2, it would likely
>>>> take a lot less than 114/2 minutes.
>>>>
>>>> Since you are exporting the full result set, you should be using a cursor:
>>>> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
>>>> This should make your extraction linear in the number of documents and
>>>> hopefully a lot faster than your current setup.
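>>>>
>>>> For what it's worth, a cursorMark loop in SolrJ (4.7+) looks roughly like
>>>> this -- the query, rows value, sort field and the "server" client instance
>>>> are just example values, with the relevant imports noted in the comment:
>>>>
>>>>     // imports: org.apache.solr.client.solrj.SolrQuery,
>>>>     //          org.apache.solr.client.solrj.response.QueryResponse,
>>>>     //          org.apache.solr.common.params.CursorMarkParams
>>>>     SolrQuery q = new SolrQuery("*:*");
>>>>     q.setRows(50000);
>>>>     q.setSort(SolrQuery.SortClause.asc("id"));            // sort must include the uniqueKey field
>>>>     String cursorMark = CursorMarkParams.CURSOR_MARK_START;   // "*"
>>>>     boolean done = false;
>>>>     while (!done) {
>>>>         q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
>>>>         QueryResponse rsp = server.query(q);              // server: e.g. an HttpSolrServer
>>>>         // ... process rsp.getResults() ...
>>>>         String nextCursorMark = rsp.getNextCursorMark();
>>>>         done = cursorMark.equals(nextCursorMark);         // cursor stopped advancing: finished
>>>>         cursorMark = nextCursorMark;
>>>>     }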
>>>>
>>>> Also, please refrain from using regional units such as "lakh" in an
>>>> international forum. It requires some readers (me, for example) to
>>>> perform a search in order to be sure of what you are talking about.
>>>>
>>>> - Toke Eskildsen