bq: 10GB JVM as mentioned here...and they were getting 140 ms response
time for 10 Billion documents

This simply could _not_ work in a single shard as there's a hard 2B
doc limit per shard. On slide 14
it states "both collections are sharded". They are not fitting 10B
docs in 10G of JVM on a single
machine. Trust me on this ;). The slides do not state how many shards they've
split their collection into, but I suspect it's a bunch. Each
application is different enough that the
numbers wouldn't translate anyway...
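
(Back-of-the-envelope: Lucene's hard ceiling is roughly 2.1B documents per
shard, so 10B docs needs at least five shards just to fit, and in practice
far more than that to get 140 ms responses.)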

70M docs can fit on a single shard with quite good response time, but
YMMV. You simply
have to experiment. Here's a long blog on the subject:
https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Start with a profiler and see where you're spending your time. My
first guess is that
you're spending a lot of CPU cycles in garbage collection. This
sometimes happens when you are running near your JVM heap limit: a GC kicks
in, recovers only a tiny bit of memory, and another GC cycle starts almost
immediately. Turn on GC logging and take a look at the stats it produces; see:
https://lucidworks.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/
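
If it helps, the usual flags for that look something like this (these are the
Java 7/8-era flags; newer JVMs use -Xlog:gc instead, and the log path is just
a placeholder):

  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -XX:+PrintGCDateStamps -Xloggc:/path/to/solr_gc.log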

Tens of seconds is entirely unexpected though. Do the Solr logs point
to anything happening?

Best,
Erick

On Fri, Oct 9, 2015 at 8:51 AM, Salman Ansari <salman.rah...@gmail.com> wrote:
> Thanks Erick for your response. If you find pagination is not the main
> culprit, what other factors do you suggest I tweak to test that? As I
> mentioned, when navigating to 20000 results using start and rows I get a
> timeout from Solr.NET, and I need a way to fix that.
>
> You suggested that a 4GB JVM is not enough; I have seen MapQuest going with
> 10GB JVM as mentioned here
> http://www.slideshare.net/lucidworks/high-performance-solr-and-jvm-tuning-strategies-used-for-map-quests-search-ahead-darren-spehr
> and they were getting 140 ms response time for 10 Billion documents. Not
> sure how many shards they had though. With around 70M documents, how many
> shards do you suggest I use, and how much should I dedicate to RAM and to
> the JVM heap?
>
> Regards,
> Salman
>
> On Fri, Oct 9, 2015 at 6:37 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> I think paging is something of a red herring. You say:
>>
>> bq: but still I get delays of around 16 seconds and sometimes even more.
>>
>> Even for a start of 1,000, this is ridiculously long for Solr. All you're
>> really doing here is keeping a record of the id and score for a list 1,000
>> cells long (or even 20,000, assuming 1,000 pages and 20 docs/page). That's
>> somewhat wasteful, but it's still hard to believe it's responsible for what
>> you're seeing.
>>
>> Having 4G of RAM for 70M docs is very little memory, assuming this is on
>> a single shard.
>>
>> So my suspicion is that you have something fundamentally slow about
>> your system; the additional overhead shouldn't be as large as you're
>> reporting.
>>
>> And I'll second Toke's comment. It's very rare that users see anything
>> _useful_ by navigating that deep. Make them hit next next next and they'll
>> tire out way before that.
>>
>> Cursor mark's sweet spot is handling some kind of automated process that
>> goes through the whole result set. It'll work for what you're trying
>> to do though.
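>>
>> A minimal cursor request looks something like this (assuming "id" is your
>> uniqueKey field; the sort must end with the uniqueKey as a tie-breaker):
>>
>>   q=*:*&rows=20&sort=score desc,id asc&cursorMark=*
>>
>> Each response then includes a nextCursorMark value that you send back as
>> the cursorMark parameter on the following request.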
>>
>> Best,
>> Erick
>>
>> On Fri, Oct 9, 2015 at 8:27 AM, Salman Ansari <salman.rah...@gmail.com>
>> wrote:
>> > Is this a real problem or a worry? Do you have users that page really deep
>> > and if so, have you considered other mechanisms for delivering what they
>> > need?
>> >
>> > The issue is that currently I have around 70M documents and some generic
>> > queries are resulting in lots of pages. Now if I try deep navigation (to
>> > page #1000, for example), a lot of the time the query takes so long that
>> > Solr.NET throws an operation timeout exception. The first page is
>> > relatively faster to load, but it still takes a few seconds as well. After
>> > reading some documentation I realized that cursors could help, and they do.
>> > I have tried the following to get better performance:
>> >
>> > 1) Used cursors instead of start and rows
>> > 2) Increased the RAM on my Solr machine to 14GB
>> > 3) Increased the JVM heap on that machine to 4GB
>> > 4) Increased the filterCache
>> > 5) Increased the documentCache
>> > 6) Ran Optimize from the Solr Admin
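>> >
>> > For reference, the filterCache and documentCache from 4) and 5) are sized
>> > in solrconfig.xml, along these lines (the numbers are just placeholders):
>> >
>> >   <filterCache class="solr.FastLRUCache" size="512"
>> >                initialSize="512" autowarmCount="0"/>
>> >   <documentCache class="solr.LRUCache" size="512"
>> >                  initialSize="512"/>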
>> >
>> > but still I get delays of around 16 seconds and sometimes even more.
>> > What other mechanisms do you suggest I use to handle this issue?
>> >
>> > While pagination is faster than increasing the start parameter, the
>> > difference is small as long as you stay below a start of 1000. 10K might
>> > also work for you. Do your users page beyond that?
>> > I can limit users from going beyond 10K, but I still think at that level
>> > cursors will be much faster than increasing the start parameter, as
>> > explained here (
>> > https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
>> > ). Have you tried both ways on your collection and gotten similar results?
>> >
>> > On Fri, Oct 9, 2015 at 5:20 PM, Toke Eskildsen <t...@statsbiblioteket.dk>
>> > wrote:
>> >
>> >> Salman Ansari <salman.rah...@gmail.com> wrote:
>> >>
>> >> [Pagination with cursors]
>> >>
>> >> > For example, what happens if the user navigates from page 1 to page 2,
>> >> > does the front end  need to store the next cursor at each query?
>> >>
>> >> Yes.
>> >>
>> >> > What about going to a previous page, do we need to store all cursors
>> >> > that have been navigated up to now at the client side?
>> >>
>> >> Yes, if you want to provide that functionality.
>> >>
>> >> Is this a real problem or a worry? Do you have users that page really deep
>> >> and if so, have you considered other mechanisms for delivering what they
>> >> need?
>> >>
>> >> While pagination is faster than increasing the start parameter, the
>> >> difference is small as long as you stay below a start of 1000. 10K might
>> >> also work for you. Do your users page beyond that?
>> >>
>> >> - Toke Eskildsen
>> >>
>>
