bq: 10GB JVM as mentioned here...and they were getting 140 ms response time for 10 Billion documents
This simply could _not_ work in a single shard, as there's a hard 2B-doc limit per shard. On slide 14 it states "both collections are sharded". They are not fitting 10B docs in 10G of JVM on a single machine. Trust me on this ;). The slides do not state how many shards they've split their collection into, but I suspect it's a bunch.

Each application is different enough that the numbers wouldn't translate anyway... 70M docs can fit on a single shard with quite good response time, but YMMV. You simply have to experiment. Here's a long blog on the subject:
https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Start with a profiler and see where you're spending your time. My first guess is that you're spending a lot of CPU cycles in garbage collection. This sometimes happens when you are running near your JVM limit: a GC kicks in and recovers a tiny bit of memory, and then another GC cycle is initiated immediately. Turn on GC logging (the stock flags I usually start with are at the bottom of this mail) and take a look at the stats provided, see:
https://lucidworks.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/

Tens of seconds is entirely unexpected though. Do the Solr logs point to anything happening?

Best,
Erick

On Fri, Oct 9, 2015 at 8:51 AM, Salman Ansari <salman.rah...@gmail.com> wrote:
> Thanks Erick for your response. If you find pagination is not the main
> culprit, what other factors do you suggest I need to tweak to test that?
> As I mentioned, by navigating to 20000 results using start and rows I am
> getting a timeout from Solr.NET and I need a way to fix that.
>
> You suggested that 4GB JVM is not enough. I have seen MapQuest going with
> a 10GB JVM as mentioned here
> http://www.slideshare.net/lucidworks/high-performance-solr-and-jvm-tuning-strategies-used-for-map-quests-search-ahead-darren-spehr
> and they were getting 140 ms response time for 10 billion documents. Not
> sure how many shards they had though. With data of around 70M documents,
> how many shards do you guys suggest I should use, and how much should I
> dedicate for RAM and JVM?
>
> Regards,
> Salman
>
> On Fri, Oct 9, 2015 at 6:37 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> I think paging is something of a red herring. You say:
>>
>> bq: but still I get delays of around 16 seconds and sometimes even more.
>>
>> Even for a start of 1,000, this is ridiculously long for Solr. All you're
>> really saving here is keeping a record of the id and score for a list
>> 1,000 cells long (or even 20,000, assuming 1,000 pages and 20 docs/page).
>> That's somewhat wasteful, but it's still hard to believe it's responsible
>> for what you're seeing.
>>
>> Having 4G of RAM for 70M docs is very little memory, assuming this is on
>> a single shard.
>>
>> So my suspicion is that you have something fundamentally slow about your
>> system; the additional overhead shouldn't be as large as you're reporting.
>>
>> And I'll second Toke's comment. It's very rare that users see anything
>> _useful_ by navigating that deep. Make them hit next next next and
>> they'll tire out way before that.
>>
>> Cursor mark's sweet spot is handling some kind of automated process that
>> goes through the whole result set. It'll work for what you're trying to
>> do though.
>>
>> Best,
>> Erick
>>
>> On Fri, Oct 9, 2015 at 8:27 AM, Salman Ansari <salman.rah...@gmail.com> wrote:
>> > Is this a real problem or a worry? Do you have users that page really
>> > deep and if so, have you considered other mechanisms for delivering
>> > what they need?
>> >
>> > The issue is that currently I have around 70M documents and some
>> > generic queries are resulting in lots of pages. Now if I try deep
>> > navigation (to page #1000, for example), a lot of times the query takes
>> > so long that Solr.NET throws an operation timeout exception. The first
>> > page is relatively faster to load, but it does take around a few
>> > seconds as well. After reading some documentation I realized that
>> > cursors could help, and they do. I have tried the following to get
>> > better performance:
>> >
>> > 1) Used cursors instead of start and rows
>> > 2) Increased the RAM on my Solr machine to 14GB
>> > 3) Increased the JVM on that machine to 4GB
>> > 4) Increased the filterCache
>> > 5) Increased the documentCache
>> > 6) Ran Optimize from the Solr Admin
>> >
>> > but still I get delays of around 16 seconds and sometimes even more.
>> > What other mechanisms do you suggest I should use to handle this issue?
>> >
>> > While pagination is faster than increasing the start parameter, the
>> > difference is small as long as you stay below a start of 1000. 10K
>> > might also work for you. Do your users page beyond that?
>> >
>> > I can limit users not to go beyond 10K, but I still think that even at
>> > that level cursors will be much faster than increasing the start
>> > variable, as explained here (
>> > https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
>> > ). Have you tried both ways on your collection, and did they give you
>> > similar results?
>> >
>> > On Fri, Oct 9, 2015 at 5:20 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
>> >
>> >> Salman Ansari <salman.rah...@gmail.com> wrote:
>> >>
>> >> [Pagination with cursors]
>> >>
>> >> > For example, what happens if the user navigates from page 1 to page
>> >> > 2, does the front end need to store the next cursor at each query?
>> >>
>> >> Yes.
>> >>
>> >> > What about going to a previous page, do we need to store all cursors
>> >> > that have been navigated up to now at the client side?
>> >>
>> >> Yes, if you want to provide that functionality.
>> >>
>> >> Is this a real problem or a worry? Do you have users that page really
>> >> deep and if so, have you considered other mechanisms for delivering
>> >> what they need?
>> >>
>> >> While pagination is faster than increasing the start parameter, the
>> >> difference is small as long as you stay below a start of 1000. 10K
>> >> might also work for you. Do your users page beyond that?
>> >>
>> >> - Toke Eskildsen
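P.S. The GC logging flags I mentioned above: these are the stock HotSpot options I'd start with (assuming a Sun/Oracle JVM; the log path is just an example). On Solr 5.x I believe solr.in.sh has a GC_LOG_OPTS variable meant for exactly this; otherwise add them wherever you set your JVM options:

  -verbose:gc
  -Xloggc:/var/solr/logs/solr_gc.log
  -XX:+PrintGCDetails
  -XX:+PrintGCDateStamps
  -XX:+PrintGCApplicationStoppedTime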
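On the cursor mechanics discussed further down in the thread: the whole contract is that the sort must include the uniqueKey field as a tie-breaker, you send cursorMark=* on the first request, and you echo back the nextCursorMark from each response (your front end has to remember those marks per page if you want a "previous page"). Below is a rough SolrJ (5.x) sketch of walking a result set that way. You're on Solr.NET, so the client calls will differ, but the parameters on the wire are the same; the URL, collection name, and uniqueKey field here are placeholders, not anything from your setup.

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.params.CursorMarkParams;

  public class CursorWalk {
    public static void main(String[] args) throws Exception {
      // Placeholder URL/collection; point at your own collection.
      HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");

      SolrQuery q = new SolrQuery("*:*");
      q.setRows(100);
      // cursorMark requires the sort to include the uniqueKey field ("id" here) as a tie-breaker.
      q.setSort(SolrQuery.SortClause.asc("id"));

      String cursor = CursorMarkParams.CURSOR_MARK_START; // i.e. "*"
      while (true) {
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        QueryResponse rsp = client.query(q);
        for (SolrDocument doc : rsp.getResults()) {
          // process each document; a paging UI would remember nextCursorMark per page here
        }
        String next = rsp.getNextCursorMark();
        if (cursor.equals(next)) {
          break; // cursor didn't advance, so there are no more results
        }
        cursor = next;
      }
      client.close();
    }
  }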
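And on the "how many shards should I use" question: there's no formula, which is the point of that sizing blog. Prototype with a representative slice of your 70M docs and measure. Whatever number you settle on is just a parameter on the SolrCloud Collections API CREATE call; the values below are placeholders, not a recommendation:

  http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=4&replicationFactor=2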