Thanks for the good suggestions on read traffic. I have been simulating reads by parsing our ELB logs and replaying them from a fleet of test servers acting as frontends, using Siege <https://www.joedog.org/siege-home/>. We are hoping to tune mostly for our exact use case, so this seems the most effective route. I can see why, for the average user experience, 0-hit queries would provide some better data. Our plan is to start with exact user patterns and then branch out and refine our metrics from there.
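For anyone who wants to try the same approach, here is a minimal sketch of the log-to-replay step. It is not our actual tooling. It assumes the classic ELB access log layout, where the request line is the first double-quoted field, and it keeps only GET requests. The function names and the urls.txt file name are made up for illustration.

```python
import re

# Assumption: classic ELB access logs, where the request appears as a quoted
# field such as "GET http://host:80/solr/select?q=foo HTTP/1.1".
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<url>\S+) HTTP/[\d.]+"')


def extract_get_urls(log_lines):
    """Return the URLs of all GET requests found in the log lines, in order."""
    urls = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        # Skip lines with no parsable request and anything that is not a read.
        if match and match.group("method") == "GET":
            urls.append(match.group("url"))
    return urls


def write_url_file(urls, path):
    """Write one URL per line, the plain format Siege accepts via -f."""
    with open(path, "w") as handle:
        for url in urls:
            handle.write(url + "\n")
```

The resulting file can then be replayed with something like `siege -f urls.txt -c 50 -t 10M`, where the concurrency and duration are placeholders to adjust per test server.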
For writes, I am using an index rebuild that we wrote ourselves. We use it to build a new index or refresh an existing one after changes to our data model, document structure, schema, etc. It was actually turning this rebuild on against our main cluster that started edging us toward the performance limits on writes. Since I last wrote, we discovered that we were garbage-collection limited in our current cluster. When doing writes, especially the large volume our background rebuild generates, we generally do okay, but eventually the GC would do a deep pass and we'd see 504 gateway timeouts. We updated with the settings from Shawn Heisey's page <https://wiki.apache.org/solr/ShawnHeisey>, and we have only seen timeouts a couple of times since then (these don't kill the rebuild; they simply get retried later). I see from you here and on another thread right now that GC seems to be an area of active discussion.

Best,
Stephen

On Mon, May 2, 2016 at 9:20 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> Bram:
>
> That works. I try to monitor the number of 0-hit
> queries when I generate a test set on the theory that
> those are _usually_ groups of random terms I've
> selected that aren't a good model. So it's often
> a sequence like "generate my list, see which
> ones give 0 results and remove them". Rinse,
> repeat.
>
> Like you said, imperfect but _loads_ better than
> trying to create them without real user queries
> as guidance...
>
> Best,
> Erick
>
> On Sat, Apr 30, 2016 at 4:19 AM, Bram Van Dam <bram.van...@intix.eu>
> wrote:
> >> If I'm reading this right, you have 420M docs on a single shard?
> >> Yep, you were reading it right.
> >
> > As Erick mentioned, it's hard to give concrete sizing advice, but we've
> > found 120M to be the magic number. When a shard contains more than 120M
> > documents, performance goes down rapidly & GC pauses grow a lot longer.
> > Up until 250M things remain acceptable. But then performance starts to
> > drop very quickly after that.
> >
> > - Bram
> >
>

--
Stephen

(206)753-9320
stephen-lewis.net