You seem to have the soft and hard commit intervals the wrong way around; the hard commit is the more expensive one.
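For reference, both intervals live in the updateHandler section of solrconfig.xml. A sketch of the relevant knobs (the values here are illustrative, not a recommendation for your load):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard commit: flushes segments to disk and rolls the transaction log -->
    <maxTime>15000</maxTime>
    <!-- with openSearcher=false the hard commit does not open a new searcher,
         which removes most of its cost for an index-only workload -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <!-- soft commit: makes documents visible to search; no fsync -->
    <maxTime>180000</maxTime>
  </autoSoftCommit>
</updateHandler>
```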
On 28 December 2017 at 09:10, Walter Underwood <wun...@wunderwood.org> wrote:
> Why are you using Solr for log search? Elasticsearch is widely used for
> log search and has the best infrastructure for that.
>
> For the past few years, it looks like a natural market segmentation is
> happening, with Solr used for product search and ES for log search. By now,
> I would not expect Solr to keep up with ES in log search features.
> Likewise, I would not expect ES to keep up with Solr for product and text
> search features.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On Dec 27, 2017, at 1:33 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > You are probably hitting more and more background merging, which will
> > slow things down. Your system looks to be severely undersized for this
> > scale.
> >
> > One thing you can try (and I emphasize I haven't prototyped this) is
> > to increase your ramBufferSizeMB setting in solrconfig.xml significantly.
> > By default, Solr won't merge segments to greater than 5G, so
> > theoretically you could just set your ramBufferSizeMB to that figure
> > and avoid merging altogether. Or you could try configuring the
> > NoMergePolicy in solrconfig.xml (but beware that you're going to
> > create a lot of segments unless you set the ramBufferSizeMB higher).
> >
> > How this will affect your indexing throughput I frankly have no data on.
> > You can see with numbers like this, though, that a 4G heap is much too
> > small.
> >
> > Best,
> > Erick
> >
> > On Wed, Dec 27, 2017 at 2:18 AM, Prasad Tendulkar
> > <pra...@cumulus-systems.com> wrote:
> >> Hello All,
> >>
> >> We have been building a Solr-based solution to hold a large amount of
> >> data (approx. 4 TB/day, or more than 24 billion documents per day). We are
> >> developing a prototype on a small scale just to evaluate Solr performance
> >> gradually. Here is our setup configuration.
> >>
> >> Solr cloud:
> >> node1: 16 GB RAM, 8-core CPU, 1 TB disk
> >> node2: 16 GB RAM, 8-core CPU, 1 TB disk
> >>
> >> Zookeeper is also installed on the above 2 machines in cluster mode.
> >> Solr commit intervals: soft commit 3 minutes, hard commit 15 seconds
> >> Schema: basic configuration. 5 fields indexed (one of which is
> >> text_general), 6 fields stored.
> >> Collection: 12 shards (6 per node)
> >> Heap memory: 4 GB per node
> >> Disk cache: 12 GB per node
> >> Each document is a syslog message.
> >>
> >> Documents are being ingested into Solr from different nodes. 12 SolrJ
> >> clients ingest data into the Solr cloud.
> >>
> >> We are experiencing issues when we keep the setup running for a long time,
> >> after processing around 100 GB of index size (i.e. around 600 million
> >> documents). Note that we are only indexing the data and not querying it, so
> >> there should not be any query overhead. From the VM analysis we figured out
> >> that over time the disk operations start declining, and so do the CPU,
> >> RAM and network usage of the Solr nodes. We concluded that Solr is unable
> >> to handle one big collection due to index read/write overhead, and most of
> >> the time it ends up doing only the commit (evident in the Solr logs). Because
> >> of that, indexing is getting hampered (?)
> >>
> >> So we thought of creating small collections instead of one big
> >> collection, anticipating that the commit performance might improve. But
> >> eventually the performance degrades even with that, and we observe more or
> >> less similar charts for CPU, memory, disk and network.
> >>
> >> To put forth some stats, here are the numbers of documents processed
> >> every hour:
> >>
> >> 1st hour: 250 million
> >> 2nd hour: 250 million
> >> 3rd hour: 240 million
> >> 4th hour: 200 million
> >> .
> >> .
> >> 11th hour: 80 million
> >>
> >> Could you please help us identify the root cause of the degradation in
> >> performance? Are we doing something wrong with the Solr configuration
> >> or the collections/sharding etc.?
> >> Due to this performance degradation we are currently stuck with Solr.
> >>
> >> Thank you very much in advance.
> >>
> >> Prasad Tendulkar
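For what it's worth, Erick's ramBufferSizeMB suggestion from the quoted message would look roughly like this in solrconfig.xml (a sketch only: the 5 GB figure mirrors the default maxMergedSegmentMB ceiling he mentions, and the heap has to be large enough to actually back the buffer, which the 4 GB heaps described above are not):

```xml
<indexConfig>
  <!-- flush the in-memory buffer to a segment only once it nears ~5 GB,
       so segments are born close to the size merging would otherwise
       produce; requires a correspondingly larger heap -->
  <ramBufferSizeMB>5000</ramBufferSizeMB>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <!-- the default merged-segment ceiling Erick refers to -->
    <int name="maxMergedSegmentMB">5000</int>
  </mergePolicyFactory>
</indexConfig>
```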
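To put the thread's numbers in perspective, the quoted rates work out as follows (plain arithmetic from the figures above; Python used only as a calculator):

```python
# Back-of-the-envelope rates from the figures quoted in the thread.
TARGET_DOCS_PER_DAY = 24_000_000_000   # "more than 24 billion documents per day"
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR

target_rate = TARGET_DOCS_PER_DAY / SECONDS_PER_DAY   # docs/sec needed at full scale
first_hour_rate = 250_000_000 / SECONDS_PER_HOUR      # observed, 1st hour
eleventh_hour_rate = 80_000_000 / SECONDS_PER_HOUR    # observed, 11th hour

print(f"target at full scale: {target_rate:,.0f} docs/sec")
print(f"1st hour observed:    {first_hour_rate:,.0f} docs/sec")
print(f"11th hour observed:   {eleventh_hour_rate:,.0f} docs/sec")
print(f"slowdown:             {first_hour_rate / eleventh_hour_rate:.1f}x over 11 hours")
```

So even the prototype's best hour (~69,000 docs/sec) is roughly a quarter of the ~278,000 docs/sec the stated full-scale target implies, before the 3x degradation over 11 hours is taken into account.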