You seem to have the soft and hard commits the wrong way around. Hard
commit is more expensive.

On 28 December 2017 at 09:10, Walter Underwood <wun...@wunderwood.org>
wrote:

> Why are you using Solr for log search? Elasticsearch is widely used for
> log search and has the best infrastructure for that.
>
> For the past few years, it looks like a natural market segmentation is
> happening, with Solr used for product search and ES for log search. By now,
> I would not expect Solr to keep up with ES in log search features.
> Likewise, I would not expect ES to keep up with Solr for product and text
> search features.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Dec 27, 2017, at 1:33 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
> >
> > You are probably hitting more and more background merging which will
> > slow things down. Your system looks to be severely undersized for this
> > scale.
> >
> > One thing you can try (and I emphasize I haven't prototyped this) is
> > to increase the ramBufferSizeMB setting in solrconfig.xml significantly.
> > By default, Solr won't create merged segments larger than 5G, so
> > theoretically you could just set your ramBufferSizeMB to that figure
> > and avoid merging altogether. Or you could try configuring the
> > NoMergePolicy in solrconfig.xml (but beware that you're going to
> > create a lot of segments unless you set ramBufferSizeMB higher).
> >
> > I frankly have no data on how this will affect your indexing throughput.
> > You can see that with numbers like this, though, a 4G heap is much too
> > small.
> >
> > Best,
> > Erick
> >
> > On Wed, Dec 27, 2017 at 2:18 AM, Prasad Tendulkar
> > <pra...@cumulus-systems.com> wrote:
> >> Hello All,
> >>
> >> We have been building a Solr-based solution to hold a large amount of
> >> data (approx. 4 TB/day, or more than 24 billion documents per day). We are
> >> developing a small-scale prototype to evaluate Solr performance gradually.
> >> Here is our setup:
> >>
> >> Solr cloud:
> >> node1: 16 GB RAM, 8 Core CPU, 1TB disk
> >> node2: 16 GB RAM, 8 Core CPU, 1TB disk
> >>
> >> ZooKeeper is also installed on the above two machines in cluster mode.
> >> Solr commit intervals: Soft commit 3 minutes, Hard commit 15 seconds
> >> Schema: basic configuration. 5 fields indexed (one of which is
> >> text_general), 6 fields stored.
> >> Collection: 12 shards (6 per node)
> >> Heap memory: 4 GB per node
> >> Disk cache: 12 GB per node
> >> Document is a syslog message.
> >>
> >> Documents are being ingested into Solr from different nodes. 12 SolrJ
> >> clients ingest data into the Solr cloud.
> >>
> >> We are experiencing issues when we keep the setup running for a long time,
> >> after the index has grown to around 100 GB (i.e. around 600 million
> >> documents). Note that we are only indexing the data, not querying it, so
> >> there should not be any query overhead. From the VM analysis we found
> >> that over time the disk operations start declining, and so do the CPU,
> >> RAM and network usage of the Solr nodes. We concluded that Solr is unable
> >> to handle one big collection due to index read/write overhead, and that
> >> most of the time it ends up doing only the commit (evident in the Solr
> >> logs). Because of that, indexing is being hampered (?)
> >>
> >> So we tried creating smaller collections instead of one big collection,
> >> anticipating that commit performance might improve. But performance
> >> eventually degrades even with that, and we observe more or less similar
> >> charts for CPU, memory, disk and network.
> >>
> >> To put some numbers on this, here is the number of documents processed
> >> each hour:
> >>
> >> 1st hour: 250 million
> >> 2nd hour: 250 million
> >> 3rd hour: 240 million
> >> 4th hour: 200 million
> >> .
> >> .
> >> 11th hour: 80 million
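For scale, the hourly figures can be set against the stated target of more than 24 billion documents per day. The sketch below is a hypothetical back-of-the-envelope check (the class `IngestRates` and its helper methods are made up for illustration); it uses only the numbers quoted in the message and assumes the daily target is spread evenly over 24 hours.

```java
import java.util.Locale;

// Hypothetical back-of-the-envelope check of the rates quoted in this thread.
// Assumes the 24 billion documents/day target is spread evenly over 24 hours.
public class IngestRates {
    static final double SECONDS_PER_HOUR = 3_600.0;
    static final double HOURS_PER_DAY = 24.0;

    // Documents per hour needed to reach a daily target at a constant rate.
    static double requiredDocsPerHour(double docsPerDay) {
        return docsPerDay / HOURS_PER_DAY;
    }

    // Converts an hourly document count to documents per second.
    static double perSecond(double docsPerHour) {
        return docsPerHour / SECONDS_PER_HOUR;
    }

    // Observed hourly rate as a percentage of the required hourly rate.
    static double percentOfTarget(double observedPerHour, double requiredPerHour) {
        return 100.0 * observedPerHour / requiredPerHour;
    }

    public static void main(String[] args) {
        double target = requiredDocsPerHour(24e9); // 1 billion docs/hour
        double[] observed = {250e6, 80e6};         // 1st and 11th hour, from the message
        String[] labels = {"1st hour", "11th hour"};

        System.out.printf(Locale.ROOT, "target:    %,.0f docs/hour (%,.0f docs/s)%n",
                target, perSecond(target));
        for (int i = 0; i < observed.length; i++) {
            System.out.printf(Locale.ROOT, "%-10s %,.0f docs/hour (%,.0f docs/s), %.0f%% of target%n",
                    labels[i] + ":", observed[i], perSecond(observed[i]),
                    percentOfTarget(observed[i], target));
        }
    }
}
```

By this arithmetic the full target needs about 1 billion documents per hour (roughly 278,000 per second). On this two-node prototype the first hour ran at about 25% of that rate and the 11th hour at about 8%.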
> >>
> >> Could you please help us identify the root cause of the performance
> >> degradation? Are we doing something wrong with the Solr configuration
> >> or the collections/sharding, etc.? Because of this degradation we are
> >> currently stuck with Solr.
> >>
> >> Thank you very much in advance.
> >>
> >> Prasad Tendulkar
> >>
> >>
>
>
