Why are you using Solr for log search? Elasticsearch is widely used for log 
search and has the best infrastructure for that.

Over the past few years, a natural market segmentation seems to have emerged, 
with Solr used for product search and ES for log search. By now, I would not 
expect Solr to keep up with ES in log search features. Likewise, I would not 
expect ES to keep up with Solr in product and text search features.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Dec 27, 2017, at 1:33 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> You are probably hitting more and more background merging which will
> slow things down. Your system looks to be severely undersized for this
> scale.
> 
> One thing you can try (and I emphasize I haven't prototyped this) is
> to significantly increase the ramBufferSizeMB setting in your
> solrconfig.xml. By default, Solr won't merge segments to greater than
> 5G, so theoretically you could just set your ramBufferSizeMB to that
> figure and avoid merging altogether. Or you could try configuring the
> NoMergePolicy in solrconfig.xml (but beware that you're going to
> create a lot of segments unless you set ramBufferSizeMB higher).
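> 
> As a sketch (untested, and adjust to your Solr version), the relevant
> bits of solrconfig.xml would look something like this:
> 
>     <indexConfig>
>       <!-- buffer up to ~5GB of indexing RAM before flushing, so
>            flushed segments already sit near the 5G merge ceiling -->
>       <ramBufferSizeMB>5120</ramBufferSizeMB>
>       <!-- or disable background merging entirely (Solr 6+); expect
>            a very large number of segments -->
>       <mergePolicyFactory class="org.apache.solr.index.NoMergePolicyFactory"/>
>     </indexConfig>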
> 
> Frankly, I have no data on how this will affect your indexing throughput.
> You can see that with numbers like this, though, a 4G heap is much too
> small.
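> 
> (For reference, the heap size is set via SOLR_HEAP in solr.in.sh.
> Keep in mind that on 16 GB boxes, whatever you give the heap comes
> out of the OS disk cache, so it's a balancing act rather than
> "bigger is always better".)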
> 
> Best,
> Erick
> 
> On Wed, Dec 27, 2017 at 2:18 AM, Prasad Tendulkar
> <pra...@cumulus-systems.com> wrote:
>> Hello All,
>> 
>> We have been building a Solr-based solution to hold a large amount of data 
>> (approx 4 TB/day, or > 24 billion documents per day). We are developing a 
>> small-scale prototype to evaluate Solr performance incrementally. Here is 
>> our setup configuration.
>> 
>> Solr cloud:
>> node1: 16 GB RAM, 8 Core CPU, 1TB disk
>> node2: 16 GB RAM, 8 Core CPU, 1TB disk
>> 
>> ZooKeeper is also installed on the above 2 machines in cluster mode.
>> Solr commit intervals: Soft commit 3 minutes, Hard commit 15 seconds (see 
>> the sketch below)
>> Schema: Basic configuration. 5 fields indexed (one of which is 
>> text_general), 6 fields stored.
>> Collection: 12 shards (6 per node)
>> Heap memory: 4 GB per node
>> Disk cache: 12 GB per node
>> Each document is a syslog message.
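>> 
>> For reference, those commit intervals correspond roughly to this in 
>> solrconfig.xml (a sketch; openSearcher=false is shown as the usual 
>> choice for heavy indexing):
>> 
>>     <updateHandler>
>>       <autoCommit>
>>         <!-- hard commit every 15 seconds: flush the index to stable
>>              storage without opening a new searcher -->
>>         <maxTime>15000</maxTime>
>>         <openSearcher>false</openSearcher>
>>       </autoCommit>
>>       <autoSoftCommit>
>>         <!-- soft commit every 3 minutes: make documents visible -->
>>         <maxTime>180000</maxTime>
>>       </autoSoftCommit>
>>     </updateHandler>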
>> 
>> Documents are ingested into Solr from different nodes: 12 SolrJ clients 
>> push data into the Solr cloud.
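>> 
>> Each client is essentially the following minimal SolrJ sketch (the 
>> collection name, field names, ZK hosts, and batch size here are 
>> illustrative, not our exact code):
>> 
>>     import java.util.ArrayList;
>>     import java.util.Collections;
>>     import java.util.List;
>>     import java.util.UUID;
>>     import org.apache.solr.client.solrj.impl.CloudSolrClient;
>>     import org.apache.solr.common.SolrInputDocument;
>> 
>>     public class SyslogIndexer {
>>       public static void main(String[] args) throws Exception {
>>         // connect via ZooKeeper so updates are routed to the right shard
>>         CloudSolrClient client = new CloudSolrClient.Builder()
>>             .withZkHost("node1:2181,node2:2181").build();
>>         client.setDefaultCollection("syslog");
>>         List<SolrInputDocument> batch = new ArrayList<>();
>>         for (String line : readSyslogLines()) {
>>           SolrInputDocument doc = new SolrInputDocument();
>>           doc.addField("id", UUID.randomUUID().toString());
>>           doc.addField("message", line);   // the text_general field
>>           batch.add(doc);
>>           if (batch.size() >= 1000) {      // batched adds; rely on
>>             client.add(batch);             // autoCommit, no explicit commit
>>             batch.clear();
>>           }
>>         }
>>         if (!batch.isEmpty()) client.add(batch);
>>         client.close();
>>       }
>> 
>>       // stand-in for the real syslog source
>>       static Iterable<String> readSyslogLines() {
>>         return Collections.emptyList();
>>       }
>>     }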
>> 
>> We are experiencing issues when we keep the setup running for a long time, 
>> after processing around 100 GB of index size (i.e., around 600 million 
>> documents). Note that we are only indexing the data and not querying it, so 
>> there should not be any query overhead. From the VM analysis we figured out 
>> that over time the disk operations start declining, and so do the CPU, RAM, 
>> and network usage of the Solr nodes. We concluded that Solr is unable to 
>> handle one big collection due to index read/write overhead, and most of the 
>> time it ends up doing only commits (evident in the Solr logs); because of 
>> that, indexing is getting hampered (?)
>> 
>> So we thought of creating small-sized collections instead of one big 
>> collection, anticipating that commit performance might improve. But the 
>> performance eventually degrades even with that, and we observe more or 
>> less similar charts for CPU, memory, disk, and network.
>> 
>> To put forth some stats, here are the numbers of documents processed per hour:
>> 
>> 1st hour: 250 million
>> 2nd hour: 250 million
>> 3rd hour: 240 million
>> 4th hour: 200 million
>> .
>> .
>> 11th hour: 80 million
>> 
>> Could you please help us identify the root cause of this performance 
>> degradation? Are we doing something wrong with the Solr configuration or 
>> the collections/sharding, etc.? Due to this performance degradation we are 
>> currently stuck with Solr.
>> 
>> Thank you very much in advance.
>> 
>> Prasad Tendulkar
>> 
>> 
