Shawn, thank you very much for your answer.

On Wed, May 2, 2018 at 6:27 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 5/2/2018 4:54 AM, Patrick Recchia wrote:
> > I'm seeing way too many commits on our solr cluster, and I don't know
> > why.
>
> Are you sure there are commits happening? Do you have logs actually
> saying that a commit is occurring? The creation of a new segment does
> not necessarily mean a commit happened -- this can happen even without a
> commit.

You're right. I assumed a new segment would be created only as part of a commit, but I realize now that there can be other situations.

Is there any logging I can turn on to know when a commit happens and/or when a segment is flushed? I would be very interested in that. I've already enabled InfoStream logging from the IndexWriter, but have not yet found anything there that helps me understand this.

> > - IndexConfig is set to autoCommit every minute:
> >
> > <autoCommit>
> >   <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
> >   <openSearcher>true</openSearcher>
> > </autoCommit>
> >
> > (solr.autoCommit.maxTime is not set)
>
> It's recommended to set openSearcher to false on autoCommit. Do you
> have autoSoftCommit configured?

autoSoftCommit is left at its default '-1' (which means infinity, I suppose).

> > There is nothing else customized (when it comes to IndexWriter, at least)
> > within solrconfig.xml
> >
> > The data is sent without commit, but with commitWithin=500000 ms.
> >
> > All that said, I would have expected a rate of about 1 segment created
> > per minute; of about 100MB.
>
> One of the events that can cause a new segment to be flushed is the ram
> buffer filling up. Solr defaults to a ramBufferSizeMB value of 100.
> But that does not translate to a segment size of 100MB -- it's merely
> the size of the ram buffer that Lucene uses for all the work related to
> building a segment. A segment resulting from a full memory buffer is
> going to be smaller than the buffer. I do not know how MUCH smaller, or
> what causes variations in that size.
>
> The general advice is to leave the buffer size alone. But with the high
> volume you've got, you might want to increase it so segments are not
> flushed as frequently. Be aware that increasing it will have an impact
> on how much heap memory gets used. Every Solr core (shard replica in
> SolrCloud terminology) that does indexing is going to need one of these
> ram buffers.

I will definitely investigate ramBufferSizeMB, and look through the Lucene code to see when a segment is flushed.

Again, many thanks.

Patrick
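
For reference, the settings discussed in this thread could be sketched in solrconfig.xml roughly as follows. This is only an illustration, not the configuration actually in use on the cluster: the value 256 for ramBufferSizeMB is an arbitrary placeholder (the thread only says the default is 100 and that raising it costs heap per indexing core), and openSearcher is shown as false because that is the recommendation quoted above.

```xml
<!-- Sketch only: values marked "placeholder" are assumptions, not tested settings. -->
<indexConfig>
  <!-- Placeholder value; the default mentioned in the thread is 100.
       A larger buffer means fewer flushes but more heap per indexing core. -->
  <ramBufferSizeMB>256</ramBufferSizeMB>

  <!-- Sends IndexWriter InfoStream output to the Solr log. -->
  <infoStream>true</infoStream>
</indexConfig>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- Same one-minute hard commit as in the original configuration. -->
    <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
    <!-- Changed from true to false, per the recommendation quoted above. -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

With openSearcher set to false, hard commits no longer make new documents visible to searches, so visibility would then depend on autoSoftCommit or commitWithin.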