bq: We have enough physical RAM to store full collection and 16Gb for each JVM.

That's not quite what I was asking for. Lucene uses MMapDirectory to
map part of the index into the OS's memory space. If you've
over-allocated JVM heap relative to your physical memory, the machine
can start swapping. Frankly I'd expect your query performance to die if
that were happening, so this is a sanity check.

How much physical memory does the machine have and how much memory is
allocated to _all_ of the JVMs running on that machine?

see: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
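
To make the arithmetic concrete (hypothetical numbers, not yours): a box
with 64GB of physical RAM running three Solr JVMs at 16GB heap each has
48GB committed to heaps, leaving at most 16GB for the OS page cache that
MMapDirectory relies on; push the heaps any higher and the box starts
swapping.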

Best,
Erick


On Wed, Jul 5, 2017 at 9:41 AM, Antonio De Miguel <deveto...@gmail.com> wrote:
> Hi Erick! Thanks for your response!
>
> Our soft commit is 5 seconds. Why does a soft commit generate I/O? That's
> news to me.
>
>
> We have enough physical RAM to store the full collection, plus 16GB for each
> JVM. The collection is relatively small.
>
> I've tried (for testing purposes) disabling the transaction log (commenting
> out <updateLog>)... but the cluster does not come up. I'll try writing it to
> a separate drive, nice idea...
>
> 2017-07-05 18:04 GMT+02:00 Erick Erickson <erickerick...@gmail.com>:
>
>> What is your soft commit interval? That'll cause I/O as well.
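>>
>> For reference, the commit intervals live in the <updateHandler> section of
>> solrconfig.xml; a typical setup looks roughly like this (the values here
>> are purely illustrative):
>>
>> <autoCommit>
>>   <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
>>   <openSearcher>false</openSearcher>
>> </autoCommit>
>> <autoSoftCommit>
>>   <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
>> </autoSoftCommit>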
>>
>> How much physical RAM is there, and how much is dedicated to _all_ the JVMs
>> on a machine? One cause here is that Lucene uses MMapDirectory, which can
>> be starved for OS memory if you give too much to the JVMs; my rule of thumb
>> is that _at least_ half of the physical memory should be reserved for the
>> OS.
>>
>> Your transaction logs should fluctuate but even out. By that I mean
>> they should increase in size but every hard commit should truncate
>> some of them so I wouldn't expect them to grow indefinitely.
>>
>> One strategy is to put your tlogs on a separate drive exactly to
>> reduce contention. You could disable them too at a cost of risking
>> your data. That might be a quick experiment to run, though: disable
>> tlogs and see what that changes. Of course I'd do this on my
>> test system ;).
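>>
>> If you do try the separate-drive idea, the tlog location is configurable;
>> a sketch of what that looks like in solrconfig.xml (the path below is just
>> an example):
>>
>> <updateLog>
>>   <str name="dir">${solr.ulog.dir:}</str>
>> </updateLog>
>>
>> and then start Solr with something like -Dsolr.ulog.dir=/mnt/fastdisk/tlogs
>> so the transaction logs land on the other drive.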
>>
>> But yeah, Solr will use a lot of I/O in the scenario you are outlining,
>> I'm afraid.
>>
>> Best,
>> Erick
>>
>> On Wed, Jul 5, 2017 at 8:08 AM, Antonio De Miguel <deveto...@gmail.com>
>> wrote:
>> > Thanks, Markus!
>> >
>> > We already have SSD.
>> >
>> > About changing the topology... we tried yesterday with 10 shards, but the
>> > system was more inconsistent than with the current topology (5x10). I don't
>> > know why... too much traffic perhaps?
>> >
>> > About the merge factor... we ran with the default configuration for some
>> > days, but when a merge occurred the system got overloaded. We tried a
>> > mergeFactor of 4 to improve query times and to get smaller merges.
>> >
>> > 2017-07-05 16:51 GMT+02:00 Markus Jelsma <markus.jel...@openindex.io>:
>> >
>> >> Try a mergeFactor of 10 (the default), which should be fine in most
>> >> cases. If you've got an extreme case, either create more shards or
>> >> consider better hardware (SSDs).
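>> >>
>> >> In recent Solr versions that roughly corresponds to the tiered merge
>> >> policy settings in the <indexConfig> section; a sketch (10/10 are the
>> >> defaults):
>> >>
>> >> <indexConfig>
>> >>   <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
>> >>     <int name="maxMergeAtOnce">10</int>
>> >>     <int name="segmentsPerTier">10</int>
>> >>   </mergePolicyFactory>
>> >> </indexConfig>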
>> >>
>> >> -----Original message-----
>> >> > From:Antonio De Miguel <deveto...@gmail.com>
>> >> > Sent: Wednesday 5th July 2017 16:48
>> >> > To: solr-user@lucene.apache.org
>> >> > Subject: Re: High disk write usage
>> >> >
>> >> > Thanks a lot, Alessandro!
>> >> >
>> >> > Yes, we have very big dedicated physical machines, with a topology of
>> >> > 5 shards and 10 replicas per shard.
>> >> >
>> >> >
>> >> > 1. Transaction log files are increasing, but not at this rate.
>> >> >
>> >> > 2. We've tried values between 300 and 2000 MB... without any
>> >> > visible results.
>> >> >
>> >> > 3. We don't use those features.
>> >> >
>> >> > 4. No.
>> >> >
>> >> > 5. I've tried both low and high merge factors, and I think that is the
>> >> > key point.
>> >> >
>> >> > With a low merge factor (around 4) we have a high disk write rate, as I
>> >> > said previously.
>> >> >
>> >> > With a merge factor of 20 the disk write rate decreases, but now, at
>> >> > high query rates (over 1000 qps), the system is overloaded.
>> >> >
>> >> > I think that's the expected behaviour :(
>> >> >
>> >> > 2017-07-05 15:49 GMT+02:00 alessandro.benedetti <a.benede...@sease.io>:
>> >> >
>> >> > > Point 2 was the RAM buffer size:
>> >> > >
>> >> > > *ramBufferSizeMB* sets the amount of RAM that may be used by Lucene
>> >> > > indexing for buffering added documents and deletions before they are
>> >> > > flushed to the Directory.
>> >> > > maxBufferedDocs sets a limit on the number of documents buffered
>> >> > > before flushing.
>> >> > > If both ramBufferSizeMB and maxBufferedDocs is set, then
>> >> > > Lucene will flush based on whichever limit is hit first.
>> >> > >
>> >> > > <ramBufferSizeMB>100</ramBufferSizeMB>
>> >> > > <maxBufferedDocs>1000</maxBufferedDocs>
>> >> > >
>> >> > > -----
>> >> > > ---------------
>> >> > > Alessandro Benedetti
>> >> > > Search Consultant, R&D Software Engineer, Director
>> >> > > Sease Ltd. - www.sease.io
>> >> > >
>> >> >
>> >>
>>
