Hi Erick,

I read about the mergeFactor policy for indexing. By default, mergeFactor is 10. As the documentation says:

High value merge factor (e.g., 25):
- Pro: Generally improves indexing speed
- Con: Less frequent merges, resulting in a collection with more index files which may slow searching

Low value merge factor (e.g., 2):
- Pro: Smaller number of index files, which speeds up searching.
- Con: More segment merges slow down indexing.

My main purpose is **searching**; searching must be fast. So if I set **mergeFactor = 2**, indexing will be slower but searching should be faster, right?

To restate my setup: I am indexing 20000 documents at a time (total data size 28GB), with a hard commit every 15 seconds and a soft commit every 10 minutes.

Will searching be fast if I set **mergeFactor = 2**? And what values should I use for ramBufferSizeMB, maxBufferedDocs and maxIndexingThreads? Right now, all of these are set to their defaults.

On Fri, Mar 20, 2015 at 11:42 AM, Nitin Solanki <nitinml...@gmail.com> wrote:

>
>
> On Fri, Mar 20, 2015 at 1:35 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> That or even hard commit to 60 seconds. It's strictly a matter of how often
>> you want to close old segments and open new ones.
>>
>> On Thu, Mar 19, 2015 at 3:12 AM, Nitin Solanki <nitinml...@gmail.com> wrote:
>> > Hi Erick..
>> > I read your Article. Really nice...
>> > Inside that you said that for bulk indexing. Set soft commit = 10 mins and
>> > hard commit = 15sec. Is it also okay for my scenario?
>> >
>> > On Thu, Mar 19, 2015 at 1:53 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>> >
>> >> bq: As you said, do commits after 60000 seconds
>> >>
>> >> No, No, No. I'm NOT saying 60000 seconds! That time is in _milliseconds_
>> >> as Shawn said. So setting it to 60000 is every minute.
>> >>
>> >> From solrconfig.xml, conveniently located immediately above the
>> >> <autoCommit> tag:
>> >>
>> >> maxTime - Maximum amount of time in ms that is allowed to pass since a
>> >> document was added before automatically triggering a new commit.
>> >>
>> >> Also, a lot of answers to soft and hard commits is here as I pointed
>> >> out before, did you read it?
>> >>
>> >> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Wed, Mar 18, 2015 at 9:44 AM, Alexandre Rafalovitch
>> >> <arafa...@gmail.com> wrote:
>> >> > Probably merged somewhat differently with some terms indexes repeating
>> >> > between segments. Check the number of segments in data directory. And
>> >> > do search for *:* and make sure both do have the same document counts.
>> >> >
>> >> > Also, In all these discussions, you still haven't answered about how
>> >> > fast after indexing you want to _search_? Because, if you are not
>> >> > actually searching while committing, you could even index on a
>> >> > completely separate server (e.g. a faster one) and swap (or alias)
>> >> > index in afterwards. Unless, of course, I missed it, it's a lot of
>> >> > emails in a very short window of time.
>> >> >
>> >> > Regards,
>> >> > Alex.
>> >> >
>> >> > ----
>> >> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> >> > http://www.solr-start.com/
>> >> >
>> >> >
>> >> > On 18 March 2015 at 12:09, Nitin Solanki <nitinml...@gmail.com> wrote:
>> >> >> When I kept my configuration to 300 for soft commit and 3000 for hard
>> >> >> commit and indexed some amount of data, I got the data size of the whole
>> >> >> index to be 6GB after completing the indexing.
>> >> >>
>> >> >> When I changed the configuration to 60000 for soft commit and 60000 for
>> >> >> hard commit and indexed same data then I got the data size of the whole
>> >> >> index to be 5GB after completing the indexing.
>> >> >>
>> >> >> But the number of documents in the both scenario were same. I am wondering
>> >> >> how that can be possible?
>> >> >>
>> >> >> On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki <nitinml...@gmail.com> wrote:
>> >> >>
>> >> >>> Hi Erick,
>> >> >>> I am just saying. I want to be sure on commits difference..
>> >> >>> What if I do frequent commits or not? And why I am saying that I need to
>> >> >>> commit things so very quickly because I have to index 28GB of data which
>> >> >>> takes 7-8 hours(frequent commits).
>> >> >>> As you said, do commits after 60000 seconds then it will be more expensive.
>> >> >>> If I don't encounter with **"overlapping searchers" warning messages**
>> >> >>> then I feel it seems to be okay. Is it?
>> >> >>>
>> >> >>> On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>> >> >>>
>> >> >>>> Don't do it. Really, why do you want to do this? This seems like
>> >> >>>> an "XY" problem, you haven't explained why you need to commit
>> >> >>>> things so very quickly.
>> >> >>>>
>> >> >>>> I suspect you haven't tried _searching_ while committing at such
>> >> >>>> a rate, and you might as well turn all your top-level caches off
>> >> >>>> in solrconfig.xml since they won't be useful at all.
>> >> >>>>
>> >> >>>> Best,
>> >> >>>> Erick
>> >> >>>>
>> >> >>>> On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki <nitinml...@gmail.com> wrote:
>> >> >>>> > Hi,
>> >> >>>> > If I do very very fast indexing(softcommit = 300 and hardcommit =
>> >> >>>> > 3000) v/s slow indexing (softcommit = 60000 and hardcommit = 60000) as you
>> >> >>>> > both said. Will fast indexing fail to index some data?
>> >> >>>> > Any suggestion on this ?
>> >> >>>> >
>> >> >>>> > On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar <andyetitmo...@gmail.com> wrote:
>> >> >>>> >
>> >> >>>> >> Yes, and doing so is painful and takes lots of people and hardware
>> >> >>>> >> resources to get there for large amounts of data and queries :)
>> >> >>>> >>
>> >> >>>> >> As Erick says, work backwards from 60s and first establish how high the
>> >> >>>> >> commit interval can be to satisfy your use case..
>> >> >>>> >> On 16 Mar 2015 16:04, "Erick Erickson" <erickerick...@gmail.com> wrote:
>> >> >>>> >>
>> >> >>>> >> > First start by lengthening your soft and hard commit intervals
>> >> >>>> >> > substantially. Start with 60000 and work backwards I'd say.
>> >> >>>> >> >
>> >> >>>> >> > Ramkumar has tuned the heck out of his installation to get the commit
>> >> >>>> >> > intervals to be that short ;).
>> >> >>>> >> >
>> >> >>>> >> > I'm betting that you'll see your RAM usage go way down, but that's a
>> >> >>>> >> > guess until you test.
>> >> >>>> >> >
>> >> >>>> >> > Best,
>> >> >>>> >> > Erick
>> >> >>>> >> >
>> >> >>>> >> > On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki <nitinml...@gmail.com> wrote:
>> >> >>>> >> > > Hi Erick,
>> >> >>>> >> > > You are saying correct. Something, **"overlapping searchers"
>> >> >>>> >> > > warning messages** are coming in logs.
>> >> >>>> >> > > **numDocs numbers** are changing when documents are adding at the time of
>> >> >>>> >> > > indexing.
>> >> >>>> >> > > Any help?
>> >> >>>> >> > >
>> >> >>>> >> > > On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>> >> >>>> >> > >
>> >> >>>> >> > >> First, the soft commit interval is very short. Very, very, very, very
>> >> >>>> >> > >> short. 300ms is just short of insane unless it's a typo ;).
>> >> >>>> >> > >>
>> >> >>>> >> > >> Here's a long background:
>> >> >>>> >> > >>
>> >> >>>> >> > >> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>> >> >>>> >> > >>
>> >> >>>> >> > >> But the short form is that you're opening searchers every 300 ms. The
>> >> >>>> >> > >> hard commit is better, but every 3 seconds is still far too short IMO.
>> >> >>>> >> > >> I'd start with soft commits of 60000 and hard commits of 60000 (60
>> >> >>>> >> > >> seconds), meaning that you're going to have to wait 1 minute for docs
>> >> >>>> >> > >> to show up unless you explicitly commit.
>> >> >>>> >> > >>
>> >> >>>> >> > >> You're throwing away all the caches configured in solrconfig.xml more
>> >> >>>> >> > >> than 3 times a second, executing autowarming, etc, etc, etc....
>> >> >>>> >> > >>
>> >> >>>> >> > >> Changing these to longer intervals might cure the problem, but if not
>> >> >>>> >> > >> then, as Hoss would say, "details matter". I suspect you're also seeing
>> >> >>>> >> > >> "overlapping searchers" warning messages in your log, and it's
>> >> >>>> >> > >> _possible_ that what's happening is that you're just exceeding the
>> >> >>>> >> > >> max warming searchers and never opening a new searcher with the
>> >> >>>> >> > >> newly-indexed documents. But that's a total shot in the dark.
>> >> >>>> >> > >>
>> >> >>>> >> > >> How are you looking for docs (and not finding them)? Does the numDocs
>> >> >>>> >> > >> number in the solr admin screen change?
>> >> >>>> >> > >>
>> >> >>>> >> > >> Best,
>> >> >>>> >> > >> Erick
>> >> >>>> >> > >>
>> >> >>>> >> > >> On Thu, Mar 12, 2015 at 10:27 PM, Nitin Solanki <nitinml...@gmail.com> wrote:
>> >> >>>> >> > >> > Hi Alexandre,
>> >> >>>> >> > >> >
>> >> >>>> >> > >> > *Hard Commit* is :
>> >> >>>> >> > >> >
>> >> >>>> >> > >> > <autoCommit>
>> >> >>>> >> > >> >   <maxTime>${solr.autoCommit.maxTime:3000}</maxTime>
>> >> >>>> >> > >> >   <openSearcher>false</openSearcher>
>> >> >>>> >> > >> > </autoCommit>
>> >> >>>> >> > >> >
>> >> >>>> >> > >> > *Soft Commit* is :
>> >> >>>> >> > >> >
>> >> >>>> >> > >> > <autoSoftCommit>
>> >> >>>> >> > >> >   <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime>
>> >> >>>> >> > >> > </autoSoftCommit>
>> >> >>>> >> > >> >
>> >> >>>> >> > >> > And I am committing 20000 documents each time.
>> >> >>>> >> > >> > Is it a good config for committing?
>> >> >>>> >> > >> > Or am I doing something wrong?
>> >> >>>> >> > >> >
>> >> >>>> >> > >> > On Fri, Mar 13, 2015 at 8:52 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>> >> >>>> >> > >> >
>> >> >>>> >> > >> >> What's your commit strategy? Explicit commits? Soft commits/hard
>> >> >>>> >> > >> >> commits (in solrconfig.xml)?
>> >> >>>> >> > >> >>
>> >> >>>> >> > >> >> Regards,
>> >> >>>> >> > >> >> Alex.
>> >> >>>> >> > >> >> ----
>> >> >>>> >> > >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> >> >>>> >> > >> >> http://www.solr-start.com/
>> >> >>>> >> > >> >>
>> >> >>>> >> > >> >> On 12 March 2015 at 23:19, Nitin Solanki <nitinml...@gmail.com> wrote:
>> >> >>>> >> > >> >> > Hello,
>> >> >>>> >> > >> >> > I have written a python script that indexes 20000 documents
>> >> >>>> >> > >> >> > at a time on Solr. I have 28 GB RAM with 8 CPU.
>> >> >>>> >> > >> >> > When I started indexing, 15 GB of RAM was free. While indexing,
>> >> >>>> >> > >> >> > all RAM is consumed but **not** a single document is indexed. Why so?
>> >> >>>> >> > >> >> > And it threw *HTTPError: HTTP Error 503: Service Unavailable* in the
>> >> >>>> >> > >> >> > python script.
>> >> >>>> >> > >> >> > I think it is due to heavy load on Zookeeper, because of which all
>> >> >>>> >> > >> >> > nodes went down.
>> >> >>>> >> > >> >> > I am not sure about that. Any help please..
>> >> >>>> >> > >> >> > Or is something else happening?
>> >> >>>> >> > >> >> > And how do I overcome this issue?
>> >> >>>> >> > >> >> > Please point me towards the right path.
>> >> >>>> >> > >> >> > Thanks..
>> >> >>>> >> > >> >> >
>> >> >>>> >> > >> >> > Warm Regards,
>> >> >>>> >> > >> >> > Nitin Solanki
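
Since much of this thread turns on maxTime being expressed in milliseconds, here is a minimal Python sketch of the arithmetic for the values mentioned above (300, 3000 and 60000). The function name `commit_interval_summary` is hypothetical and not part of Solr or any Solr client; it only converts a maxTime value into seconds and into the maximum number of automatic commits per minute that such an interval allows.

```python
def commit_interval_summary(max_time_ms):
    """Summarise a Solr <maxTime> value, which is given in milliseconds.

    Hypothetical helper for illustration only; it does not talk to Solr.
    """
    if max_time_ms <= 0:
        raise ValueError("maxTime must be a positive number of milliseconds")
    return {
        "max_time_ms": max_time_ms,
        # 1000 ms = 1 second
        "seconds": max_time_ms / 1000,
        # Upper bound on automatic commits per minute under steady indexing
        "commits_per_minute": 60000 / max_time_ms,
    }


if __name__ == "__main__":
    # Values taken from the thread: the original soft/hard commit settings
    # and the suggested 60000 ms starting point.
    for label, ms in [
        ("soft commit (original)", 300),
        ("hard commit (original)", 3000),
        ("suggested starting point", 60000),
    ]:
        info = commit_interval_summary(ms)
        print(
            f"{label}: {info['max_time_ms']} ms = {info['seconds']} s, "
            f"up to {info['commits_per_minute']:.0f} commits per minute"
        )
```

Under these assumptions, 300 ms works out to more than three soft commits per second, which matches the "more than 3 times a second" remark quoted above, while 60000 ms is one commit per minute.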