On Fri, Mar 20, 2015 at 1:35 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> That or even hard commit to 60 seconds. It's strictly a matter of how often
> you want to close old segments and open new ones.
>
> On Thu, Mar 19, 2015 at 3:12 AM, Nitin Solanki <nitinml...@gmail.com> wrote:
> > Hi Erick,
> > I read your article. Really nice... In it you said that for bulk indexing,
> > set soft commit = 10 mins and hard commit = 15 sec. Is it also okay for my
> > scenario?
> >
> > On Thu, Mar 19, 2015 at 1:53 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >> bq: As you said, do commits after 60000 seconds
> >>
> >> No, No, No. I'm NOT saying 60000 seconds! That time is in _milliseconds_,
> >> as Shawn said. So setting it to 60000 is every minute.
> >>
> >> From solrconfig.xml, conveniently located immediately above the
> >> <autoCommit> tag:
> >>
> >> maxTime - Maximum amount of time in ms that is allowed to pass since a
> >> document was added before automatically triggering a new commit.
> >>
> >> Also, a lot of the answers about soft and hard commits are here, as I
> >> pointed out before; did you read it?
> >>
> >> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >>
> >> Best,
> >> Erick
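For reference, with the 60-second intervals suggested here, the <autoCommit> and
<autoSoftCommit> sections of solrconfig.xml would look roughly like this. This is
just Nitin's config from further down the thread with the default values raised to
60000 ms; everything else is unchanged:

    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
    </autoSoftCommit>

With openSearcher=false on the hard commit, documents only become visible when a
soft commit (or an explicit commit) opens a new searcher.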
> >>
> >> On Wed, Mar 18, 2015 at 9:44 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> >> > Probably merged somewhat differently, with some terms indexes repeating
> >> > between segments. Check the number of segments in the data directory,
> >> > and do a search for *:* to make sure both have the same document counts.
> >> >
> >> > Also, in all these discussions you still haven't answered how soon after
> >> > indexing you want to _search_. Because if you are not actually searching
> >> > while committing, you could even index on a completely separate server
> >> > (e.g. a faster one) and swap (or alias) the index in afterwards. Unless,
> >> > of course, I missed it; it's a lot of emails in a very short window of time.
> >> >
> >> > Regards,
> >> >    Alex.
> >> > ----
> >> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> >> > http://www.solr-start.com/
> >> >
> >> > On 18 March 2015 at 12:09, Nitin Solanki <nitinml...@gmail.com> wrote:
> >> >> When I kept my configuration at 300 for soft commit and 3000 for hard
> >> >> commit and indexed some amount of data, the size of the whole index was
> >> >> 6GB after completing the indexing.
> >> >>
> >> >> When I changed the configuration to 60000 for soft commit and 60000 for
> >> >> hard commit and indexed the same data, the size of the whole index was
> >> >> 5GB after completing the indexing.
> >> >>
> >> >> But the number of documents in both scenarios was the same. I am
> >> >> wondering how that can be possible?
> >> >>
> >> >> On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki <nitinml...@gmail.com> wrote:
> >> >>> Hi Erick,
> >> >>>           I just want to be sure about the difference between commit
> >> >>> settings: what happens if I do frequent commits, and what if I don't?
> >> >>> The reason I need to commit things so very quickly is that I have to
> >> >>> index 28GB of data, which takes 7-8 hours (with frequent commits).
> >> >>> As you said, do commits after 60000 seconds; then it will be more
> >> >>> expensive. If I don't encounter the **"overlapping searchers" warning
> >> >>> messages**, then I feel it seems to be okay. Is it?
> >> >>>
> >> >>> On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >> >>>> Don't do it. Really, why do you want to do this? This seems like an
> >> >>>> "XY" problem; you haven't explained why you need to commit things so
> >> >>>> very quickly.
> >> >>>>
> >> >>>> I suspect you haven't tried _searching_ while committing at such a
> >> >>>> rate, and you might as well turn all your top-level caches off in
> >> >>>> solrconfig.xml since they won't be useful at all.
> >> >>>>
> >> >>>> Best,
> >> >>>> Erick
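To make the point about caches concrete: the top-level caches being referred to
are defined in the <query> section of solrconfig.xml, along the lines of the stock
entries sketched below (the class names are the standard ones; the sizes and
autowarm counts are only illustrative, not a recommendation). Every commit that
opens a new searcher discards these caches and re-runs autowarming, so with a
300 ms soft commit the warmed entries are thrown away roughly three times a second,
before they can do any good:

    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>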
> >> >>>>
> >> >>>> On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki <nitinml...@gmail.com> wrote:
> >> >>>> > Hi,
> >> >>>> >     If I do very, very fast indexing (softcommit = 300 and hardcommit = 3000)
> >> >>>> > vs. slow indexing (softcommit = 60000 and hardcommit = 60000) as you both said,
> >> >>>> > will fast indexing fail to index some data?
> >> >>>> > Any suggestion on this?
> >> >>>> >
> >> >>>> > On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar <andyetitmo...@gmail.com> wrote:
> >> >>>> >> Yes, and doing so is painful and takes lots of people and hardware
> >> >>>> >> resources to get there for large amounts of data and queries :)
> >> >>>> >>
> >> >>>> >> As Erick says, work backwards from 60s and first establish how high
> >> >>>> >> the commit interval can be to satisfy your use case.
> >> >>>> >>
> >> >>>> >> On 16 Mar 2015 16:04, "Erick Erickson" <erickerick...@gmail.com> wrote:
> >> >>>> >> > First start by lengthening your soft and hard commit intervals
> >> >>>> >> > substantially. Start with 60000 and work backwards, I'd say.
> >> >>>> >> >
> >> >>>> >> > Ramkumar has tuned the heck out of his installation to get the
> >> >>>> >> > commit intervals to be that short ;).
> >> >>>> >> >
> >> >>>> >> > I'm betting that you'll see your RAM usage go way down, but that's
> >> >>>> >> > a guess until you test.
> >> >>>> >> >
> >> >>>> >> > Best,
> >> >>>> >> > Erick
> >> >>>> >> >
> >> >>>> >> > On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki <nitinml...@gmail.com> wrote:
> >> >>>> >> > > Hi Erick,
> >> >>>> >> > >           You are right. The **"overlapping searchers" warning
> >> >>>> >> > > messages** are showing up in the logs.
> >> >>>> >> > > The **numDocs numbers** are changing as documents are added during indexing.
> >> >>>> >> > > Any help?
> >> >>>> >> > >
> >> >>>> >> > > On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >> >>>> >> > >> First, the soft commit interval is very short. Very, very, very,
> >> >>>> >> > >> very short. 300ms is just short of insane unless it's a typo ;).
> >> >>>> >> > >>
> >> >>>> >> > >> Here's a long background:
> >> >>>> >> > >>
> >> >>>> >> > >> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >> >>>> >> > >>
> >> >>>> >> > >> But the short form is that you're opening searchers every 300 ms.
> >> >>>> >> > >> The hard commit is better, but every 3 seconds is still far too
> >> >>>> >> > >> short IMO. I'd start with soft commits of 60000 and hard commits
> >> >>>> >> > >> of 60000 (60 seconds), meaning that you're going to have to wait
> >> >>>> >> > >> 1 minute for docs to show up unless you explicitly commit.
> >> >>>> >> > >>
> >> >>>> >> > >> You're throwing away all the caches configured in solrconfig.xml
> >> >>>> >> > >> more than 3 times a second, executing autowarming, etc, etc, etc....
> >> >>>> >> > >>
> >> >>>> >> > >> Changing these to longer intervals might cure the problem, but if
> >> >>>> >> > >> not then, as Hoss would say, "details matter". I suspect you're
> >> >>>> >> > >> also seeing "overlapping searchers" warning messages in your log,
> >> >>>> >> > >> and it's _possible_ that what's happening is that you're just
> >> >>>> >> > >> exceeding the max warming searchers and never opening a new
> >> >>>> >> > >> searcher with the newly-indexed documents. But that's a total
> >> >>>> >> > >> shot in the dark.
> >> >>>> >> > >>
> >> >>>> >> > >> How are you looking for docs (and not finding them)? Does the
> >> >>>> >> > >> numDocs number in the Solr admin screen change?
> >> >>>> >> > >>
> >> >>>> >> > >> Best,
> >> >>>> >> > >> Erick
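For what it's worth, the limit referred to above is the maxWarmingSearchers setting
in the <query> section of solrconfig.xml; the value shown below is just the stock
default, included for illustration. When commits arrive faster than new searchers
can warm, Solr logs the "overlapping searchers" warnings, and once this limit is
exceeded the commit fails to open a new searcher, which matches the
"documents never show up" symptom. Raising the limit mostly hides the problem;
lengthening the commit intervals, as suggested above, addresses it.

    <maxWarmingSearchers>2</maxWarmingSearchers>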
> >> >>>> >> > >>
> >> >>>> >> > >> On Thu, Mar 12, 2015 at 10:27 PM, Nitin Solanki <nitinml...@gmail.com> wrote:
> >> >>>> >> > >> > Hi Alexandre,
> >> >>>> >> > >> >
> >> >>>> >> > >> > *Hard Commit* is:
> >> >>>> >> > >> >
> >> >>>> >> > >> >     <autoCommit>
> >> >>>> >> > >> >       <maxTime>${solr.autoCommit.maxTime:3000}</maxTime>
> >> >>>> >> > >> >       <openSearcher>false</openSearcher>
> >> >>>> >> > >> >     </autoCommit>
> >> >>>> >> > >> >
> >> >>>> >> > >> > *Soft Commit* is:
> >> >>>> >> > >> >
> >> >>>> >> > >> >     <autoSoftCommit>
> >> >>>> >> > >> >       <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime>
> >> >>>> >> > >> >     </autoSoftCommit>
> >> >>>> >> > >> >
> >> >>>> >> > >> > And I am committing 20000 documents each time.
> >> >>>> >> > >> > Is this a good config for committing? Or am I doing something wrong?
> >> >>>> >> > >> >
> >> >>>> >> > >> > On Fri, Mar 13, 2015 at 8:52 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> >> >>>> >> > >> >> What's your commit strategy? Explicit commits? Soft commits/hard
> >> >>>> >> > >> >> commits (in solrconfig.xml)?
> >> >>>> >> > >> >>
> >> >>>> >> > >> >> Regards,
> >> >>>> >> > >> >>    Alex.
> >> >>>> >> > >> >> ----
> >> >>>> >> > >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> >> >>>> >> > >> >> http://www.solr-start.com/
> >> >>>> >> > >> >>
> >> >>>> >> > >> >> On 12 March 2015 at 23:19, Nitin Solanki <nitinml...@gmail.com> wrote:
> >> >>>> >> > >> >> > Hello,
> >> >>>> >> > >> >> >          I have written a python script to do 20000 documents
> >> >>>> >> > >> >> > indexing each time on Solr. I have 28 GB RAM with 8 CPUs.
> >> >>>> >> > >> >> > When I started indexing, 15 GB of RAM was free. While indexing,
> >> >>>> >> > >> >> > all RAM is consumed but **not** a single document is indexed. Why so?
> >> >>>> >> > >> >> > And it throws *HTTPError: HTTP Error 503: Service Unavailable* in
> >> >>>> >> > >> >> > the python script.
> >> >>>> >> > >> >> > I think it is due to heavy load on ZooKeeper, by which all nodes
> >> >>>> >> > >> >> > went down. I am not sure about that. Any help please, or is
> >> >>>> >> > >> >> > anything else happening? And how do I overcome this issue?
> >> >>>> >> > >> >> > Please assist me towards the right path. Thanks.
> >> >>>> >> > >> >> >
> >> >>>> >> > >> >> > Warm Regards,
> >> >>>> >> > >> >> > Nitin Solanki
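One practical note on the batch-loading scenario described in the original post:
with long auto-commit intervals, newly indexed documents can take up to a minute
to become searchable, so a bulk-loading script typically issues a single explicit
commit once the whole load finishes. A minimal sketch using Solr's XML update
message; the host, port, and collection name are placeholders:

    <!-- POST this to http://localhost:8983/solr/<collection>/update after the
         bulk load finishes (or append ?commit=true to the final update request) -->
    <commit waitSearcher="true"/>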