Hi Alexandre,

The segment counts are different, but the document counts are the same:

With soft commit = 300 and hard commit = 6000: 43 segments
With soft commit = 60000 and hard commit = 60000: 31 segments
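
For anyone who wants to reproduce these numbers, here is a minimal sketch of counting segments from the index directory. It assumes a Lucene 4.x-style index in which each segment writes one segment-info file named like `_0.si`; the function name and the example path are hypothetical, and files left behind by segments that are not yet deleted could inflate the count slightly.

```python
from pathlib import Path


def count_segments(index_dir):
    """Approximate the segment count by counting per-segment .si files.

    Assumption: a Lucene 4.x+ index, where each segment has one
    segment-info file whose name starts with "_" and ends in ".si".
    """
    return sum(
        1
        for p in Path(index_dir).iterdir()
        if p.is_file() and p.name.startswith("_") and p.suffix == ".si"
    )


# Usage (hypothetical path; point it at <core data dir>/index):
# print(count_segments("/var/solr/data/collection1/data/index"))
```

Running this against both index directories should give figures comparable to the 43 and 31 reported above.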

I don't know anything about segment counts. What are they, and is the difference something I need to fix? Or is it fine not to worry about segments?
I just want to ask: if there are more segments, will searching be slower?

On Wed, Mar 18, 2015 at 10:14 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> Probably merged somewhat differently with some terms indexes repeating
> between segments. Check the number of segments in data directory. And
> do search for *:* and make sure both do have the same document counts.
>
> Also, In all these discussions, you still haven't answered about how
> fast after indexing you want to _search_? Because, if you are not
> actually searching while committing, you could even index on a
> completely separate server (e.g. a faster one) and swap (or alias)
> index in afterwards. Unless, of course, I missed it, it's a lot of
> emails in a very short window of time.
>
> Regards,
> Alex.
>
> ----
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 18 March 2015 at 12:09, Nitin Solanki <nitinml...@gmail.com> wrote:
> > When I kept my configuration to 300 for soft commit and 3000 for hard
> > commit and indexed some amount of data, I got the data size of the whole
> > index to be 6GB after completing the indexing.
> >
> > When I changed the configuration to 60000 for soft commit and 60000 for
> > hard commit and indexed same data then I got the data size of the whole
> > index to be 5GB after completing the indexing.
> >
> > But the number of documents in the both scenario were same. I am
> > wondering how that can be possible?
> >
> > On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki <nitinml...@gmail.com>
> > wrote:
> >
> >> Hi Erick,
> >> I am just saying. I want to be sure on commits difference..
> >> What if I do frequent commits or not? And why I am saying that I need to
> >> commit things so very quickly because I have to index 28GB of data which
> >> takes 7-8 hours(frequent commits).
> >> As you said, do commits after 60000 seconds then it will be more
> >> expensive.
> >> If I don't encounter with **"overlapping searchers" warning messages**
> >> then I feel it seems to be okay. Is it?
> >>
> >>
> >> On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson <erickerick...@gmail.com>
> >> wrote:
> >>
> >>> Don't do it. Really, why do you want to do this? This seems like
> >>> an "XY" problem, you haven't explained why you need to commit
> >>> things so very quickly.
> >>>
> >>> I suspect you haven't tried _searching_ while committing at such
> >>> a rate, and you might as well turn all your top-level caches off
> >>> in solrconfig.xml since they won't be useful at all.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki <nitinml...@gmail.com>
> >>> wrote:
> >>>
> >>> > Hi,
> >>> > If I do very very fast indexing(softcommit = 300 and hardcommit =
> >>> > 3000) v/s slow indexing (softcommit = 60000 and hardcommit = 60000)
> >>> > as you both said. Will fast indexing fail to index some data?
> >>> > Any suggestion on this ?
> >>> >
> >>> > On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar
> >>> > <andyetitmo...@gmail.com> wrote:
> >>> >
> >>> >> Yes, and doing so is painful and takes lots of people and hardware
> >>> >> resources to get there for large amounts of data and queries :)
> >>> >>
> >>> >> As Erick says, work backwards from 60s and first establish how high
> >>> >> the commit interval can be to satisfy your use case..
> >>> >> On 16 Mar 2015 16:04, "Erick Erickson" <erickerick...@gmail.com>
> >>> >> wrote:
> >>> >>
> >>> >> > First start by lengthening your soft and hard commit intervals
> >>> >> > substantially. Start with 60000 and work backwards I'd say.
> >>> >> >
> >>> >> > Ramkumar has tuned the heck out of his installation to get the
> >>> >> > commit intervals to be that short ;).
> >>> >> >
> >>> >> > I'm betting that you'll see your RAM usage go way down, but that's
> >>> >> > a guess until you test.
> >>> >> >
> >>> >> > Best,
> >>> >> > Erick
> >>> >> >
> >>> >> > On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki
> >>> >> > <nitinml...@gmail.com> wrote:
> >>> >> > > Hi Erick,
> >>> >> > > You are saying correct. Something, **"overlapping searchers"
> >>> >> > > warning messages** are coming in logs.
> >>> >> > > **numDocs numbers** are changing when documents are adding at
> >>> >> > > the time of indexing.
> >>> >> > > Any help?
> >>> >> > >
> >>> >> > > On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson
> >>> >> > > <erickerick...@gmail.com> wrote:
> >>> >> > >
> >>> >> > >> First, the soft commit interval is very short. Very, very, very,
> >>> >> > >> very short. 300ms is just short of insane unless it's a typo ;).
> >>> >> > >>
> >>> >> > >> Here's a long background:
> >>> >> > >>
> >>> >> > >> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >>> >> > >>
> >>> >> > >> But the short form is that you're opening searchers every 300 ms.
> >>> >> > >> The hard commit is better, but every 3 seconds is still far too
> >>> >> > >> short IMO. I'd start with soft commits of 60000 and hard commits
> >>> >> > >> of 60000 (60 seconds), meaning that you're going to have to wait
> >>> >> > >> 1 minute for docs to show up unless you explicitly commit.
> >>> >> > >>
> >>> >> > >> You're throwing away all the caches configured in solrconfig.xml
> >>> >> > >> more than 3 times a second, executing autowarming, etc, etc,
> >>> >> > >> etc....
> >>> >> > >>
> >>> >> > >> Changing these to longer intervals might cure the problem, but if
> >>> >> > >> not then, as Hoss would say, "details matter".
> >>> >> > >> I suspect you're also seeing "overlapping searchers" warning
> >>> >> > >> messages in your log, and it's _possible_ that what's happening
> >>> >> > >> is that you're just exceeding the max warming searchers and never
> >>> >> > >> opening a new searcher with the newly-indexed documents.
> >>> >> > >> But that's a total shot in the dark.
> >>> >> > >>
> >>> >> > >> How are you looking for docs (and not finding them)? Does the
> >>> >> > >> numDocs number in the solr admin screen change?
> >>> >> > >>
> >>> >> > >>
> >>> >> > >> Best,
> >>> >> > >> Erick
> >>> >> > >>
> >>> >> > >> On Thu, Mar 12, 2015 at 10:27 PM, Nitin Solanki
> >>> >> > >> <nitinml...@gmail.com> wrote:
> >>> >> > >> > Hi Alexandre,
> >>> >> > >> >
> >>> >> > >> >
> >>> >> > >> > *Hard Commit* is :
> >>> >> > >> >
> >>> >> > >> > <autoCommit>
> >>> >> > >> >   <maxTime>${solr.autoCommit.maxTime:3000}</maxTime>
> >>> >> > >> >   <openSearcher>false</openSearcher>
> >>> >> > >> > </autoCommit>
> >>> >> > >> >
> >>> >> > >> > *Soft Commit* is :
> >>> >> > >> >
> >>> >> > >> > <autoSoftCommit>
> >>> >> > >> >   <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime>
> >>> >> > >> > </autoSoftCommit>
> >>> >> > >> >
> >>> >> > >> > And I am committing 20000 documents each time.
> >>> >> > >> > Is it good config for committing?
> >>> >> > >> > Or am I doing something wrong?
> >>> >> > >> >
> >>> >> > >> >
> >>> >> > >> > On Fri, Mar 13, 2015 at 8:52 AM, Alexandre Rafalovitch
> >>> >> > >> > <arafa...@gmail.com> wrote:
> >>> >> > >> >
> >>> >> > >> >> What's your commit strategy? Explicit commits? Soft
> >>> >> > >> >> commits/hard commits (in solrconfig.xml)?
> >>> >> > >> >>
> >>> >> > >> >> Regards,
> >>> >> > >> >> Alex.
> >>> >> > >> >> ----
> >>> >> > >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a
> >>> >> > >> >> newsletter:
> >>> >> > >> >> http://www.solr-start.com/
> >>> >> > >> >>
> >>> >> > >> >>
> >>> >> > >> >> On 12 March 2015 at 23:19, Nitin Solanki
> >>> >> > >> >> <nitinml...@gmail.com> wrote:
> >>> >> > >> >> > Hello,
> >>> >> > >> >> > I have written a python script to do 20000 documents
> >>> >> > >> >> > indexing each time on Solr. I have 28 GB RAM with 8 CPU.
> >>> >> > >> >> > When I started indexing, at that time 15 GB RAM was freed.
> >>> >> > >> >> > While indexing, all RAM is consumed but **not** a single
> >>> >> > >> >> > document is indexed. Why so?
> >>> >> > >> >> > And it through *HTTPError: HTTP Error 503: Service
> >>> >> > >> >> > Unavailable* in python script.
> >>> >> > >> >> > I think it is due to heavy load on Zookeeper by which all
> >>> >> > >> >> > nodes went down.
> >>> >> > >> >> > I am not sure about that. Any help please..
> >>> >> > >> >> > Or anything else is happening..
> >>> >> > >> >> > And how to overcome this issue.
> >>> >> > >> >> > Please assist me towards right path.
> >>> >> > >> >> > Thanks..
> >>> >> > >> >> >
> >>> >> > >> >> > Warm Regards,
> >>> >> > >> >> > Nitin Solanki
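
The indexing script itself is not shown anywhere in the thread. Purely as a rough sketch, 20000-document batching against Solr's JSON `/update` handler might look like the following. The function names, base URL and collection name are hypothetical. No explicit commit is sent, so document visibility is left to the autoCommit and autoSoftCommit settings quoted above.

```python
import json
import urllib.request


def batches(docs, size=20000):
    """Yield successive lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, batch
        yield batch


def build_update_request(solr_url, collection, batch):
    """Build (but do not send) a JSON POST to the collection's /update handler."""
    return urllib.request.Request(
        url=f"{solr_url}/{collection}/update",
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (hypothetical URL and collection; `all_docs` is any iterable of dicts):
# for batch in batches(all_docs):
#     req = build_update_request("http://localhost:8983/solr", "collection1", batch)
#     urllib.request.urlopen(req)  # raises HTTPError on a 503
```

Keeping the request-building separate from the sending makes it easy to log or retry a failed batch without re-reading the source data.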