We already reduced the stack size to -Xss256k. How could we reduce the size of the transaction log? By fewer autoCommits? Or could it be cleaned up?
Thanks
Christoph

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Sunday, 31 August 2014 20:12
To: solr-user@lucene.apache.org; Mark Miller
Subject: Re: Scaling to large Number of Collections

Yeah, I second Mark's suggestion on reducing the stack size. The default on modern 64-bit boxes is usually 1024KB, which adds up to a lot when you're running 5000 cores (5000 cores * 2 threads * 1MB = 10000MB). I think the zk register thread can be pooled together, but the search threads can't be because we'd run into deadlock situations. I'll have to think more on that.

As for your 2nd question on slow restarts: make sure you tune autoCommit settings so that your transaction logs don't get so big. When people complain about slow restarts, large transaction logs are usually the culprit.

As for the larger questions about a lot of collections, yeah, I think you'll see more work happening in that direction. We, at Lucidworks, have been spending quite a bit of time making it work well with SolrCloud.

On Sun, Aug 31, 2014 at 9:39 AM, Mark Miller <markrmil...@gmail.com> wrote:

> > so you might still end up with these out of threads issue again.
>
> You can also generally drop the stack size (Xss) quite a bit to
> handle more threads.
>
> Beyond that, there are some thread pools you can configure. However,
> until we fix the distrib deadlock issue, you don't want to drop the
> container thread pool too much. There are other control points though.
>
> - Mark
> http://about.me/markrmiller
>
> On Sun, Aug 31, 2014 at 11:53 AM, Ramkumar R. Aiyengar
> <andyetitmo...@gmail.com> wrote:
>
> > On 31 Aug 2014 13:24, "Mark Miller" <markrmil...@gmail.com> wrote:
> > >
> > > On Aug 31, 2014, at 4:04 AM, Christoph Schmidt
> > > <christoph.schm...@moresophy.de> wrote:
> > >
> > > > we see at least two problems when scaling to large number of
> > > > collections. I would like to ask the community, if they are
> > > > known and maybe already addressed in development:
> > > >
> > > > We have a SolrCloud running with the following numbers:
> > > > - 5 Servers (each 24 CPUs, 128 RAM)
> > > > - 13.000 Collection with 25.000 SolrCores in the Cloud
> > > >
> > > > The Cloud is working fine, but we see two problems, if we like
> > > > to scale further
> > > >
> > > > 1. Resource consumption of native system threads
> > > > We see that each collection opens at least two threads: one for
> > > > the zookeeper (coreZkRegister-1-thread-5154) and one for the
> > > > searcher (searcherExecutor-28357-thread-1)
> > > > We will run in "OutOfMemoryError: unable to create new native
> > > > thread". Maybe the architecture could be changed here to use
> > > > thread pools?
> > > >
> > > > 2. The shutdown and the startup of one server in the SolrCloud
> > > > takes 2 hours. So a rolling start is about 10h. For me the
> > > > problem seems to be that leader election is "linear". The
> > > > Overseer does core per core. The organisation of the cloud is
> > > > not done parallel or distributed. Is this already addressed by
> > > > https://issues.apache.org/jira/browse/SOLR-5473 or is there
> > > > more needed?
> > >
> > > 2. No, but it should have been fixed by another issue that will
> > > be in 4.10.
> >
> > Note however that this fix will result in even more temporary
> > thread usage as all leadership elections will happen in parallel,
> > so you might still end up with these out of threads issue again.
> >
> > Quite possibly the out of threads issue is just some system soft
> > limit which is kicking in. Linux certainly has a limit you can
> > configure through sysctl, your OS, whatever that might be, probably
> > does the same. May be worth exploring if you can bump that up.
> >
> > > - Mark
> > > http://about.me/markrmiller

--
Regards,
Shalin Shekhar Mangar.
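
For reference, the stack arithmetic discussed in this thread can be sketched as below. This is only a back-of-the-envelope estimate, not Solr code: the function name `stack_reservation_mb` is made up for illustration, and the inputs (5000 cores, 2 threads per core, a 1024KB default stack versus 256KB with -Xss256k) are the figures quoted in the messages. The result should be read as an upper bound on the address space reserved for thread stacks, not as memory that is necessarily in use.

```python
def stack_reservation_mb(cores, threads_per_core, stack_kb):
    """Estimate the address space reserved for thread stacks, in MB.

    Hypothetical helper for illustration only; it multiplies the thread
    count by the per-thread stack size and converts KB to MB.
    """
    total_threads = cores * threads_per_core
    return total_threads * stack_kb / 1024


# Figures taken from the thread: 5000 cores, two threads per core
# (coreZkRegister and searcherExecutor).
default_mb = stack_reservation_mb(5000, 2, 1024)  # 1024KB default stack
reduced_mb = stack_reservation_mb(5000, 2, 256)   # with -Xss256k

print(f"default stack size: {default_mb:.0f} MB")
print(f"with -Xss256k:      {reduced_mb:.0f} MB")
```

Under these assumptions, dropping the stack size from 1024KB to 256KB cuts the estimate to a quarter, which is why reducing Xss is suggested as a first step before any thread pooling changes.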