I assume that the lotOfCores feature doesn't use zookeeper I tried simulate the cores as collection, but when the size of clusterstate.json is bigger than 1M and -Djute.maxbuffer is needed to increase the 1 mega limitation.
A naive question, why clusterstate.json is doesn't by collection? -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Monday, October 7, 2013 at 1:33 PM, Erick Erickson wrote: > Thanks for the great writeup! It's always interesting to see how > a feature plays out "in the real world". A couple of questions > though: > > bq: We added 2 Cores options : > Do you mean you patched Solr? If so are you willing to shard the code > back? If both are "yes", please open a JIRA, attach the patch and assign > it to me. > > bq: the number of file descriptors, it used a lot (need to increase global > max and per process fd) > > Right, this makes sense since you have a bunch of cores all with their > own descriptors open. I'm assuming that you hit a rather high max > number and it stays pretty steady.... > > bq: the overhead to parse solrconfig.xml and load dependencies to open > each core > > Right, I tried to look at sharing the underlying solrconfig object but > it seemed pretty hairy. There are some extensive comments in the > JIRA of the problems I foresaw. There may be some action on this > in the future. > > bq: lotsOfCores doesn’t work with SolrCloud > > Right, we haven't concentrated on that, it's an interesting problem. > In particular it's not clear what happens when nodes go up/down, > replicate, resynch, all that. > > bq: When you start, it spend a lot of times to discover cores due to a big > > How long? I tried 15K cores on my laptop and I think I was getting 15 > second delays or roughly 1K cores discovered/second. Is your delay > on the order of 50 seconds with 50K cores? > > I'm not sure how you could do that in the background, but I haven't > thought about it much. I tried multi-threading core discovery and that > didn't help (SSD disk), I assumed that the problem was mostly I/O > contention (but didn't prove it). What if a request came in for a core > before you'd found it? I'm not sure what the right behavior would be > except perhaps to block on that request until core discovery was > complete. Hmmmmm. How would that work for your case? That > seems do-able. > > BTW, so far you get the prize for the most cores on a node I think. > > Thanks again for the great feedback! > > Erick > > On Mon, Oct 7, 2013 at 3:53 AM, Soyez Olivier > <olivier.so...@worldline.com (mailto:olivier.so...@worldline.com)> wrote: > > Hello, > > > > In my company, we use Solr in production to offer full text search on > > mailboxes. > > We host dozens million of mailboxes, but only webmail users have such > > feature (few millions). > > We have the following use case : > > - non static indexes with more update (indexing and deleting), than > > select requests (ratio 7:1) > > - homogeneous configuration for all indexes > > - not so much user at the same time > > > > We started to index mailboxes with Solr 1.4 in 2010, on a subset of > > 400,000 users. > > - we had a cluster of 50 servers, 4 Solr per server, 2000 users per Solr > > instance > > - we grow to 6000 users per Solr instance, 8 Solr per server, 60Go per > > index (~2 million users) > > - we upgraded to Solr 3.5 in 2012 > > As indexes grew, IOPS and the response times have increased more and more. > > > > The index size was mainly due to stored fields (large .fdt files) > > Retrieving these fields from the index was costly, because of many seek > > in large files, and no limit usage possible. > > There is also an overhead on queries : too many results are filtered to > > find only results concerning user. > > For these reason and others, like not pooled users, hardware savings, > > better scoring, some requests that do not support filtering, we have > > decided to use the LotsOfCores feature. > > > > Our goal was to change the current I/O usage : from lots of random I/O > > access on huge segments to mostly sequential I/O access on small segments. > > For our use case, it's not a big deal, that the first query to one not > > yet loaded core will be slow. > > And, we don’t need to fit all the cores into memory at once. > > > > We started from the SOLR-1293 issue and the LotsOfCores wiki page to > > finally use a patched Solr 4.2.1 LotsOfCores in production (1 user = 1 > > core). > > We don't need anymore to run so many Solr per node. We are now able to > > have around 50000 cores per Solr and we plan to grow to 100,000 cores > > per instance. > > In a first time, we used the solr.xml persistence. All cores have > > loadOnStartup="false" and transient="true" attributes, so a cold start > > is very quick. The response times were better than ever, in comparaison > > with poor response times, we had before using LotsOfCores. > > > > We added 2 Cores options : > > - "numBuckets" to create a subdirectory based on a hash on the corename > > % numBuckets in the core Datadir, because all cores cannot live in the > > same directory > > - "Auto" with 3 differents values : > > 1) false : default behaviour > > 2) createLoad : create, if not exist, and load the core on the fly on > > the first incoming request (update, select). > > 3) onlyLoad : load the core on the fly on the first incoming request > > (update, select), if exist on disk > > > > Then, to improve performance and avoid synchronization in the solr.xml > > persistence : we disabled it. > > The drawback is we cannot see anymore all the availables cores list with > > the admin core status command, only those warmed up. > > Finally, we can achieve very good performances with Solr LotsOfCores : > > - Index 5 emails (avg) + commit + search : x4.9 faster response time > > (Mean), x5.4 faster (95th per) > > - Delete 5 documents (avg) : x8.4 faster response time (Mean) x7.4 > > faster (95th per) > > - Search : x3.7 faster response time (Mean) 4x faster (95th per) > > > > In fact, the better performance is mainly due to the little size of each > > index, but also thanks to the isolation between cores (updates and > > queries on many mailboxes don’t have side effects to each other). > > One important thing with the LotsOfCores feature is to take care of : > > - the number of file descriptors, it used a lot (need to increase global > > max and per process fd) > > - the value of the transientCacheSize depending of the RAM size and the > > PermGen allocated size > > - the leak of ClassLoader that increase minor GC times, when CMS GC is > > enabled (use -XX:+CMSClassUnloadingEnabled) > > - the overhead to parse solrconfig.xml and load dependencies to open > > each core > > - lotsOfCores doesn’t work with SolrCloud, then we store indexes > > location outside of Solr. We have Solr proxies to route requests to the > > right instance. > > > > Not in production, we try the core discovery feature in Solr 4.4 with a > > lots of cores. > > When you start, it spend a lot of times to discover cores due to a big > > number of cores, meanwhile all requests fail (SolrDispatchFilter.init() > > not done yet). It will be great to have for example an option for a core > > discovery in background, or just to be able to disable it, like we do in > > our use case. > > > > If someone is interested in these new options for LotsOfCores feature, > > just tell me > > > > > > Ce message et les pièces jointes sont confidentiels et réservés à l'usage > > exclusif de ses destinataires. Il peut également être protégé par le secret > > professionnel. Si vous recevez ce message par erreur, merci d'en avertir > > immédiatement l'expéditeur et de le détruire. L'intégrité du message ne > > pouvant être assurée sur Internet, la responsabilité de Worldline ne pourra > > être recherchée quant au contenu de ce message. Bien que les meilleurs > > efforts soient faits pour maintenir cette transmission exempte de tout > > virus, l'expéditeur ne donne aucune garantie à cet égard et sa > > responsabilité ne saurait être recherchée pour tout dommage résultant d'un > > virus transmis. > > > > This e-mail and the documents attached are confidential and intended solely > > for the addressee; it may also be privileged. If you receive this e-mail in > > error, please notify the sender immediately and destroy it. As its > > integrity cannot be secured on the Internet, the Worldline liability cannot > > be triggered for the message content. Although the sender endeavours to > > maintain a computer virus-free network, the sender does not warrant that > > this transmission is virus-free and will not be liable for any damages > > resulting from any virus transmitted.