Sharding in SolrCloud
Hello,

we tested SolrCloud in a setup with one collection, two shards and one replica per shard, and it works quite well with some example data. Now we plan to set up our own collection and need to decide how many shards to divide it into. We can estimate the size of the collection quite exactly, but even knowing the size and the volume of queries and updates, we don't know what the best approach to sharding is. Is there any documentation or a set of design guidelines for sharding a collection in SolrCloud?

Thanks & regards,
Norman Lenzner
Re: Sharding in SolrCloud
Mark Miller wrote on 12.06.2012 19:19:01:

> On Jun 12, 2012, at 3:39 AM, lenz...@gfi.ihk.de wrote:
>
> > Hello,
> >
> > we tested SolrCloud in a setup with one collection, two shards and one
> > replica per shard and it works quite fine with some example data.
> > Now, we plan to set up our own collection and determine in how many shards
> > we should divide it.
> > We can estimate quite exactly the size of the collection, but we don't
> > know, what the best approach for sharding is,
> > even if we know the size and the amount of queries and updates.
> > Is there any documentation or a kind of design guidelines for sharding a
> > collection in SolrCloud?
> >
> > Thanks & regards,
> > Norman Lenzner
>
> It's hard to tell - I think you want to start with an idea of how
> many docs you can fit on a single node. This can vary wildly
> depending on many factors. Generally you have to do some testing
> with your particular config and data. You can search the mailing
> lists and perhaps dig up a little info, but there is really no
> replacement for running some tests with real data.
>
> Then you have to plan in your growth rate - resharding is naturally
> a relatively expensive operation. Once you have an idea of how many
> docs per machine you think seems comfortable, figure out how many
> machines you need given your estimated doc growth rate and perhaps
> some padding. You might not get it right, but if you expect the
> possibility of a lot of growth, erring on the more shards side is
> obviously better.
>
> - Mark Miller
> lucidimagination.com

Hello and thanks for your reply,

We will run some tests to determine the size of our collection, but I think there won't be any need for a second shard at all. The problem is not the size or the growth of the docs, but a fairly high update frequency. So, if we have many bulk updates, is it reasonable to distribute the update load across multiple shards?

Thanks & regards,
Norman Lenzner
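The sizing steps Mark describes (docs per node, growth rate, padding) can be written down as a small back-of-the-envelope calculation. This is only a sketch: the function name, the compound-growth model and the default padding of 20% are assumptions for illustration, and the docs-per-node figure still has to come from tests with real data and the real config.

```python
import math


def estimate_shard_count(current_docs, docs_per_node, yearly_growth_rate,
                         years, padding=0.2):
    """Estimate how many shards are needed for a planning horizon.

    current_docs:        documents in the collection today
    docs_per_node:       comfortable docs per node, measured in your own tests
    yearly_growth_rate:  assumed growth per year, e.g. 0.5 for +50%
    years:               planning horizon in years
    padding:             extra headroom on top of the projection (assumption)
    """
    if docs_per_node <= 0:
        raise ValueError("docs_per_node must be positive")
    # Project the doc count with simple compound growth (an assumed model).
    projected_docs = current_docs * (1 + yearly_growth_rate) ** years
    # Add headroom, since resharding later is relatively expensive.
    padded_docs = projected_docs * (1 + padding)
    # Round up: a fraction of a shard still needs a whole shard.
    return max(1, math.ceil(padded_docs / docs_per_node))


if __name__ == "__main__":
    # Hypothetical numbers: 10M docs today, 25M docs per node, +50% per year.
    print(estimate_shard_count(10_000_000, 25_000_000, 0.5, 3))
```

Note that this only covers index size. It says nothing about update throughput, which is the concern raised in the follow-up question and would need its own measurements.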
Solr4 BETA "group.ngroups" count
Hello,

I have a problem using grouped queries and the 'group.ngroups' parameter. When I run the following request

/select?q=&group=true&group.field=personId&group.ngroups=true&wt=xml

the response looks like this:

matches: 11
ngroups: 6
groupValue: 106.12345 ...
groupValue: 106.12312 ...
groupValue: 101.12313 ...
groupValue: 101.12312 ...

I expected ngroups to be 4, because that is the total number of groups that match my query. The value of 'matches' is right, and the 11 docs are distributed over the 4 groups in my response, but I have no idea what ngroups is counting in this case. Can anybody explain to me what ngroups means?

Regards,
Norman Lenzner
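One possible explanation, offered as an assumption since the post does not say whether the query ran against a multi-shard collection: in distributed mode, group.ngroups is only accurate when all documents of a group live on the same shard. If each shard counts its own distinct groups and the counts are added up, a group that spans two shards is counted twice. The sketch below models that with a hypothetical placement of the 11 docs on two shards; the placement and the function names are made up for illustration, only the four group values are taken from the post.

```python
def ngroups_summed_per_shard(shards, field):
    """Add up the distinct group counts of each shard (can overcount)."""
    return sum(len({doc[field] for doc in docs}) for docs in shards)


def ngroups_global(shards, field):
    """Count distinct groups across all shards (the expected number)."""
    return len({doc[field] for docs in shards for doc in docs})


# Hypothetical placement: 11 docs, 4 groups, two groups span both shards.
shard1 = [{"personId": v} for v in
          ["106.12345", "106.12345", "106.12312", "106.12312",
           "101.12313", "101.12313"]]
shard2 = [{"personId": v} for v in
          ["106.12345", "106.12345", "106.12312",
           "101.12312", "101.12312"]]
shards = [shard1, shard2]

if __name__ == "__main__":
    print("matches:", sum(len(docs) for docs in shards))
    print("summed per shard:", ngroups_summed_per_shard(shards, "personId"))
    print("distinct overall:", ngroups_global(shards, "personId"))
```

If this is the cause, routing all documents with the same personId to the same shard would make the two numbers agree.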
Update 4.0 to 4.1 (4.2.1): No slice servicing hash code
Hello,

I tried updating our SolrCloud from 4.0.0 to 4.1.0. So I set up a cloud on my local machine with a standalone ZooKeeper (3.4.5), 3 collections and 6 Solr servers (4.0.0). I added some documents via SolrJ and stopped the servers. After that I restarted the nodes with the newer version (4.1.0). After the restart everything looked fine and all nodes were active, but when I started to add documents via SolrJ, the following exception occurred:

org.apache.solr.common.SolrException: No slice servicing hash code 8330c664 in DocCollection(anschriften)={"shards":{"shard1":{ "replicas":{ ":8001_solr_anschriften":{ "shard":"shard1", "state":"active", "core":"anschriften", "collection":"anschriften", "node_name":":8001_solr", "base_url":"http://:8001/solr", "leader":"true"}, ":8002_solr_anschriften":{ "shard":"shard1", "state":"active", "core":"anschriften", "collection":"anschriften", "node_name":":8002_solr", "base_url":"http://:8002/solr"}}, "state":"active"}}}
    at org.apache.solr.common.cloud.HashBasedRouter.hashToSlice(HashBasedRouter.java:52)
    at org.apache.solr.common.cloud.HashBasedRouter.getTargetSlice(HashBasedRouter.java:34)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:200)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:365)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:948)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)

There is only one shard, and it should contain all documents. Do you have any idea what's going wrong?

Thanks,
Norman Lenzner
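To illustrate what the top frames of the trace (HashBasedRouter.hashToSlice) appear to be doing, here is a simplified model of hash-range routing. It is a sketch under assumptions, not Solr's actual code: the "range" key, the tuple layout and the function names are hypothetical. The one grounded observation is that the DocCollection printed in the exception shows no hash range for shard1, so a router that matches on ranges would find no slice even though the single shard is active.

```python
def to_signed32(value):
    """Interpret an unsigned 32-bit value as a signed 32-bit integer."""
    return value - (1 << 32) if value >= (1 << 31) else value


def hash_to_slice(doc_hash, slices):
    """Return the name of the slice whose hash range contains doc_hash.

    slices maps a slice name to its properties; an optional "range" entry
    holds an inclusive (low, high) pair of signed 32-bit hash bounds.
    Slices without a range can never match in this model.
    """
    for name, props in slices.items():
        hash_range = props.get("range")
        if hash_range is not None and hash_range[0] <= doc_hash <= hash_range[1]:
            return name
    raise LookupError(
        "No slice servicing hash code %08x" % (doc_hash & 0xFFFFFFFF))


# The hash from the exception message, as a signed 32-bit integer.
DOC_HASH = to_signed32(0x8330C664)

# Cluster state as printed in the exception: one active shard, no range.
state_without_range = {"shard1": {"state": "active"}}

# The same shard with a range covering the full 32-bit hash space (assumed).
state_with_range = {
    "shard1": {"state": "active", "range": (-(1 << 31), (1 << 31) - 1)},
}

if __name__ == "__main__":
    print(hash_to_slice(DOC_HASH, state_with_range))
    try:
        hash_to_slice(DOC_HASH, state_without_range)
    except LookupError as err:
        print(err)
```

If this model matches the real behaviour, the cluster state written by 4.0.0 would need range information for each shard before 4.1.0 can route updates, but that should be confirmed against the 4.1 release notes rather than taken from this sketch.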