To answer my own post about the subtle difference between the shard and
replica examples: it looks like the difference is in the numShards parameter.

If you set numShards to 2, then any instances started beyond the first 2
become replicas. Is that correct? If so, I think my settings are correct.
That still does not explain why all the shards grow at the same time.

One thing I noticed is that three of them are leaders in the SolrCloud admin
UI graph. Is that normal?

Thierry

On Mon, Aug 12, 2013 at 5:39 PM, Thierry Thelliez <thierry.thelliez.t...@gmail.com> wrote:

> Thanks Shawn for the detailed instructions.
>
> About the router: it is implicit.
>
> About the replicas: I followed the example at
> http://wiki.apache.org/solr/SolrCloud
>
> I start the shards with the following (paths and ports simplified):
>
> cd /.../solr/shard1/
> /usr/bin/java -Djetty.port=1 -Dbootstrap_confdir=./solr/collection1/conf
> -Dcollection.configName=myconf -DzkRun=localhost:0 -DnumShards=4 -jar
> start.jar > /.../log/shard_1.log
>
> cd /.../solr/shard2/
> /usr/bin/java -Djetty.port=2 -DzkHost=localhost:0 -jar start.jar >
> /.../log/shard_2.log
>
> and same thing for the two other shards on their own ports.
>
> To post a document (CSV file), I use:
>
> curl http://localhost:shardport/solr/update --data-binary file.csv
> -H 'Content-type:text/csv; charset=ISO-8859-1'
>
> I just re-read the example page at http://wiki.apache.org/solr/SolrCloud
> and I see that there is no difference between starting a shard and
> starting a replica. I must be missing something:
>
> From exampleA (two shards):
>
> cd example2
> java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
>
> From exampleB (two shards with replicas):
>
> cd exampleB
> java -Djetty.port=8900 -DzkHost=localhost:9983 -jar start.jar
>
> Thanks.
> Thierry
>
> On Mon, Aug 12, 2013 at 5:04 PM, Shawn Heisey <s...@elyograg.org> wrote:
>
>> On 8/12/2013 4:50 PM, Thierry Thelliez wrote:
>>
>>> Hello, I am trying to set up a four-shard system for the first time.
>>> I do not understand why the data on all the shards grows at about the
>>> same rate when I push the documents to only one shard.
>>>
>>> The four shards represent four calendar years. And for now, on a
>>> development machine, these four shards run on four different ports.
>>>
>>> The first shard is started with Zookeeper.
>>>
>>> The log of the other shards is filled with something like:
>>>
>>> 7882051 [qtp1154079020-1245] INFO
>>> org.apache.solr.update.processor.LogUpdateProcessor – [collection1]
>>> webapp=/solr path=/update params={distrib.from=
>>> http://x.y.z.4:50121/solr/collection1/&update.distrib=TOLEADER&wt=javabin&version=2}
>>> {add=[14939-96467-304 (1443204912169091072), 14939-96467-308
>>> (1443204912179576832), 14939-96467-310 (1443204912185868288),
>>> 14939-96467-311 (1443204912192159744), 14939-96467-313
>>> (1443204912204742656), 14939-96467-314 (1443204912220471296),
>>> 14939-96467-318 (1443204912239345664), 14939-96467-319
>>> (1443204912250880000), 14939-96467-322 (1443204912257171456),
>>> 14939-96467-324 (1443204912263462912)]} 0 282
>>>
>>> What is getting written to the other shards? Is a separate index computed
>>> on all four shards? I thought that when pushing a document to one shard,
>>> only that shard would update its index.
>>>
>>
>> There are two possibilities.
>>
>> 1) You don't have four shards, you have four replicas of one shard. If
>> this is happening, then they all will receive all documents.
>>
>> 2) You are using a router like compositeId instead of implicit. This
>> will calculate the hash of the id field and evenly divide the documents
>> among all the shards in the collection according to the hash value. If you
>> create the collection with the implicit router, then documents should be
>> indexed by the shard that received them.
>>
>> To see what router you have, click on Cloud in the admin UI, then click
>> on Tree. Click the arrow to the left of '/collections' to open it. Click
>> on collection1 (or whichever you are actually using) -- the actual name,
>> not the arrow. Underneath the table that appears to the right will be
>> "router" and its value.
>>
>> Thanks,
>> Shawn
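
For reference, the two routing behaviours Shawn describes in his second point can be roughly sketched as follows. This is only an illustration and not Solr code: the function names are hypothetical, zlib.crc32 stands in for whatever hash Solr actually uses, and the hash space is assumed to be split into equal ranges, one per shard.

```python
import zlib

HASH_SPACE = 2 ** 32  # size of the 32-bit hash space assumed in this sketch


def hash_route(doc_id: str, num_shards: int) -> int:
    """Pick a shard by hashing the id into one of num_shards equal ranges.

    zlib.crc32 is a stand-in hash chosen for this sketch only.
    """
    h = zlib.crc32(doc_id.encode("utf-8"))  # 0 <= h < 2**32
    return h * num_shards // HASH_SPACE


def implicit_route(receiving_shard: int) -> int:
    """Without hashing, the document stays on the shard that received it."""
    return receiving_shard


def distribute(doc_ids, num_shards, receiving_shard, use_hash):
    """Count how many documents each shard ends up indexing."""
    counts = [0] * num_shards
    for doc_id in doc_ids:
        if use_hash:
            shard = hash_route(doc_id, num_shards)
        else:
            shard = implicit_route(receiving_shard)
        counts[shard] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical ids, all posted to shard 0 of a four-shard collection.
    ids = [f"14939-96467-{n}" for n in range(1000)]
    print("hash router:    ", distribute(ids, 4, 0, use_hash=True))
    print("implicit router:", distribute(ids, 4, 0, use_hash=False))
```

With the hash-based variant, documents posted to a single shard are spread across the collection, which would look like every shard growing at once. With the implicit variant, only the receiving shard's count changes.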