Hi all! We plan to migrate from Solr 3.5 to SolrCloud 4.0. We have run some tests, and I want to confirm the results with you.
My test setup: Ubuntu 12.04 LTS, Oracle JDK 7u7, Jetty 8, SolrCloud 4.0, 4 shards (4 JVMs on the same machine on different ports [9080, 9081, 9082, 9083]), no replicas.

My questions are:

1) Is it true that I can send data to any of the shards [9080, 9081, 9082, 9083] and not care about how SolrCloud distributes the data between shards? What algorithm is used: round robin?

2) For example, SolrCloud contains this document:

   <doc><field name="id">1</field><field name="name">this is Solr 3.5</field></doc>

   I have no information about which shard this doc is in. I need to update the "name" field. The new doc is:

   <doc><field name="id">1</field><field name="name">this is SolrCloud</field></doc>

   Is it true that I can send this doc to any of the shards [9080, 9081, 9082, 9083] and that after a commit, when I run the query, I'll get "this is SolrCloud" instead of "this is Solr 3.5" in the results? As I understand it, the old data stays in the index until an optimize is done?

3) Is it true that delete by query works regardless of which shard the request is sent to?

4) My -DnumShards is 4. If I need to expand SolrCloud to, for example, 6 shards, I need to remove the ZooKeeper data directory, set -DnumShards to 6 and restart Jetty. Can I set -DnumShards=20 now and just add new shards in the future, without removing anything or restarting the JVMs?

5) Currently we have 30 shards with 50M docs. What layout do you advise: shards with ~15M docs, or more shards with fewer docs each? Which will be faster: searching shards with ~15M docs, or searching more shards with fewer docs each? The expected number of docs is ~1,500,000,000.

Thanks for your responses.

--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-general-questions-tp4017769.html
Sent from the Solr - User mailing list archive at Nabble.com.