Hi all!
We plan to migrate from Solr 3.5 to SolrCloud 4.0. We have run some tests,
and I want to confirm the results with you.

My test setup:
Ubuntu 12.04 LTS, Oracle JDK 7u7, Jetty 8, SolrCloud 4.0, 4 shards (4 JVMs
on the same machine on different ports [9080, 9081, 9082, 9083]), no
replicas.

My questions are:
1) Is it true that I can send data to any of the shards [9080, 9081, 9082,
9083] without caring how SolrCloud distributes the data between
shards? What algorithm is used: round robin?

2) For example, SolrCloud contains this document:
<doc><field name="id">1</field><field name="name">this is Solr
3.5</field></doc>
I don't know which shard this doc is on. I need to update the "name"
field. The new doc is:
<doc><field name="id">1</field><field name="name">this is
SolrCloud</field></doc>
Is it true that I can send this doc to any of the shards [9080, 9081, 9082,
9083] and, after a commit, a query will return "this is SolrCloud"
instead of "this is Solr 3.5"? As far as I can see, the old data stays in
the index until an optimize is done. Is that right?
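
To make question 2 concrete, here is a rough Python sketch of the request I
mean. It only builds the XML update message and the candidate URLs; it does
not send anything. The host name "localhost", the "/solr/update" path and
the helper names build_add_xml / update_url are assumptions for
illustration, not taken from my real config.

```python
from xml.sax.saxutils import escape, quoteattr

# The four test JVMs from the setup described above.
PORTS = [9080, 9081, 9082, 9083]


def build_add_xml(doc):
    """Build a Solr XML <add> message for one document (field -> value)."""
    fields = "".join(
        "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
        for name, value in doc.items()
    )
    return "<add><doc>%s</doc></add>" % fields


def update_url(port, host="localhost"):
    # Assumed URL layout; the real core/collection path may differ.
    return "http://%s:%d/solr/update?commit=true" % (host, port)


# Same id as the existing doc, new value for "name".
new_doc = {"id": "1", "name": "this is SolrCloud"}
body = build_add_xml(new_doc)

# The question is whether POSTing `body` to any one of these is equivalent.
urls = [update_url(port) for port in PORTS]
print(body)
```

The body would be sent with Content-Type text/xml to one of the URLs.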

3) Is it true that delete by query works regardless of which shard the
request is sent to?
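
For question 3, the kind of message I have in mind is the standard XML
delete-by-query. A small Python sketch that only builds the message; the
example query and the helper name build_delete_by_query_xml are made up for
illustration.

```python
from xml.sax.saxutils import escape


def build_delete_by_query_xml(query):
    """Build a Solr XML delete-by-query message, escaping &, < and >."""
    return "<delete><query>%s</query></delete>" % escape(query)


# Hypothetical query: remove every doc whose name mentions the old version.
delete_body = build_delete_by_query_xml('name:"Solr 3.5"')
print(delete_body)
```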

4) I currently run with -DnumShards=4. As I understand it, to expand
SolrCloud to, for example, 6 shards, I need to remove the ZooKeeper data
directory, set -DnumShards=6 and restart Jetty. Could I instead set
-DnumShards=20 now and just add new shards in the future, without any
removal or JVM restart?

5) Currently we have 30 shards with 50M docs. Which layout do you advise:
shards with ~15M docs, or more shards with fewer docs each? Which will be
faster: searching shards with ~15M docs, or searching more shards with
fewer docs each? The expected number of docs is ~1,500,000,000.

Thanks for your responses.


