Hello

I am playing with Solr 5 right now to see if its cloud features can replace
what we have with Solr 3.6, and I have some questions: some newbie, and
some not so newbie.

Background: the documents we are putting in Solr have a date field. The
majority of our searches are restricted to documents created within the
last week, but searches do go back 60 days. Documents older than 60 days
are removed from the repository. We also want high availability in case a
machine becomes unavailable.

Our current method, using Solr 3.6, is to split the data into one-day
chunks; within each day the data is split into several shards, and each
shard has two replicas. Our code generates the list of cores to be queried
based on the time range in the query. Cores that fall off the 60-day range
are deleted through Solr's RESTful API.
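
For reference, the core cleanup we do today looks roughly like this (host,
port, and core names below are made up; in Solr 3.x the CoreAdmin UNLOAD
action with deleteIndex=true removes a core and its data):

```python
from urllib.parse import urlencode

def unload_core_url(base_url, core_name):
    """Build a Solr 3.x CoreAdmin request that unloads a core and
    deletes its index from disk (names here are hypothetical)."""
    params = urlencode({
        "action": "UNLOAD",
        "core": core_name,
        "deleteIndex": "true",
    })
    return f"{base_url}/admin/cores?{params}"

# e.g. drop the core holding day 61, on one of our hosts
url = unload_core_url("http://solr1:8983/solr", "docs_20150101_shard1")
```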

This all sounds a lot like what SolrCloud provides, so I started looking
at SolrCloud's features.

My newbie questions:

 - it looks like the way to write a document is to pick a node (possibly
using a load balancer), send the document to that node, and let Solr figure
out which nodes the document is supposed to go to. Is this the recommended
way?
 - similarly, can I just randomly pick a core (using the demo example:
http://localhost:7575/solr/#/gettingstarted_shard1_replica2/query ), query
it, and let it scatter the query out to the appropriate cores and send me
the results back? Will it give me back results from all the shards?
 - is there a recommended Python library?
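
For context, we currently do everything over plain HTTP/JSON, and my
assumption is that SolrCloud keeps the same shape - something like the
sketch below (the collection and field names are hypothetical; the paths
are the stock /update and /select handlers):

```python
import json
from urllib.parse import urlencode

# Any node in the cluster should do; SolrCloud routes from there.
BASE = "http://localhost:7575/solr/gettingstarted"

# Indexing: POST a JSON array of documents to /update. My understanding
# is that the receiving node forwards each doc to its shard leader.
docs = [{"id": "doc-1", "date": "2015-06-01T00:00:00Z", "body": "hello"}]
update_url = BASE + "/update?commit=true"
payload = json.dumps(docs)

# Querying: GET /select on any node; it fans the query out to one
# replica per shard and merges the results (if I read the docs right).
query_url = BASE + "/select?" + urlencode({"q": "body:hello", "wt": "json"})
```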

My hopefully less newbie questions:
 - does Solr auto-detect when a node becomes unavailable, and stop sending
queries to it?
 - when the master node dies and the cluster elects a new master, what
happens to writes?
 - what happens when a node is unavailable?
 - what is the procedure when a shard becomes too big for one machine and
needs to be split?
 - what is the procedure when we lose a machine and the node needs
replacing?
 - how would we quickly bulk delete data within a date range?
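
For that last one, my naive assumption is a delete-by-query using Solr
date math, built roughly like this (our field is called "date"; whether
this is efficient at scale is exactly what I'm asking):

```python
import json

# JSON delete-by-query body for the /update handler. NOW-60DAYS is
# Solr date math, evaluated on the server at request time.
body = json.dumps({"delete": {"query": "date:[* TO NOW-60DAYS]"}})
# POST this to http://<any-node>/solr/<collection>/update?commit=true
```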
