Hello, I am playing with Solr 5 right now to see if its cloud features can replace what we have with Solr 3.6, and I have some questions, some newbie and some not so newbie.
Background: the documents we are putting in Solr have a date field. The majority of our searches are restricted to documents created within the last week, but searches do go back 60 days. Documents older than 60 days are removed from the repository. We also want high availability in case a machine becomes unavailable.

Our current method, using Solr 3.6, is to split the data into 1-day chunks; within each day the data is split into several shards, and each shard has 2 replicas. Our code generates the list of cores to be queried based on the time range in the query. Cores that fall off the 60-day range are deleted through Solr's RESTful API. This all sounds a lot like what SolrCloud provides, so I started looking at SolrCloud's features.

My newbie questions:

- It looks like the way to write a document is to pick a node (possibly using a load balancer), send the document to that node, and let Solr figure out which nodes the document is supposed to go to. Is this the recommended way?
- Similarly, can I just randomly pick a core (using the demo example: http://localhost:7575/solr/#/gettingstarted_shard1_replica2/query ), query it, and let it scatter the query out to the appropriate cores and send me the results back? Will it give me back results from all the shards?
- Is there a recommended Python library?

My hopefully less newbie questions:

- Does Solr automatically detect when a node becomes unavailable, and stop sending queries to it?
- When the master node dies and the cluster elects a new master, what happens to writes?
- What happens when a node is unavailable?
- What is the procedure when a shard becomes too big for one machine and needs to be split?
- What is the procedure when we lose a machine and the node needs replacing?
- How would we quickly bulk delete data within a date range?
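To make the indexing question concrete, here is roughly what I have in mind, using only the standard library (the node URL, collection name, and document fields are placeholders from my local test setup -- my understanding is that any node will route each document on to the right shard leader, but please correct me if that's wrong):

```python
import json
import urllib.request

def build_update_request(node_url, collection, docs):
    """Build the URL and JSON body for posting documents to one node.

    In cloud mode the receiving node is (I believe) expected to forward
    each document to the leader of whichever shard it belongs to.
    """
    url = "%s/solr/%s/update?commit=true" % (node_url.rstrip("/"), collection)
    body = json.dumps(docs).encode("utf-8")
    return url, body

url, body = build_update_request(
    "http://localhost:8983", "gettingstarted",
    [{"id": "doc1", "created": "2015-05-01T00:00:00Z", "text": "hello"}])

# To actually send it (left commented so the sketch stays self-contained):
# req = urllib.request.Request(url, data=body,
#                              headers={"Content-Type": "application/json"})
# urllib.request.urlopen(req)
```

In production I'd put a load balancer in front and pick the node randomly, which is what the question is about.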
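And for the query question, this is the kind of request I mean -- send it to any one replica and (if I read the docs right) the query is fanned out to one replica of every shard and the merged results come back. Collection name, field name, and port are just from my local experiment:

```python
import urllib.parse

def build_query_url(node_url, collection, query, rows=10):
    """Build a select URL against a single core/replica.

    In cloud mode the receiving core should distribute the query
    across all shards of the collection and merge the results.
    """
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return "%s/solr/%s/select?%s" % (node_url.rstrip("/"), collection, params)

# e.g. the "last week" search that makes up most of our traffic:
print(build_query_url("http://localhost:7575", "gettingstarted",
                      "created:[NOW-7DAYS TO NOW]"))
```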
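On the shard-splitting question: I did find a SPLITSHARD action in the Collections API reference, which looks like it splits one shard into two sub-shards. Is that the recommended procedure, and do the new sub-shards then need to be moved to other machines by hand? The call I'm looking at is something like:

```python
import urllib.parse

def build_splitshard_url(node_url, collection, shard):
    """Build a Collections API SPLITSHARD call for one shard.

    My understanding is this creates two sub-shards covering the
    original shard's hash range; collection/shard names are placeholders.
    """
    params = urllib.parse.urlencode(
        {"action": "SPLITSHARD", "collection": collection, "shard": shard})
    return "%s/solr/admin/collections?%s" % (node_url.rstrip("/"), params)

print(build_splitshard_url("http://localhost:8983", "gettingstarted", "shard1"))
```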
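On the bulk-delete question, the only approach I know of is a delete-by-query with a date range, as we do against 3.6 today -- but with everything in one collection instead of per-day cores, is that fast enough? (`created` here stands in for whatever the date field is actually called.)

```python
import json

def build_delete_by_range(field, start="*", end="NOW-60DAYS"):
    """JSON body for the update handler that deletes every document
    whose date field falls in [start, end]."""
    return json.dumps(
        {"delete": {"query": "%s:[%s TO %s]" % (field, start, end)}})

body = build_delete_by_range("created")
# POST this to http://<any-node>/solr/<collection>/update?commit=true
print(body)
```

With the per-day-core scheme this is just a cheap core delete, which is part of why I'm asking.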