Hi,
We are currently using Solr 1.3 and have the following requirement.
Because we need to process very high volumes of documents (on the order of
400 GB per day), we plan to separate the indexer(s) from the searcher(s) so
that indexing does not hurt query performance.
Our idea is to have a set of servers used only for indexing (index creation);
every 5 minutes or so, the index is copied to the searchers (a set of Solr
servers used only for querying). For this we tried snapshooter, rsync, etc.
The problem with this approach is that the same index is present on both the
indexer and the searcher, and hence occupies a large amount of file system
space.
What we need is a mechanism whereby the indexer contains only the index for
the past 5 minutes (the last indexing cycle before snapshooter is run) and the
searcher holds the accumulated (total) index, i.e. every 5 minutes we should
be able to move the entire index from the indexer to the searcher, and so on.
This scenario is slightly different from the usual master/slave setup: on the
master we want only the latest (work-in-progress) index, while the slave
should contain the entire index.
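
To make the cycle we have in mind more concrete, here is a rough Python sketch
of the file-level handoff only. The directory layout, the "delta-NNNNNN" naming
and the function ship_delta are hypothetical and not part of Solr; merging each
shipped delta into the searcher's total index would still need a separate
index-merge step, which is not shown and is the part we are asking about.

```python
import os
import shutil
import tempfile


def ship_delta(indexer_dir, searcher_incoming, cycle_id):
    """Move the indexer's current index to the searcher and reset the indexer.

    Hypothetical helper: the paths and the naming scheme are assumptions.
    """
    os.makedirs(searcher_incoming, exist_ok=True)
    target = os.path.join(searcher_incoming, "delta-%06d" % cycle_id)
    # Move instead of copy, so the indexer never keeps older index data.
    shutil.move(indexer_dir, target)
    # Start the next indexing cycle from an empty index directory.
    os.makedirs(indexer_dir)
    return target


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    indexer = os.path.join(root, "indexer", "index")
    incoming = os.path.join(root, "searcher", "incoming")
    os.makedirs(indexer)
    for cycle in range(3):
        # Stand-in for five minutes of indexing on the indexer.
        with open(os.path.join(indexer, "segment.dat"), "w") as f:
            f.write("docs from cycle %d\n" % cycle)
        ship_delta(indexer, incoming, cycle)
    # The searcher side now holds one delta per cycle; the indexer is empty.
    print(sorted(os.listdir(incoming)))
    print(os.listdir(indexer))
    shutil.rmtree(root)
```

The point of the sketch is that, after every cycle, the indexer holds nothing
and only the searcher accumulates data.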
I would appreciate it if anyone could shed some light on how to achieve this.
Thanks,
sS