On Fri, Sep 4, 2009 at 4:35 AM, Jonathan Ariel <ionat...@gmail.com> wrote:
> It seems like it is really hard to decide when the Multiple Core solution is
> more appropriate. As I could understand from this list and wiki the Multiple
> Core feature was designed to address the need of handling different sets of
> data within the same solr instance, where the sets of data don't need to be
> joined.
>

Correct. It is also useful when you don't want to set up multiple boxes or
Tomcats for each Solr.

> In my case the documents are of a specific site and country. So document A
> can be of Site 1 / Country 1, B of Site 2 / Country 1, C of Site 1 / Country
> 2, and so on.
> For the use cases of my application I will never query across countries or
> sites. I will always have to provide to the query the country id and the
> site id.
> Would you suggest to split my data into cores? I have few sites (around 20)
> and more countries (around 90).
> Should I split my data into sites (around 20 cores) and within a core filter
> by site? Should I split by Site and Country (around 1800 cores)?
> What should I consider when splitting my data into multiple cores?
>

The first question is: why do you want to split at all?

Is the schema or solrconfig different? Are the different sites or countries
updated at different times? Is the combined index so big that response times
jump wildly when all the caches are thrown out after documents related to one
site or country are updated? Does warmup, optimize, or replication take too
much time with one big index?

Each core will have its own configuration files (more maintenance), and you
need to set up replication separately for each core (which is a pain with the
script-based replication).

Also note that when all cores are kept in one Tomcat (one JVM), a
stop-the-world GC will stop all cores, which is not the case when using a
separate JVM for each index/core.

--
Regards,
Shalin Shekhar Mangar.
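
For reference, a minimal sketch of the single-index option discussed above,
where one index holds all documents and every query is restricted with filter
queries. The field names site_id and country_id, the base URL, and the helper
build_select_url are hypothetical; only the q and fq request parameters are
standard Solr.

```python
from urllib.parse import urlencode


def build_select_url(base_url, site_id, country_id, user_query="*:*"):
    """Build a Solr select URL restricted to one site and one country.

    site_id and country_id are assumed (hypothetical) indexed fields.
    """
    params = [
        ("q", user_query),
        # One fq per restriction, so each filter is a separate parameter.
        ("fq", f"site_id:{site_id}"),
        ("fq", f"country_id:{country_id}"),
    ]
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed local single-core Solr; adjust the base URL for your setup.
    print(build_select_url("http://localhost:8983/solr", 1, 2, "title:camera"))
```

Keeping the site and country restrictions in separate fq parameters, rather
than folding them into q, lets Solr cache each filter on its own, which may be
enough to avoid splitting into cores if none of the questions above apply.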