First you have to answer the twin questions of what you want the user experience to be and what expectations users may have independent of your "intentions".

Do you intend to have a separate, language-specific search UI? That would match up with separate cores, but it can be done with a language-type field as well.

Sometimes, users want only documents in a specific language, but sometimes they want a "globalized" search for technical terms or names across all languages, such as searching for "Lucene" OR "Solr" and then faceting by language to get an idea of use by language.
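
As a rough sketch of that "globalized" request, the parameters could be built like this in Python. It assumes each document carries a language code in a field called "language" (a hypothetical name; use whatever your schema defines), and that only the per-language counts are wanted, hence rows=0.

```python
from urllib.parse import urlencode

def globalized_search_params(terms, language_field="language"):
    """Build select parameters for a cross-language search faceted by language.

    `language_field` is a hypothetical field name, not something Solr provides.
    """
    return [
        ("q", " OR ".join(terms)),        # e.g. Lucene OR Solr
        ("facet", "true"),                # turn faceting on
        ("facet.field", language_field),  # one count per language value
        ("rows", "0"),                    # counts only, no documents returned
    ]

query_string = urlencode(globalized_search_params(["Lucene", "Solr"]))
print(query_string)
```

The resulting string would be appended to the select URL of whichever core holds the all-languages index.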

From a practical perspective, most docs would probably be English, so that would be one big core anyway. The main secondary languages would be modest in size, and then you might have a large number of tiny cores. Managing a bunch of small and tiny cores could be a pain.

Maybe three cores: English-only, all non-English, and all languages, if "globalized" search is desired. The all non-English core could use a filter query on the specific language desired, or use different field sets for the query fields and returned fields in an edismax query request. This is just one technical approach, but it all still depends on the intended user experience and user expectations.
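
To make the two options for the non-English core a bit more concrete, here is a minimal sketch in Python. It assumes a hypothetical "language" field for the filter query, per-language fields named like title_de and description_de (as in the original question), and an "id" field for the returned list. The function names are made up for illustration.

```python
from urllib.parse import urlencode

def language_filter_params(user_query, language, language_field="language"):
    """Option 1: restrict results with a filter query on a language field."""
    return [
        ("q", user_query),
        ("fq", f"{language_field}:{language}"),  # hypothetical field name
    ]

def language_fields_params(user_query, language,
                           base_fields=("title", "description")):
    """Option 2: query and return only the fields for one language via edismax."""
    fields = [f"{name}_{language}" for name in base_fields]  # e.g. title_de
    return [
        ("q", user_query),
        ("defType", "edismax"),
        ("qf", " ".join(fields)),           # fields to search
        ("fl", ",".join(["id"] + fields)),  # fields to return; "id" is assumed
    ]

print(urlencode(language_filter_params("solr", "de")))
print(urlencode(language_fields_params("solr", "de")))
```

The first form keeps one set of fields and relies on the language value per document; the second relies on the field naming convention instead.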

-- Jack Krupansky

-----Original Message-----
From: Ivan Hrytsyuk
Sent: Wednesday, May 16, 2012 6:31 AM
To: solr-user@lucene.apache.org
Subject: Solr Single Core vs Multiple Cores installation for localization

Hello,

We are going to add multi-language support for our Solr-based project.

We are considering the following Solr installation types:

1. Single core - all fields for all languages reside in a single core, e.g. title_en, description_en, title_de, description_de, title_fr, description_fr

2. Multiple cores - one core per language

It looks like a multiple-core installation is more appropriate for multi-language support, but we would like to see expert comments on this.
Here is what we have found so far for multiple cores:

* Pros

o Searching is faster, because query response time grows linearly with index size and each per-language index is smaller

o More flexible: we can shut down any core at any time

o Easier to maintain

* Cons

o Startup time is longer than with a single core

Could anyone advise on the following:

1. We assume indexing will be faster with multiple cores than with a single core because each index is smaller. Is there any relationship between index size and indexing time?

2. How much longer is the startup time for Solr with 30 cores compared with a single core when cache warming is disabled? This point is really important for us.

3. What processes are executed during Solr startup?

Thank you in advance, Ivan
