First you have to answer the twin questions of what you want the user
experience to be and what expectations users may have independent of your
"intentions".
Do you intend to have a separate, language-specific search UI for each
language? That would map naturally onto separate cores, but it can be done
in a single core with a language field as well.
Sometimes, users want only documents in a specific language, but sometimes
they want a "globalized" search for technical terms or names across all
languages, such as searching for "Lucene" OR "Solr" and then faceting by
language to get an idea of use by language.
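As a rough sketch of what such a "globalized" request might look like, the
Python below only builds the query string; it does not talk to Solr. The
field name "language" and the rows value are assumptions for illustration,
not something from your schema. The q, facet, facet.field and rows
parameters are standard Solr request parameters.

```python
from urllib.parse import urlencode


def globalized_search_params(terms, language_field="language"):
    """Build Solr request parameters for a search across all languages,
    faceted by a (hypothetical) per-document language field."""
    # OR the quoted terms together, e.g. "Lucene" OR "Solr"
    query = " OR ".join('"{}"'.format(t) for t in terms)
    return [
        ("q", query),
        ("facet", "true"),                # turn faceting on
        ("facet.field", language_field),  # count matches per language
        ("rows", "10"),                   # assumed page size
    ]


params = globalized_search_params(["Lucene", "Solr"])
query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to the select handler URL of
whichever core holds documents for all languages.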
From a practical perspective, maybe most docs would be English, so that
would be one big core anyway. The main secondary languages would be
modestly sized, and beyond those you may have a large number of tiny cores.
Managing a bunch of small and tiny cores could be a pain.
Maybe three cores: English-only, all non-English, and all languages (the
last one only if "globalized" search is desired). Queries against the all
non-English core could either use a filter query on the specific language
desired, or use different field sets for the query fields and the returned
fields in an edismax query request. This is just one technical approach, and
it still all depends on the intended user experience and user expectations.
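To make the two options for the all non-English core concrete, here is a
minimal sketch that again only builds request parameters. It combines both
options in one request, although either could be used on its own. The
title_XX and description_XX field names follow the naming in the original
message; the "language" and "id" fields are assumptions. defType, fq, qf and
fl are standard Solr/edismax parameters.

```python
from urllib.parse import urlencode


def language_search_params(user_query, lang, language_field="language"):
    """Build edismax request parameters restricted to one language."""
    return [
        ("defType", "edismax"),
        ("q", user_query),
        # Option 1: filter query on the (hypothetical) language field
        ("fq", "{}:{}".format(language_field, lang)),
        # Option 2: language-specific fields to search (qf) and return (fl)
        ("qf", "title_{0} description_{0}".format(lang)),
        ("fl", "id,title_{0},description_{0}".format(lang)),
    ]


german_params = language_search_params("suche", "de")
print(urlencode(german_params))
```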
-- Jack Krupansky
-----Original Message-----
From: Ivan Hrytsyuk
Sent: Wednesday, May 16, 2012 6:31 AM
To: solr-user@lucene.apache.org
Subject: Solr Single Core vs Multiple Cores installation for localization
Hello,
We are going to add multi-language support for our Solr-based project.
We are considering the following Solr installation types:
1. Single core - all fields for all languages reside in a single core.
I.e. title_en, description_en, title_de, description_de, title_fr,
description_fr
2. Multiple cores - one core for one language
It looks like a multiple-core installation is more appropriate for
multi-language support, but we would like to see expert comments on this.
What we have found so far for multiple cores:
* Pros
o Searching is faster, because query response time grows linearly with
index size and each per-language index is smaller
o More flexible. We can shut down any core at any time
o Easier to maintain
* Cons
o Startup time is longer than with a single core
Could anyone advise on the following:
1. Will indexing for multiple cores be faster than for a single-core
installation because each index is smaller? Is there any relationship
between the size of the index and the time the indexing process takes?
2. How much longer is the startup time for Solr with 30 cores compared to
a single core when cache warming is disabled? This point is really
important for us.
3. What processes are executed during Solr startup?
Thank you in advance, Ivan