Gereon,

I think you must have the same schema on every shard, but I am not sure whether every shard must also use the same analyzers. These are shards of one index, not multiple indexes. There is probably a way to get each shard to contain one language, but then you end up with x servers for x languages, and some will be underutilized while others will be overutilized.
Add to that fail-over and fault tolerance and you end up with a maintenance nightmare. Also, how would you scale this? Of course I am still pretty new to search and Solr/Lucene so I might be wrong :)

The solutions suggested by Peter and Mike (separate fields per language, or prefixing every term with its language string) are starting to look better and better. Is it possible to write an analyzer wrapper that is aware of the locale field in the document and delegates processing to the appropriate analyzer?

Thanks,
Eli

On Wed, May 7, 2008 at 3:46 PM, Gereon Steffens <[EMAIL PROTECTED]> wrote:
> I have the same requirement, and from what I understand the distributed
> search feature will help implementing this, by having one shard per
> language. Am I right?
>
> Gereon
>
>
> Mike Klaas wrote:
> > On 5-May-08, at 1:28 PM, Eli K wrote:
> >
> > > Wouldn't this impact both indexing and search performance and the size
> > > of the index?
> > > It is also probable that I will have more than one free-text field
> > > later on and with at least 20 languages this approach does not seem
> > > very manageable. Are there other options for making this work with
> > > stemming?
> >
> > If you want stemming, then you have to execute one query per language
> > anyway, since the stemming will be different in every language.
> >
> > This is a fundamental requirement: you somehow need to track the language
> > of every token if you want correct multi-language stemming. The easiest way
> > to do this would be to split each language into its own field. But there
> > are other options: you could prefix every indexed token with the language:
> >
> > en:The en:quick en:brown en:fox en:jumped ...
> > fr:Le fr:brun fr:renard fr:vite fr:a fr:sauté ...
> >
> > Separate fields seems easier to me, though.
> >
> > -Mike
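
P.S. To make the wrapper question concrete, here is a rough sketch of the idea in plain Python. It is not Solr or Lucene code: the names ANALYZERS, analyze_en, analyze_fr, analyze_default and analyze_document are all made up for illustration, and the "stemming" is a toy. It only shows the shape of the delegation: look at the document's locale field, pick the matching analyzer, and prefix each resulting token with the language as Mike described.

```python
from typing import Callable, Dict, List


def analyze_en(text: str) -> List[str]:
    # Toy English analyzer (assumption): lowercase, then strip a trailing "s"
    # from words longer than three characters. Not a real stemmer.
    tokens = text.lower().split()
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


def analyze_fr(text: str) -> List[str]:
    # Toy French analyzer (assumption): lowercase only.
    return text.lower().split()


def analyze_default(text: str) -> List[str]:
    # Fallback for locales without a dedicated analyzer.
    return text.lower().split()


# Hypothetical registry mapping a locale code to its analyzer.
ANALYZERS: Dict[str, Callable[[str], List[str]]] = {
    "en": analyze_en,
    "fr": analyze_fr,
}


def analyze_document(doc: Dict[str, str]) -> List[str]:
    # Delegate to the analyzer for the document's locale, then prefix
    # every token with that locale so languages never share terms.
    locale = doc["locale"]
    analyzer = ANALYZERS.get(locale, analyze_default)
    return [f"{locale}:{token}" for token in analyzer(doc["text"])]


if __name__ == "__main__":
    print(analyze_document({"locale": "en", "text": "Quick brown dogs"}))
    print(analyze_document({"locale": "fr", "text": "Le renard brun"}))
```

If something like this is possible, I assume the query side would have to go through the same locale-aware step, otherwise the prefixed terms in the index would never match.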