I can think of two ways to tackle your problem:

Option 1: give each document a field indicating its language. Then, when searching, you simply filter the query on the language you want to search in.

Advantages: everything is in one index, so if you ever need to do a cross-language search you can do it without changing anything.

Disadvantages: depending on how your data is structured, the index can grow big. If you always search in only one language, every query uses only part of the index, which is to some extent a performance penalty (how much depends on the size of the index). Another disadvantage is that the schema configuration can get a bit messy. Since everything is in one index, for each field and field type you'll probably need to define different versions for different languages, each with its own language-specific analyzer. For example, if you have a "title" field, you'll probably need to define "title_en" (for English content) and "title_zh" (for Chinese content). You will also need to make sure that when you index the content you send the right fields to Solr (although you could perhaps create a clever update processor that rewrites the field names based on the language field).
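To make Option 1 a bit more concrete, here is a minimal client-side sketch in Python. It assumes a field called "language" that holds a code such as "en" or "zh", and per-language fields named "title_en", "title_zh" and so on; those names and both helper functions are made up for illustration, so adjust them to whatever your schema actually defines. It only builds the renamed document and the query string, it does not talk to Solr.

```python
from urllib.parse import urlencode

# Assumption: "language" and "title_<lang>" are hypothetical field names;
# they must match the fields defined in your own schema.

def localize_doc(doc, lang_field="language", text_fields=("title",)):
    """Rename language-dependent fields before indexing, e.g. title -> title_en."""
    lang = doc[lang_field]
    out = {}
    for name, value in doc.items():
        if name in text_fields:
            out[f"{name}_{lang}"] = value
        else:
            out[name] = value
    return out

def build_query(term, lang):
    """Build select parameters that search one language only.

    "q" targets the language-specific title field, and "fq" is a filter
    query that restricts the results to documents in that language.
    """
    return urlencode({
        "q": f"title_{lang}:{term}",
        "fq": f"language:{lang}",
    })

if __name__ == "__main__":
    print(localize_doc({"id": "1", "language": "en", "title": "Hello"}))
    print(build_query("hello", "en"))
```

The same renaming could be done on the server by an update processor instead; doing it in the client is just the simplest thing to show.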

Option 2: have a separate Solr core for each language.

Advantages: as opposed to Option 1, here you have smaller indexes, each dedicated to one language. If the corpus is very big you can get performance gains. Since these are separate indexes, each core has its own simple and clean schema (no need for multiple fields and field types).

Disadvantages: the main one is that you cannot perform a cross-language search. You also need to remember to use the right Solr core when indexing and querying.
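For Option 2, the "use the right core" part can be kept in one place in the client. A rough sketch, assuming one core per language reachable under the usual /solr/<core>/<handler> path; the base URL, the core names and the helper function are placeholders I made up, not something from your setup:

```python
# Assumption: base URL and core names are hypothetical placeholders.
SOLR_BASE = "http://localhost:8983/solr"
CORES = {"en": "docs_en", "zh": "docs_zh"}

def core_url(lang, handler="select"):
    """Return the URL of the given handler on the core for this language."""
    try:
        core = CORES[lang]
    except KeyError:
        # Fail loudly rather than silently sending data to the wrong core.
        raise ValueError(f"no core configured for language {lang!r}")
    return f"{SOLR_BASE}/{core}/{handler}"

if __name__ == "__main__":
    print(core_url("en"))            # query English documents
    print(core_url("zh", "update"))  # index Chinese documents
```

Both indexing and querying code can then go through this one function, so a new language only means adding an entry to the mapping.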

2) I posted some Chinese docs to the server. The query of my Chinese
word does not return any result. This happens to my Arabic docs too.
What filter should I look at for this type of problem? Thanks a lot!
Sorry, I don't have experience with Arabic or Chinese languages so I don't know of any good analyzers for them.

Cheers,
Uri
Hi,

I have two questions.

1) Can Solr be configured so all my English docs will be saved in a
group, say group-en? My Chinese docs will be saved in group-cn. So my
search will only be conducted in the intended group, instead of
everywhere.

2) I posted some Chinese docs to the server. The query of my Chinese
word does not return any result. This happens to my Arabic docs too.
What filter should I look at for this type of problem? Thanks a lot!

Elaine
