Uri,

Thanks a lot! I don't need to do cross-language search, so Option 2 sounds better, because my corpus is very large.
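With Option 2 (one Solr core per language, as Uri describes in the message quoted below), every indexing and querying client has to pick the right core. Here is a minimal Python sketch of that routing. It assumes a multicore Solr at http://localhost:8983/solr with cores named core_en, core_zh and core_ar. The base URL, the core names and the helper functions are hypothetical, so adjust them to your own setup. The sketch only builds URLs and does not send any requests.

```python
from urllib.parse import urlencode

# Hypothetical values: adjust to your own Solr installation and core names.
SOLR_BASE = "http://localhost:8983/solr"
CORES = {"en": "core_en", "zh": "core_zh", "ar": "core_ar"}


def core_url(lang):
    """Return the base URL of the core that holds documents in `lang`."""
    try:
        core = CORES[lang]
    except KeyError:
        raise ValueError(f"no Solr core configured for language {lang!r}") from None
    return f"{SOLR_BASE}/{core}"


def select_url(lang, query, rows=10):
    """Build a query URL that searches only the core for `lang`."""
    params = urlencode({"q": query, "rows": rows})  # UTF-8 percent-encoding
    return f"{core_url(lang)}/select?{params}"


def update_url(lang):
    """Build the update URL to post documents in `lang` to."""
    return f"{core_url(lang)}/update"


if __name__ == "__main__":
    print(select_url("zh", "搜索"))
    print(update_url("en"))
```

Keeping the language-to-core map in one place means the indexing code and the querying code cannot drift apart, which addresses the "remember to use the right Solr core" disadvantage Uri mentions.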
I am still looking for help with Chinese-language search. I tried ChineseTokenizerFactory as my analyzer, but it did not help. Only words that have white space, commas, etc. around them can be found.

Elaine

On Mon, Aug 24, 2009 at 6:01 PM, Uri Boness <ubon...@gmail.com> wrote:
> I can think of two ways to tackle your problem:
>
> Option 1: each document has a field indicating its language. Then,
> when searching, you can simply filter the query on the language you're
> searching in. Advantages: everything is in one index, so if in the future
> you need to do a cross-language search, you'll be able to do that
> without changing anything. Disadvantages: depending on how your data
> is structured, your index can grow big; if you always search in only
> one language, you will only ever use part of the index, which carries
> some performance penalty (depending on the size of the index).
> Another disadvantage is that the schema configuration can get a bit messy:
> since everything is in one index, for each field and field type you'll
> probably need to define a separate version per language (each one with
> its own language-specific analyzer). For example, if you have a "title"
> field, you'll probably need to define "title_en" (for English content)
> and "title_zh" (for Chinese content). You will then also need to make
> sure that you send the right fields to Solr when you index the content
> (although you could perhaps write a clever update processor that renames
> the fields based on the language field).
>
> Option 2: have a separate Solr core for each language. Advantages: as
> opposed to Option 1, here you have smaller indexes, each dedicated to
> one language. If the corpus is very big, you can gain performance
> here. Since these are separate indexes, each core has its own simple
> and clean schema (no need for multiple fields and field types).
> Disadvantages: the main one is that you cannot perform cross-language
> search. You also need to remember to use the right Solr core when
> indexing and querying.
>
>> 2) I posted some Chinese docs to the server. A query for a Chinese
>> word does not return any results. The same happens with my Arabic docs.
>> What filter should I look at for this type of problem? Thanks a lot!
>>
>
> Sorry, I don't have experience with Arabic or Chinese, so I don't
> know of any good analyzers for them.
>
> Cheers,
> Uri
>>
>> Hi,
>>
>> I have two questions.
>>
>> 1) Can Solr be configured so that all my English docs are saved in a
>> group, say group-en, and my Chinese docs in group-cn? Then my search
>> would only be conducted in the intended group, instead of everywhere.
>>
>> 2) I posted some Chinese docs to the server. A query for a Chinese
>> word does not return any results. The same happens with my Arabic docs.
>> What filter should I look at for this type of problem? Thanks a lot!
>>
>> Elaine
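On the Chinese search problem: Chinese text is written without spaces between words, so an analyzer that only splits on white space and punctuation turns a whole run of characters into a single token, and a query for one word inside that run finds nothing. This is one possible explanation consistent with the symptom Elaine describes (for example, if the field or the query side is still using a whitespace-based analyzer). A common approach for CJK text is to index overlapping two-character tokens (bigrams), which is the idea behind Lucene's CJKTokenizer. The Python sketch below only illustrates that idea outside Solr. The function names and the simplified "every query token must match" rule are hypothetical, and the sketch covers only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF). Whatever analyzer you choose, the same analysis has to be applied at index time and at query time.

```python
import re

# Separators a typical Western-language analyzer splits on (simplified):
# white space plus ASCII and full-width punctuation.
_SEPARATORS = re.compile(r"[\s,.;:!?，。；：！？]+")
# Basic CJK Unified Ideographs block only (U+4E00 to U+9FFF); a simplification.
_CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def whitespace_tokens(text):
    """Split on white space and punctuation only."""
    return [t for t in _SEPARATORS.split(text) if t]


def cjk_bigrams(text):
    """Emit overlapping two-character tokens for each run of CJK characters.

    Non-CJK characters are ignored here to keep the sketch short.
    """
    tokens = []
    for run in _CJK_RUN.findall(text):
        if len(run) == 1:
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


def matches(document, query, analyze):
    """True if every query token appears among the document's tokens.

    Both sides go through the same analyzer, as index and query analysis should.
    """
    doc_terms = set(analyze(document))
    query_terms = analyze(query)
    return bool(query_terms) and all(t in doc_terms for t in query_terms)


if __name__ == "__main__":
    doc = "我喜欢中文搜索"  # "I like Chinese search", written without spaces
    print(whitespace_tokens(doc))                   # ['我喜欢中文搜索']
    print(cjk_bigrams(doc))                         # ['我喜', '喜欢', '欢中', '中文', '文搜', '搜索']
    print(matches(doc, "中文", whitespace_tokens))  # False
    print(matches(doc, "中文", cjk_bigrams))        # True
```

Bigrams also produce some tokens that span word boundaries (欢中 above straddles 喜欢 and 中文), which is the usual trade-off compared with dictionary-based word segmentation.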