Normally this is done by putting a language field on each document rather than splitting the documents into separate corpora. Keeping them together in one index makes the final search faster.
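
With a single index, the query side can stay fairly simple. Here is a minimal sketch in Python of how such a request might be built. The field name language_s and the fq syntax come from this thread; the endpoint URL, the function name, and the example language codes are my own assumptions, and the group/group.field parameters assume that result grouping is available in your Solr version.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port, and core name for your own setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_language_query(user_query, candidate_languages, language_field="language_s"):
    """Build a select URL that is filtered to, and grouped by, the candidate languages.

    candidate_languages is whatever your language identifier returned for the
    query string (assumed here to be a list of codes such as ["en", "de"]).
    """
    if not candidate_languages:
        raise ValueError("need at least one candidate language")

    # Require one of the candidate language keys, e.g. language_s:(en OR de).
    language_filter = "{}:({})".format(language_field, " OR ".join(candidate_languages))

    params = [
        ("q", user_query),
        ("fq", language_filter),
        # Group the hits per language so users can inspect each set separately.
        ("group", "true"),
        ("group.field", language_field),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_language_query("hello world", ["en", "de"]))
```

If the identifier is confident, pass a single code and the filter narrows to one language; if it is ambiguous, pass all candidates and let the grouping separate them.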

At query time, you can run language identification on the query, add all of the language keys that you think are relevant, and then group the results on those keys so that users can inspect the results for each language. If you require the correct language key, you should get pretty good retrieval speed.

On Fri, Jan 20, 2012 at 3:35 AM, nibing <nibing_...@hotmail.com> wrote:

> Hi, Jan Høydahl
>
> You are right. I am hoping to detect the language of a query, so that the
> searching can be done according to the language detected. Since people
> often type only a few words, which is too few to detect from, that is hard
> to do.
>
> Let me describe a little bit about the Solr server in my design. It
> consists of several cores, corresponding to the several languages, which
> are built during indexing. Since language detection in indexing can be
> done with the Tika identifier, we are currently OK there. But the problem
> is searching. I want to do language detection first, before searching in
> the individual cores. In the case that the detection result is ambiguous
> and several languages are returned, we would probably return a set of
> results and let the user decide which language's results they want to
> look into. In general, it is just the same as the language support in
> Google.
>
> Do you have some suggestions if I want to achieve multilingual search as
> described above? Thank you.
>
> Best Regards
> Ni, Bing
>
> > Subject: Re: Tika0.10 language identifier in Solr3.5.0
> > From: jan....@cominvent.com
> > Date: Thu, 19 Jan 2012 12:31:01 +0100
> > To: solr-user@lucene.apache.org
> >
> > Hi,
> >
> > You may use the string as you choose, for instance filtering
> > (fq=language_s:en) or for faceting (facet.field=language_s). What are
> > you looking to do?
> >
> > What would you like to detect on the query side? The language of the
> > search string? That is very hard since people type very few words into
> > the search box.
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> > Solr Training - www.solrtraining.com
> >
> > On 19. jan. 2012, at 09:22, nibing wrote:
> >
> > > Hi, all,
> > >
> > > I am using Solr3.5.0 which applies Tika0.10 to do language detection,
> > > and I have a couple of questions about this function.
> > >
> > > 1. I can see the outcome of the language detection in a field
> > > "language_s". But what action will be taken according to the
> > > different language code? How to configure?
> > >
> > > 2. Currently the language detection only happens in indexing. Is it
> > > possible to use the function in searching as well? How to configure?
> > >
> > > Many thanks.
> > >
> > > Best Regards
> > > Ni, Bing