Thanks Grant. The requirement from the user end is to search only within a particular language, not across languages.

Also, going forward we will be adding more languages, so if I have a separate field for each language, we would need to change the schema every time, and that will not scale well. So there are two options: use dynamic fields or use multiple cores. Please advise which is better in terms of scaling and optimum use of existing resources (the available RAM is about 4 GB, shared across several Solr instances). If we use multiple cores, will search speed degrade? Any pointers would be helpful.
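For the dynamic-field option, this is a minimal schema.xml sketch of what I have in mind, assuming the CJKTokenizerFactory that ships with Solr 1.3 for the Chinese/Japanese fields; the field names and the text_cjk type name are my own illustrations, not from an existing schema:

    <!-- In <types>: a bigram analyzer for Chinese/Japanese text -->
    <fieldType name="text_cjk" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.CJKTokenizerFactory"/>
      </analyzer>
    </fieldType>

    <!-- In <fields>: fixed fields plus one dynamic field per language suffix -->
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="language" type="string" indexed="true" stored="true"/>
    <dynamicField name="*_zh" type="text_cjk" indexed="true" stored="true"/>
    <dynamicField name="*_ja" type="text_cjk" indexed="true" stored="true"/>
    <dynamicField name="*_txt" type="text" indexed="true" stored="true"/>

Each document would put its text into the matching field (content_zh, content_ja, content_txt, ...), and the application would query only that field. One caveat I can see: a language that needs a genuinely new analyzer still requires a new fieldType and a new dynamicField entry, so this reduces schema changes rather than eliminating them.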
Regards,
Sujatha

On 12/19/08, Grant Ingersoll <gsing...@apache.org> wrote:
>
> On Dec 18, 2008, at 6:25 AM, Sujatha Arun wrote:
>
>> Hi,
>> I am prototyping language search using Solr 1.3. I have 3 fields in the
>> schema: id, content, and language.
>>
>> I am indexing 3 PDF files; the languages are Foroyo, Chinese, and
>> Japanese.
>>
>> I use xpdf to convert the content of each PDF to text and push the text
>> to Solr in the content field.
>>
>> What analyzer do I need to use for the above?
>>
>> Using the default text analyzer and posting this content to Solr, I am
>> not getting any results.
>>
>> Does Solr support stemming for the above languages?
>
> I'm not familiar with Foroyo, but there should be tokenizers/analysis
> available for Chinese and Japanese. Are you putting all three languages
> into the same field? If that is the case, you will need some type of
> language detection piece that can choose the correct analyzer.
>
> How are your users searching? That is, do you know the language they want
> to search in? If so, then you can have a field for each language.
>
> -Grant
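P.S. To make the field-per-language idea concrete, this is a hedged sketch of the update XML I would post for one Japanese PDF (field names assumed to match the schema sketch above; the text is a placeholder):

    <add>
      <doc>
        <field name="id">ja_doc_1</field>
        <field name="language">ja</field>
        <!-- text extracted from the PDF with xpdf; indexing it in the
             language-specific field runs the matching analyzer -->
        <field name="content_ja">... extracted Japanese text ...</field>
      </doc>
    </add>

A search restricted to that field, e.g. q=content_ja:TERM, would then never match documents in the other languages.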