Hi, I am prototyping language search using Solr 1.3. I have 3 fields in the schema: id, content, and language. I am indexing 3 PDF files; the languages are Foroyo, Chinese, and Japanese. I use xpdf to convert the content of each PDF to text and push the text to Solr in the content field. Which analyzer do I need to use for these languages? Using the default text analyzer and posting this content to Solr, I am not getting any results. Does Solr support stemming for these languages?

Regards,
Sujatha
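For the Chinese and Japanese content, one option is a bigram-based field type. The sketch below assumes the solr.CJKTokenizerFactory bundled with Solr 1.3; it indexes overlapping character pairs rather than stems, since CJK languages are not stemmed in the Snowball sense:

    <!-- schema.xml sketch: a CJK-friendly field type (assumes
         solr.CJKTokenizerFactory is available, as in Solr 1.3) -->
    <fieldType name="text_cjk" class="solr.TextField">
      <analyzer>
        <!-- emits overlapping character bigrams (C1C2, C2C3, ...),
             so queries can match without word segmentation -->
        <tokenizer class="solr.CJKTokenizerFactory"/>
      </analyzer>
    </fieldType>

    <field name="content_cjk" type="text_cjk" indexed="true" stored="true"/>

Because the same analyzer runs at index and query time, searches against content_cjk are bigrammed the same way as the indexed text.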
On 12/18/08, Feak, Todd <todd.f...@smss.sony.com> wrote:
>
> Don't forget to consider scaling concerns (if there are any). There are
> strong differences in the number of searches we receive for each
> language. We chose to create a separate schema and config per language
> so that we can throw servers at a particular language (or set of
> languages) if we needed to. We see two orders of magnitude difference
> between our most popular language and our least popular.
>
> -Todd Feak
>
> -----Original Message-----
> From: Julian Davchev [mailto:j...@drun.net]
> Sent: Wednesday, December 17, 2008 11:31 AM
> To: solr-user@lucene.apache.org
> Subject: looking for multilanguage indexing best practice/hint
>
> Hi,
> From my study of Solr and Lucene so far, it seems that I will use a
> single schema; at least I don't see a scenario where I'd need more than
> that. So the question is how to approach multilanguage indexing and
> multilanguage searching. Will it really make sense to just search a
> word, or should I supply a lang param to the search as well?
>
> I see there are those filters and have already been advised on them,
> but I guess the question is more one of best practice:
> solr.ISOLatin1AccentFilterFactory, solr.SnowballPorterFilterFactory
>
> So the solution I see is, using copyField, to have the same field in
> different languages, or something using a distinct filter.
> Cheers
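A minimal sketch of the copyField approach Julian describes, using the two filter factories he names; the per-language fields content_en and content_de are hypothetical examples, not fields from the original schemas:

    <!-- schema.xml sketch: one source field copied into per-language
         fields, each analyzed with its own Snowball stemmer -->
    <fieldType name="text_en" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_de" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="German"/>
      </analyzer>
    </fieldType>

    <!-- documents are posted only to "content"; copyField fans the
         text out into every per-language field at index time -->
    <field name="content" type="string" indexed="false" stored="true"/>
    <field name="content_en" type="text_en" indexed="true" stored="false"/>
    <field name="content_de" type="text_de" indexed="true" stored="false"/>

    <copyField source="content" dest="content_en"/>
    <copyField source="content" dest="content_de"/>

A lang parameter on the search side then simply selects which field to query (q=content_en:word versus q=content_de:wort), while Todd's alternative of one schema and config per language trades this single-schema convenience for the ability to scale each language's servers independently.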