Hi,

The best option would be to identify the language after parsing the PDF and then index the text using an analyzer appropriate for that language, defined in schema.xml.
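To make the idea concrete, here is a rough sketch of the routing step in Python. It is only an illustration under stated assumptions: the field names (content_en, content_fr, content_de, content_general) are hypothetical and would each need a matching field type with its own analyzer in schema.xml, and the stopword-counting detector is a toy stand-in for a real language identification library.

```python
# Hypothetical per-language fields; each would be declared in schema.xml
# with a field type whose analyzer (stemmer, stopwords) fits that language.
FIELD_BY_LANG = {"en": "content_en", "fr": "content_fr", "de": "content_de"}

# Hypothetical fallback field: default tokenizer only, no stemming.
FALLBACK_FIELD = "content_general"

# Tiny stopword lists, used only to keep this sketch self-contained.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in"},
    "fr": {"le", "la", "et", "les", "des", "est"},
    "de": {"der", "die", "und", "das", "ist", "nicht"},
}


def detect_language(text):
    """Return the language whose stopwords occur most often, or None."""
    tokens = text.lower().split()
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def build_doc(doc_id, text):
    """Build a document dict, placing the text in a language-specific field."""
    lang = detect_language(text)
    field = FIELD_BY_LANG.get(lang, FALLBACK_FIELD)
    return {"id": doc_id, "language": lang or "unknown", field: text}
```

Text whose language cannot be determined lands in the unstemmed fallback field, so it stays searchable by exact token even though stemming is not applied.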
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

________________________________
From: revathy arun <revas...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Monday, February 16, 2009 1:42:07 PM
Subject: Multilanguage

Hi,

I have a scenario where I need to convert PDF content to text and then index it at run time. I do not know what language the PDF will be in. In this case, what is the best solution with respect to the field type of the content field in the schema, where the text content would be indexed? That is, can I use the default tokenizer for all languages? And since I would not know the language, and hence would not be able to stem the tokens, how would this impact search? Is there any other solution?

Rgds