Hi,

The best option would be to identify the language after parsing the PDF and
then index the text into a field whose type uses an analyzer appropriate for
that language, as defined in schema.xml.
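
As a rough illustration of that routing step, here is a minimal Python sketch.
Everything in it is an assumption made for the example: the stopword lists are
a toy stand-in for a real language-identification library, and the field names
(content_en, content_de, content_fr, content_general) are hypothetical and
would each need a matching field type and analyzer in schema.xml.

```python
# Toy stopword-overlap language guesser. This is only a stand-in for a real
# language-identification library; the word lists are deliberately tiny.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "in", "is", "that", "for"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "mit", "für"},
    "fr": {"le", "la", "les", "et", "des", "est", "que", "pour"},
}

# Hypothetical catch-all field, assumed to use a generic analyzer without stemming.
FALLBACK_FIELD = "content_general"


def guess_language(text):
    """Return the language code with the most stopword hits, or None if there are none."""
    tokens = text.lower().split()
    scores = {
        lang: sum(1 for token in tokens if token in words)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def build_document(doc_id, text):
    """Place the extracted text in a per-language field (hypothetical field names)."""
    lang = guess_language(text)
    field = f"content_{lang}" if lang else FALLBACK_FIELD
    return {"id": doc_id, field: text}
```

The resulting dictionary would then be sent to Solr as one document. Text
whose language cannot be guessed goes to the catch-all field, so it is still
searchable, just without stemming.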

Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch 

________________________________
From: revathy arun <revas...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Monday, February 16, 2009 1:42:07 PM
Subject: Multilanguage

Hi,
I have a scenario where I need to convert PDF content to text and then
index it at run time. I do not know what language the PDF will be in. In
this case, what is the best solution with respect to the field type of the
content field in the schema, where the text content would be indexed?

That is, can I use the default tokenizer for all languages? Since I would
not know the language, I would not be able to stem the tokens. How would
this impact search? Is there any other solution?

Rgds
