Re: Multilanguage

Paul Libbrecht Tue, 17 Feb 2009 01:36:22 -0800

I was looking for such a tool and haven't found it yet.

Using StandardAnalyzer, one can obtain a token stream that is good enough for "agnostic analysis". Something that then matches those tokens against per-language dictionaries and picks the language with the most matches could do a decent job of choosing the analyzer.
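
To make the idea concrete, here is a minimal sketch of the majority vote in Python. It is not an existing Lucene or Solr tool. The word lists are tiny, hypothetical stand-ins for real dictionaries, the regex tokenizer only approximates what StandardAnalyzer would produce, and the field type names in FIELD_TYPES are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists; a real tool would load
# full dictionaries or stop-word lists for each language.
DICTIONARIES = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "fr": {"le", "la", "et", "est", "de", "les", "un", "une", "que"},
    "de": {"der", "die", "und", "ist", "das", "nicht", "ein", "zu"},
}

# Hypothetical mapping from language code to a field type name that
# would carry the matching analyzer; these names are assumptions.
FIELD_TYPES = {"en": "text_en", "fr": "text_fr", "de": "text_de"}

TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Language-agnostic tokenization: lowercased runs of word characters."""
    return [t.lower() for t in TOKEN.findall(text)]


def guess_language(text, default=None):
    """Return the language whose dictionary matches the most tokens."""
    votes = Counter()
    for token in tokenize(text):
        for lang, words in DICTIONARIES.items():
            if token in words:
                votes[lang] += 1
    if not votes:
        return default  # no dictionary word seen at all
    return votes.most_common(1)[0][0]


def choose_field_type(text, fallback="text"):
    """Pick the (hypothetical) field type for the guessed language."""
    return FIELD_TYPES.get(guess_language(text), fallback)
```

Short texts and closely related languages will produce few or tied votes, so a real version would probably want a minimum vote count before trusting the guess.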


Does such a tool exist?
It doesn't seem too hard for Lucene.

paul


On 17 Feb 2009, at 04:44, Otis Gospodnetic wrote:

The best option would be to identify the language after parsing the PDF and then index it using an appropriate analyzer defined in schema.xml.

