Otis Gospodnetic wrote:
Bradford,
If I may:
Have a look at http://www.sematext.com/products/language-identifier/index.html
And/or http://www.sematext.com/products/multilingual-indexer/index.html
... and there is a Nutch plugin with similar functionality:
http://lucene.apache.org/nutch/apidocs-1.0/org/apache/nutch/analysis/lang/LanguageIdentifier.html
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com