There are several free language detection libraries out there, as well as a few commercial ones. I think Karl Wettin has even written one as a plugin for Lucene. Nutch also has one, as I understand it. I would start by Googling "language detection".

Also see http://www.lucidimagination.com/search/?q=language+detection, as this has been brought up many times before and I'm sure there are links in the archives.
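
To give a rough idea of what the simplest possible detector looks like for the three languages mentioned below, here is a minimal sketch. It is not taken from any of the libraries above. The class name, method name, and marker letter sets are my own assumptions for illustration. It relies on the fact that Persian and Urdu add letters to the basic Arabic alphabet, so the presence of those letters hints at the language. Real detectors usually use character n-gram statistics instead, which cope far better with short or mixed text.

```java
// Hypothetical helper, not part of Lucene or Nutch: a letter-based heuristic
// for telling Arabic, Persian (Farsi), and Urdu apart.
class ArabicScriptGuess {
    // Assumption: letters common in Urdu but not used in Arabic or Persian
    // (tte, ddal, rre, noon ghunna, heh doachashmee, yeh barree).
    private static final String URDU_MARKERS =
        "\u0679\u0688\u0691\u06BA\u06BE\u06D2";
    // Assumption: letters added by Persian (peh, tcheh, jeh, gaf).
    // Urdu uses these too, so Urdu markers are checked first.
    private static final String PERSIAN_MARKERS =
        "\u067E\u0686\u0698\u06AF";

    /** Returns "ur", "fa", "ar", or "unknown" as a best guess. */
    static String guess(String text) {
        int arabicScript = 0, urdu = 0, persian = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            // Skip anything that is not in the Arabic script.
            if (Character.UnicodeScript.of(cp) != Character.UnicodeScript.ARABIC) {
                continue;
            }
            arabicScript++;
            if (URDU_MARKERS.indexOf(cp) >= 0) {
                urdu++;
            } else if (PERSIAN_MARKERS.indexOf(cp) >= 0) {
                persian++;
            }
        }
        if (arabicScript == 0) return "unknown";
        if (urdu > 0) return "ur";
        if (persian > 0) return "fa";
        // No marker letters seen: fall back to Arabic. Short Persian or
        // Urdu snippets can land here by mistake.
        return "ar";
    }
}
```

You would call ArabicScriptGuess.guess(text) on each block and pick an analyzer based on the returned code. Treat the "ar" fallback with suspicion on short inputs, and expect one of the statistical libraries to do better on real data.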

On Aug 6, 2009, at 3:46 PM, Bradford Stephens wrote:

Hey there,

We're trying to add foreign language support into our new search
engine -- languages like Arabic, Farsi, and Urdu (that don't work with
standard analyzers). But our data source doesn't tell us which
languages we're actually collecting -- we just get blocks of text. Has
anyone here worked on language detection so we can figure out what
analyzers to use? Are there commercial solutions?

Much appreciated!

--
http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science
