There are several free language detection libraries out there, as well
as a few commercial ones. I think Karl Wettin has even written one as
a plugin for Lucene. Nutch also has one, as I understand it. I would
just Google "language detection".
Also see http://www.lucidimagination.com/search/?q=language+detection,
as this has been brought up many times before and I'm sure there are
links in the archives.
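
To give a rough idea of what the simplest kind of detector does, here is
a minimal sketch in Java. It is not taken from any of the libraries
mentioned above. The class name, the language codes it returns, and the
three letter sets are my own assumptions for illustration. It only tries
to tell Arabic, Farsi, and Urdu apart by counting letters that tend to
show up in one orthography and not the others. Real detectors usually
compare character n-gram statistics against trained profiles, which is
far more robust on short or noisy text.

```java
// Heuristic sketch only: the letter sets are illustrative assumptions,
// not a trained model and not part of Lucene or Nutch.
class ArabicScriptGuesser {
    // Letters common in Urdu but not in standard Arabic or Persian spelling:
    // tteh, ddal, rreh, noon ghunna, yeh barree, heh goal.
    private static final String URDU_HINTS = "\u0679\u0688\u0691\u06BA\u06D2\u06C1";
    // Letters used in Persian (and Urdu) but not in standard Arabic:
    // peh, tcheh, jeh, gaf, keheh, Farsi yeh.
    private static final String PERSIAN_HINTS = "\u067E\u0686\u0698\u06AF\u06A9\u06CC";
    // Letters typical of Arabic spelling: teh marbuta, kaf, alef maksura, yeh.
    private static final String ARABIC_HINTS = "\u0629\u0643\u0649\u064A";

    /** Returns "ur", "fa", "ar", or "unknown" if no Arabic-script letters are found. */
    static String guess(String text) {
        int script = 0, urdu = 0, persian = 0, arabic = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Only look at characters from the Unicode Arabic block (U+0600..U+06FF).
            if (Character.UnicodeBlock.of(c) != Character.UnicodeBlock.ARABIC) {
                continue;
            }
            script++;
            if (URDU_HINTS.indexOf(c) >= 0) {
                urdu++;
            } else if (PERSIAN_HINTS.indexOf(c) >= 0) {
                persian++;
            } else if (ARABIC_HINTS.indexOf(c) >= 0) {
                arabic++;
            }
        }
        if (script == 0) {
            return "unknown";
        }
        // Any Urdu-specific letter is treated as decisive, since Urdu also
        // uses the Persian letters.
        if (urdu > 0) {
            return "ur";
        }
        return persian > arabic ? "fa" : "ar";
    }

    public static void main(String[] args) {
        // The word "Farsi" written with the Farsi yeh (U+06CC).
        System.out.println(guess("\u0641\u0627\u0631\u0633\u06CC"));
    }
}
```

The returned code could then be used to pick an analyzer per document.
Text that mixes languages, or that has been normalized so the
distinguishing letters are gone, will defeat a heuristic like this.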
On Aug 6, 2009, at 3:46 PM, Bradford Stephens wrote:
Hey there,
We're trying to add foreign language support into our new search
engine -- languages like Arabic, Farsi, and Urdu (that don't work with
standard analyzers). But our data source doesn't tell us which
languages we're actually collecting -- we just get blocks of text. Has
anyone here worked on language detection so we can figure out what
analyzers to use? Are there commercial solutions?
Much appreciated!
--
http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science