On Tuesday, February 28, 2012 6:03:26 PM UTC-7, Robin Kraft wrote:
>
> Awesome! I'm seeing some inconsistency though. Does anyone know why a
> Bayesian classifier would produce such different results? Could it be
> because of the short input text?
>
> (lang/detect "My name is joe")
> ["af" {"af" "0.8571390166207665", "lt" "0.14285675907555712"}]
>
> (lang/detect "My name is joe")
> ["af" {"af" "0.857138268895934", "fi" "0.14285706394982348"}]
>
> (lang/detect "My name is joe")
> ["af" {"af" "0.7142834459768219", "hr" "0.14285657945126254", "lt"
> "0.14285645678764042"}]
>
Yes. language-detect uses a fuzzy letter-frequency-matching sort of
algorithm (not the scientific name for it) to guess the language quickly,
so shorter text has a higher chance of being misclassified: there are
simply fewer letters to count. Give it a try with a longer input string.

Additionally, you could adjust the :smoothing option, or pass a map of
probabilities as a :prior-map to nudge it one way or the other manually.
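
To make the short-input point concrete, here is a rough, self-contained toy
in Clojure. It is not language-detect's implementation or API: the samples,
`toy-detect`, and the way it treats `:smoothing` and `:prior-map` are all
assumptions made up for illustration. It scores text by smoothed letter
frequencies and lets a prior tilt the result.

```clojure
(require '[clojure.string :as str])

;; Hypothetical, tiny training samples; a real detector learns from large corpora.
(def samples
  {"en" "the quick brown fox jumps over the lazy dog and my name is joe"
   "af" "die vinnige bruin jakkals spring oor die lui hond en my naam is joe"})

(defn letters
  "Returns the a-z letters of s, lower-cased, as a seq of one-letter strings."
  [s]
  (re-seq #"[a-z]" (str/lower-case s)))

;; Per-language letter counts, keyed by language code.
(def profiles
  (into {} (for [[lang text] samples]
             [lang (frequencies (letters text))])))

(defn log-likelihood
  "Sum of log P(letter | language), using additive smoothing so that
  unseen letters do not get probability zero."
  [profile smoothing text]
  (let [total (reduce + (vals profile))
        denom (+ total (* smoothing 26))]
    (reduce + 0.0
            (for [ch (letters text)]
              (Math/log (double (/ (+ (get profile ch 0) smoothing) denom)))))))

(defn toy-detect
  "Returns a map of language -> posterior probability for text.
  Options (toy versions, assumed): :smoothing and :prior-map."
  [text & {:keys [smoothing prior-map] :or {smoothing 0.5}}]
  (let [scores (into {} (for [[lang profile] profiles]
                          [lang (+ (Math/log (double (get prior-map lang 1.0)))
                                   (log-likelihood profile smoothing text))]))
        top    (apply max (vals scores))
        ;; Subtract the max before exponentiating for numerical stability.
        exps   (into {} (for [[lang s] scores]
                          [lang (Math/exp (double (- s top)))]))
        z      (reduce + (vals exps))]
    (into {} (for [[lang e] exps] [lang (/ e z)]))))
```

With only a handful of letters, each one moves the score a lot, so
(toy-detect "joe") is far less settled than a call on a full sentence.
Something like (toy-detect "joe" :prior-map {"en" 0.99 "af" 0.01}) shows how
a prior can push the answer toward one language.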
- Lee