CLD2 probabilistically detects over 80 languages in Unicode UTF-8 text,
either plain text or HTML/XML. For mixed-language input, CLD2 returns
the top three languages found and their approximate percentages of the
total text bytes.  Optionally, it also returns a vector of text spans,
each labeled with its identified language. CLD2 is not designed to do
well on very short text, lists of proper names, part numbers, etc.
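
As a rough illustration of the summary described above, the sketch below
shows how per-span language labels could be rolled up into a top-three
list with approximate percentages of total text bytes. It is not CLD2's
code or API: the function name summarize_spans, the (text, language code)
input shape, and the rounding are assumptions made only for this example.

```python
from collections import Counter


def summarize_spans(spans, top_n=3):
    """Return up to top_n (language, percent) pairs, largest share first.

    spans is a hypothetical list of (text, language_code) pairs, standing
    in for a vector of already-labeled text spans. Percentages are shares
    of the total UTF-8 byte count, rounded to whole numbers.
    """
    byte_counts = Counter()
    for text, lang in spans:
        # Count bytes, not characters: non-ASCII text takes more than
        # one byte per character in UTF-8.
        byte_counts[lang] += len(text.encode("utf-8"))

    total = sum(byte_counts.values())
    if total == 0:
        return []

    return [
        (lang, round(100 * count / total))
        for lang, count in byte_counts.most_common(top_n)
    ]


if __name__ == "__main__":
    # Made-up spans, for illustration only.
    example = [
        ("This is an English sentence. ", "en"),
        ("Ceci est une phrase en français. ", "fr"),
        ("Dies ist ein deutscher Satz.", "de"),
    ]
    print(summarize_spans(example))
```

Because the percentages are rounded independently, they may not add up
to exactly 100, which matches the "approximate" wording above.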
