waqar added a comment.
Hi,

First of all, thanks for reviewing.

> I'd suggest to move your changes to GuessLanguage::identify(const QString &text, const QStringList &suggestionsListIn) after the call to d->identify(text, d->findRuns(text));

Okay, I will do that, but I will have to move the `d->findRuns(text)` call out of the function call so that its result can be reused.

> but only add those languages for which there is a dictionary

I don't think that will be an issue, because `s_scriptLanguages` only contains the languages for which a dictionary is installed. To make my point clear: if you don't have the English dictionary installed, for example, Sonnet will never be able to guess the language of English text.

The resulting changes look like this:

```
// Get the scripts for the current text once, so they can be reused below
auto scriptsList = d->findRuns(text);
// Try guessing from trigrams
QStringList candidateLanguages = d->identify(text, scriptsList);
if (candidateLanguages.isEmpty() && !scriptsList.isEmpty()) {
    for (const QChar::Script script : scriptsList) {
        const auto languagesList = d->s_scriptLanguages.values(script);
        for (const auto &lang : languagesList) {
            // If trigrams don't have this language, add it to the candidates
            if (!d->s_knownModels.contains(lang)) {
                candidateLanguages.append(lang);
            }
        }
    }
}
```

> There is also a bug in GuessLanguagePrivate::guessFromTrigrams(const QString &sample, const QStringList &languages): if m_minConfidence is left to its default value of '0', that function will always return an empty list. I will propose a fix shortly.

Great, I look forward to it.

> The real issue behind Bug 176537 is a different one, however. On-the-fly spell checking in Kate(Part) will only check one line at a time, potentially not providing enough text for a meaningful language detection.

To be honest, I have never had an issue with that. I mostly test with QOwnNotes, where spell checking works the same way, i.e., one line at a time. If a dictionary is present, Sonnet guesses the language correctly most of the time.
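For trying the fallback outside Sonnet, here is a minimal, Qt-free sketch of the same candidate selection as the snippet above. It is an illustration under assumptions, not Sonnet code: `candidateLanguages`, `Script` and `Language` are hypothetical names, scripts are modeled as plain ints, and standard containers stand in for `s_scriptLanguages` (script to languages) and `s_knownModels`. Unlike the Qt snippet, it also skips duplicates when a language is listed under more than one detected script.

```
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for Sonnet's internal types (assumption, not Sonnet API)
using Script = int;
using Language = std::string;

// Returns the trigram candidates if there are any. Otherwise falls back to every
// language written in one of the detected scripts that has no trigram model.
std::vector<Language> candidateLanguages(
    const std::vector<Language> &trigramCandidates,
    const std::vector<Script> &scripts,
    const std::multimap<Script, Language> &scriptLanguages,
    const std::set<Language> &knownModels)
{
    std::vector<Language> result = trigramCandidates;
    if (!result.empty()) {
        return result; // trigrams found something, no fallback needed
    }
    std::set<Language> seen; // avoids duplicates across scripts
    for (Script script : scripts) {
        auto range = scriptLanguages.equal_range(script);
        for (auto it = range.first; it != range.second; ++it) {
            const Language &lang = it->second;
            // Only add languages the trigram step could not have reported
            if (knownModels.count(lang) == 0 && seen.insert(lang).second) {
                result.push_back(lang);
            }
        }
    }
    return result;
}
```

For example, with a trigram model only for `en`, scripts mapped as 1 to {en, de} and 2 to {ur, ar}, and no trigram candidates, the sketch returns `de`, `ur`, `ar`.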
But you are right that more text would enable Sonnet to be more accurate. However, autodetection works on a per-sentence basis, and sentences can sometimes be quite short.

> I plan to perform the language detection inside KatePart, so that there is also feedback regarding the detected language that is shown to the user, who can then also override the detected language, if desired.

That would be really cool! I guess the remaining dictionaries (for the same script) could be shown in the context menu so that the user can override the detected language.

REPOSITORY
  R246 Sonnet

REVISION DETAIL
  https://phabricator.kde.org/D25495

To: waqar, mludwig, cullmann
Cc: ognarb, kde-frameworks-devel, LeGast00n, GB_2, michaelh, ngraham, bruns