D25495: Fix Sonnet autodetect failing on Indian langs

Waqar Ahmed Wed, 01 Jan 2020 07:59:16 -0800

waqar added a comment.


  Hi,
  First of all, thanks for reviewing.
  
  > I'd suggest to move your changes to GuessLanguage::identify(const QString 
&text, const QStringList &suggestionsListIn) after the call to 
d->identify(text, d->findRuns(text));
  
  Okay, I will do that, but I will have to move the `d->findRuns(text)` call 
out of the argument list so that its result can be reused.
  
  > but only add those languages for which there is a dictionary
  
  I think that will not be an issue, because `s_scriptLanguages` only contains 
the languages for which there are dictionaries. To make the point clear with an 
example: if you don't have an English dictionary installed, Sonnet will never 
guess English as the language of the text.
  
  The resulting changes look like this:
  
     // Get the scripts used in the current text
     auto scriptsList = d->findRuns(text);
  
     // Try guessing from trigrams
     QStringList candidateLanguages = d->identify(text, scriptsList);
  
     if (candidateLanguages.isEmpty() && !scriptsList.isEmpty()) {
         for (const QChar::Script script : scriptsList) {
             const auto languagesList = d->s_scriptLanguages.values(script);
  
             for (const auto &lang : languagesList) {
                 // If the trigrams don't cover this language, add it to the candidates
                 if (!d->s_knownModels.contains(lang)) {
                     candidateLanguages.append(lang);
                 }
             }
         }
     }
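  
  To illustrate the fallback outside of Sonnet, here is a minimal standalone 
sketch using only the standard library. Everything in it is a hypothetical 
stand-in: the `Script` enum replaces `QChar::Script`, the `scriptLanguages` and 
`knownModels` parameters mimic `s_scriptLanguages` and `s_knownModels`, and 
`addScriptFallback` is a name made up for this example, not a Sonnet function.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for QChar::Script.
enum class Script { Latin, Devanagari, Gurmukhi };

// Sketch of the fallback: when trigram detection yields no candidates,
// offer every language of the detected scripts that has no trigram model.
// The parameters are assumptions mirroring s_scriptLanguages / s_knownModels.
std::vector<std::string> addScriptFallback(
    std::vector<std::string> candidates,
    const std::vector<Script> &scripts,
    const std::multimap<Script, std::string> &scriptLanguages,
    const std::set<std::string> &knownModels)
{
    // Trigram detection already found something: keep its result untouched.
    if (!candidates.empty()) {
        return candidates;
    }

    for (const Script script : scripts) {
        // All languages registered for this script.
        const auto range = scriptLanguages.equal_range(script);
        for (auto it = range.first; it != range.second; ++it) {
            // Only add languages the trigram models could not have detected.
            if (knownModels.count(it->second) == 0) {
                candidates.push_back(it->second);
            }
        }
    }
    return candidates;
}
```

  With a map that lists "hi" and "mr" under `Script::Devanagari` and a model 
set that contains "mr" but not "hi", an empty candidate list would come back 
containing only "hi", while a non-empty list is returned unchanged.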
  
  
  
  > There is also a bug in GuessLanguagePrivate::guessFromTrigrams(const 
QString &sample, const QStringList &languages): if m_minConfidence is left to 
its default value of '0', that function will always return an empty list. I 
will propose a fix shortly.
  
  Alright, I look forward to it.
  
  > The real issue behind Bug 176537 is a different one, however. On-the-fly 
spell checking in Kate(Part) will only check one line at a time, potentially 
not providing enough text for a meaningful language detection.
  
  To be honest, I have never had an issue with that. I mostly test with 
QOwnNotes, where spell checking works the same way, i.e., one line at a time. 
If a dictionary is present, Sonnet guesses the language correctly most of the 
time. But you are right that more text would make Sonnet more accurate. 
However, autodetection works on a per-sentence basis, and sentences can 
sometimes be quite short.
  
  > I plan to perform the language detection inside KatePart, so that there is 
also feedback regarding the detected language that is shown to the user, who can 
then also override the detected language, if desired.
  
  That would be really cool!
  I guess the rest of the dictionaries (of the same script) can be shown in the 
context menu to allow the user to override the detected language.

REPOSITORY
  R246 Sonnet

REVISION DETAIL
  https://phabricator.kde.org/D25495

To: waqar, mludwig, cullmann
Cc: ognarb, kde-frameworks-devel, LeGast00n, GB_2, michaelh, ngraham, bruns
