[issue45180] possible wrong result for difflib.SequenceMatcher.ratio()

2021-09-12 Thread Nabeel Alzahrani


New submission from Nabeel Alzahrani:

difflib.SequenceMatcher.ratio() returns 0.3 for the following two strings a 
and b, where I would expect 1.0, or at least 0.9:
a="""
#include 
#include 
using namespace std;
int main() {
   string userWord;
   unsigned int i;
  cin >> userWord;

  
  for(i = 0; i < userWord.size(); i++) {
 if(userWord.at(i) == 'i') {
userWord.at(i) = '1';
 }
 if(userWord.at(i) == 'a') {
userWord.at(i) = '@'; 
 }
 if(userWord.at(i) == 'm') {
userWord.at(i) = 'M';
 }
 if(userWord.at(i) == 'B') {
userWord.at(i) = '8';
 }
 if(userWord.at(i) == 's') {
userWord.at(i) = '$';
 }
 userWord.push_back('!');
  }
  cout << userWord << endl;
   return 0;
}
"""

b="""
#include 
#include 
using namespace std;
int main() {
   string userWord;
   unsigned int i;
  cin >> userWord;
  userWord.push_back('!');
  
  for(i = 0; i < userWord.size(); i++) {
 if(userWord.at(i) == 'i') {
userWord.at(i) = '1';
 }
 if(userWord.at(i) == 'a') {
userWord.at(i) = '@'; 
 }
 if(userWord.at(i) == 'm') {
userWord.at(i) = 'M';
 }
 if(userWord.at(i) == 'B') {
userWord.at(i) = '8';
 }
 if(userWord.at(i) == 's') {
userWord.at(i) = '$';
 }
   
  }
  cout << userWord << endl;
   return 0;
}
"""
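
For reference, here is a minimal sketch of how the reported number is defined. The difflib documentation describes ratio() as 2.0*M/T, where M is the number of matched elements and T is the total number of elements in both sequences. The helper name ratio_from_blocks is hypothetical, and the short strings are the ones used in the difflib documentation, not the ones from this report.

```python
import difflib


def ratio_from_blocks(a, b, autojunk=True):
    """Recompute 2*M/T from the matching blocks (hypothetical helper)."""
    sm = difflib.SequenceMatcher(None, a, b, autojunk=autojunk)
    # Each block is (i, j, size); the final sentinel block has size 0.
    matched = sum(block.size for block in sm.get_matching_blocks())
    total = len(a) + len(b)
    return 2.0 * matched / total if total else 1.0


short_a, short_b = "abcd", "bcde"
documented = difflib.SequenceMatcher(None, short_a, short_b).ratio()
recomputed = ratio_from_blocks(short_a, short_b)
print(documented, recomputed)
```

Running the same helper on the a and b above, and printing get_matching_blocks(), should show which stretches of text are actually counted as matches.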

--
components: Library (Lib)
messages: 401683
nosy: nalza001
priority: normal
severity: normal
status: open
title: possible wrong result for difflib.SequenceMatcher.ratio()
type: behavior
versions: Python 3.10, Python 3.11, Python 3.6, Python 3.7, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue45180>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45180] possible wrong result for difflib.SequenceMatcher.ratio()

2021-09-15 Thread Nabeel Alzahrani


Nabeel Alzahrani added the comment:

But when I turn off the "autojunk" feature for the following example, I get a 
wrong ratio of 0.5, instead of the correct ratio of 0.2 that I get with 
autojunk enabled.

a="""
#include 
#include 
using namespace std;
int main() {
   
   string userPass;
   int sMaxIndex;
   char indivChar;
   int i;
   
   cin >> userPass;
   
   sMaxIndex = userPass.size() - 1;
   
   
   for (i = 0; i <= sMaxIndex; ++i) {
  
  indivChar = userPass.at(i);
  
  if (indivChar == 'i') {
 
 indivChar = '1';
 cout << indivChar;
 
  }
  else if (indivChar == 'a') {
 
 indivChar = '@';
 cout << indivChar;
 
  }
  else if (indivChar == 'm') {
 
 indivChar = 'M';
 cout << indivChar;
 
  }
  else if (indivChar == 'B') {
 
 indivChar = '8';
 cout << indivChar;
 
  }
  else if (indivChar == 's') {
 
 indivChar = '$';
 cout << indivChar;
 
  }
  else {
 
 cout << indivChar;
 
  }
  
   }
   
   cout << "!" << endl;
   
   return 0;
}
"""

b="""
#include 
#include 
using namespace std;
int main() {
   string ori;
   cin >> ori;
   for (int i = 0; i < ori.size(); i++){
  if (ori.at(i) == 'i')
 ori.at(i) = '1';
  if (ori.at(i) == 'a')
 ori.at(i) = '@';
  if (ori.at(i) == 'm')
 ori.at(i) = 'M';
  if (ori.at(i) == 'B')
 ori.at(i) = '8';
  if (ori.at(i) == 's')
 ori.at(i) = '$';
  }
   cout << ori << endl;

   return 0;
}
"""
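
To illustrate what the autojunk switch can change, here is a small sketch on synthetic strings (not the ones from this report). As far as I understand the heuristic, when the second sequence has at least 200 elements, any element that accounts for more than roughly 1% of it is treated as "popular" and is no longer used as a starting point for matches. The strings and variable names below are made up for illustration.

```python
import difflib

# Synthetic inputs: 201 characters each, differing only in the first one.
# 'a' and 'b' each occur 100 times in the second string, far above the
# popularity threshold that autojunk applies to sequences of length >= 200.
first = "x" + "ab" * 100
second = "y" + "ab" * 100

with_autojunk = difflib.SequenceMatcher(None, first, second).ratio()
without_autojunk = difflib.SequenceMatcher(
    None, first, second, autojunk=False
).ratio()

print("autojunk=True: ", with_autojunk)
print("autojunk=False:", without_autojunk)
```

On inputs like these the two settings disagree sharply, so a difference between the autojunk and non-autojunk ratios does not by itself show which of the two is the intended value.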




[issue45180] possible wrong result for difflib.SequenceMatcher.ratio()

2021-09-15 Thread Nabeel Alzahrani


Change by Nabeel Alzahrani:


--
resolution: not a bug -> 
status: closed -> open




[issue45180] possible wrong result for difflib.SequenceMatcher.ratio()

2021-09-15 Thread Nabeel Alzahrani


Nabeel Alzahrani added the comment:

Here are the steps that I used to calculate 0.2 for the last example:

I used the class difflib.HtmlDiff to find the number of changed chars 
(addedChars, deletedChars, and changedChars), which is 1172 (let us call it 
delta).

The size of both strings a and b in this example is 1470.

I calculated the similarity ratio using 
1 - (delta/totalSize) = 1 - (1172/1470) = 0.2

I am assuming that the classes difflib.SequenceMatcher and difflib.HtmlDiff 
use the same algorithms and arguments, and if so, they should produce the 
same ratio. Is that right?
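
The arithmetic above, and one way to compute a comparable delta directly from SequenceMatcher, can be sketched as follows. The helper similarity_from_opcodes is hypothetical: it assumes that delta counts every character covered by a non-equal opcode and that totalSize means len(a) + len(b). How difflib.HtmlDiff arrives at its highlighted characters is not reproduced here.

```python
import difflib

# The numbers quoted above.
delta, total_size = 1172, 1470
quoted = 1 - delta / total_size
print(round(quoted, 2))


def similarity_from_opcodes(a, b, autojunk=True):
    """1 - delta/total, with delta taken from get_opcodes() (assumption)."""
    sm = difflib.SequenceMatcher(None, a, b, autojunk=autojunk)
    changed = 0
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            # Count deleted/replaced chars in a plus inserted/replaced in b.
            changed += (i2 - i1) + (j2 - j1)
    total = len(a) + len(b)
    return 1 - changed / total if total else 1.0


sample_a, sample_b = "abcd", "bcde"
from_opcodes = similarity_from_opcodes(sample_a, sample_b)
from_ratio = difflib.SequenceMatcher(None, sample_a, sample_b).ratio()
print(from_opcodes, from_ratio)
```

Under those assumptions the opcode-based value equals ratio() by construction, because the unmatched characters add up to T - 2*M. A different figure obtained through HtmlDiff would therefore suggest that it segments or counts the text differently, rather than that one of the two numbers is miscalculated.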

--
status: closed -> open
