[issue21344] save scores or ratios in difflib get_close_matches
New submission from Russell Ballestrini:

The current implementation of difflib's get_close_matches() function computes computationally expensive scores (ratios) but then discards them without giving the end user a chance to use them. This patch adds an optional boolean "scores" argument that changes the return value from a list of words to a list of (score, word) tuples.

--
components: Library (Lib)
files: difflib.py
messages: 217123
nosy: russellballestrini
priority: normal
severity: normal
status: open
title: save scores or ratios in difflib get_close_matches
type: enhancement
versions: Python 2.7, Python 3.5

Added file: http://bugs.python.org/file35022/difflib.py

___ Python tracker <http://bugs.python.org/issue21344> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue21344] save scores or ratios in difflib get_close_matches
Changes by Russell Ballestrini: Removed file: http://bugs.python.org/file35022/difflib.py
[issue21344] save scores or ratios in difflib get_close_matches
Changes by Russell Ballestrini:

--
keywords: +patch

Added file: http://bugs.python.org/file35023/difflib-patch-to-save-scores-in-get-close-matches.patch
[issue21344] save scores or ratios in difflib get_close_matches
Russell Ballestrini added the comment:

Claudiu.Popa,

Yes, that was my first idea for tackling this issue. I will create another, proper patch that provides two separate functions:

* get_close_matches
* get_scored_close_matches

Each is essentially a public wrapper around a private function that holds the algorithm:

* _get_scored_close_matches
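The layout described above might look roughly like the following. This is only a sketch under stated assumptions, not the contents of the attached patch: the function bodies are illustrative, and the filtering loop simply follows the usual SequenceMatcher pattern of trying the cheap upper bounds before the full ratio.

```python
import difflib
from heapq import nlargest


def _get_scored_close_matches(word, possibilities, n=3, cutoff=0.6):
    """Hypothetical private helper: up to n (score, candidate) pairs, best first."""
    if not n > 0:
        raise ValueError("n must be > 0: %r" % (n,))
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff must be in [0.0, 1.0]: %r" % (cutoff,))
    scored = []
    matcher = difflib.SequenceMatcher()
    matcher.set_seq2(word)
    for candidate in possibilities:
        matcher.set_seq1(candidate)
        # Check the cheap upper bounds first, then the exact ratio.
        if (matcher.real_quick_ratio() >= cutoff
                and matcher.quick_ratio() >= cutoff
                and matcher.ratio() >= cutoff):
            scored.append((matcher.ratio(), candidate))
    return nlargest(n, scored)


def get_scored_close_matches(word, possibilities, n=3, cutoff=0.6):
    """Public wrapper that keeps the scores."""
    return _get_scored_close_matches(word, possibilities, n, cutoff)


def get_close_matches(word, possibilities, n=3, cutoff=0.6):
    """Public wrapper that drops the scores and returns only the words."""
    return [candidate for _score, candidate
            in _get_scored_close_matches(word, possibilities, n, cutoff)]
```

Both wrappers take the same arguments, so existing callers of get_close_matches() would see no change in behaviour.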
[issue21344] save scores or ratios in difflib get_close_matches
Russell Ballestrini added the comment:

New function in difflib: get_scored_matches()

This function acts just like the existing get_close_matches() function, but instead of returning a list of words it returns a list of (score, word) tuples. This gives the end user access to the computationally expensive scores/ratios that are produced as a by-product.

The new usage does _not_ impact backward compatibility::

    >>> import difflib
    >>> import keyword as _keyword
    >>> difflib.get_scored_matches("wheel", _keyword.kwlist)
    [(0.6, 'while')]
    >>> difflib.get_close_matches("wheel", _keyword.kwlist)
    ['while']

--
Added file: http://bugs.python.org/file35024/difflib-patch-to-save-scores-2.patch
[issue21344] save scores or ratios in difflib get_close_matches
Russell Ballestrini added the comment:

get_close_matches() doesn't seem to have any tests... I suppose I should write them, considering I'm changing the functionality a bit.

TODO:

* Write tests for difflib.get_close_matches()
* Write tests for difflib.get_scored_matches()
* Determine whether docstrings are enough to document the new function. (I thought they would be.)
[issue21344] save scores or ratios in difflib get_close_matches
Changes by Russell Ballestrini: Removed file: http://bugs.python.org/file35023/difflib-patch-to-save-scores-in-get-close-matches.patch
[issue21344] save scores or ratios in difflib get_close_matches
Changes by Russell Ballestrini: Removed file: http://bugs.python.org/file35024/difflib-patch-to-save-scores.patch
[issue21344] save scores or ratios in difflib get_close_matches
Russell Ballestrini added the comment:

Ok, this patch is ready for review.

--
Added file: http://bugs.python.org/file35033/diff-lib-get-scored-matches-tests-and-docs.patch
[issue21344] save scores or ratios in difflib get_close_matches
Russell Ballestrini added the comment:

At some point I plan to write a web API that accepts a word, such as 'doge', and returns a list of possible suggestions and scores. Later, a "did you mean dog" style suggestion could be implemented on top.

We compute the scores, and doing so is computationally taxing, so we shouldn't always throw this data away. Most users will continue to use get_close_matches; some users might want to build indexes on the scores. Other users may want to cache (memoize) common queries for very fast lookups.

Additionally, the new function will give end users the opportunity to inspect the scoring algorithm's output.

I prefer to use the same arg spec because it is already widely understood and documented.
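The caching idea above could be sketched as follows. This is a minimal illustration that works with the standard library as it stands today: the VOCABULARY tuple and the scored_suggestions() name are hypothetical, and the scores come straight from SequenceMatcher.ratio() rather than from the proposed function.

```python
import difflib
from functools import lru_cache

# Hypothetical fixed vocabulary; a real service would load its own corpus.
VOCABULARY = ("dog", "dodge", "doge", "done", "cat")


@lru_cache(maxsize=1024)
def scored_suggestions(word, n=3, cutoff=0.6):
    """Return a tuple of (score, candidate) pairs, best first, cached per query."""
    scored = []
    for candidate in VOCABULARY:
        # Same argument order as get_close_matches(): candidate first, word second.
        score = difflib.SequenceMatcher(None, candidate, word).ratio()
        if score >= cutoff:
            scored.append((score, candidate))
    scored.sort(reverse=True)
    # A tuple keeps the cached value immutable, so callers cannot corrupt it.
    return tuple(scored[:n])
```

A repeated call such as scored_suggestions("doge") is then answered from the cache without recomputing any ratios. The cache is only valid while the vocabulary stays unchanged.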
[issue21344] save scores or ratios in difflib get_close_matches
Russell Ballestrini added the comment:

Adding a patch that updates the tests to follow Tim Peters' suggestion of using assertListEqual rather than assertEqual for list comparisons.

--
Added file: http://bugs.python.org/file35040/diff-lib-tim-peters-assert-list-equals.patch
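For readers unfamiliar with the distinction, a test in that style might look like the sketch below. It is not taken from the attached patch; the class and method names are hypothetical, and it only exercises the existing get_close_matches() with the example from the difflib documentation. assertListEqual additionally fails if either argument is not a list.

```python
import difflib
import unittest


class TestCloseMatches(unittest.TestCase):
    """Hypothetical test case; names are illustrative only."""

    def test_best_match_comes_first(self):
        words = ["ape", "apple", "peach", "puppy"]
        # The result must be a list, with the closest match first.
        self.assertListEqual(
            difflib.get_close_matches("appel", words),
            ["apple", "ape"],
        )
```

Saved in a module, this can be run with `python -m unittest`.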
[issue21344] save scores or ratios in difflib get_close_matches
Russell Ballestrini added the comment:

Tim,

You bring up some great points and insight I was missing.

"To me the scores just aren't interesting beyond which words' scores exceed a cutoff, and the ordering of words based on their similarity scores - but `get_close_matches()` already captures those uses."

For a *word* and a corpus of *possibilities*, how does one choose a satisfactory *cutoff* without inspecting the output of the scoring algorithm?

Personally, I don't want to inspect scores for inspection's sake; I want to inspect scores so that I can make an informed decision about the *n* and *cutoff* input arguments.

It's true that after reading and digesting the source code for `get_close_matches()` I could (and did) implement a version that returns scores. My goal was to share this code, and what better way than to "fix" the problem upstream?

I understand the desire to keep the standard library lean and useful, in order to reduce the maintenance burden. I will understand if we decide not to include these patches; I can always maintain a fork and share it on PyPI.
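One way to do that kind of inspection with the existing standard library is sketched below. The score_all() helper is hypothetical and not part of difflib; it applies no cutoff at all, so every candidate's ratio is visible when deciding where to draw the line.

```python
import difflib


def score_all(word, possibilities):
    """Return (score, candidate) for every candidate, best first, with no cutoff."""
    matcher = difflib.SequenceMatcher()
    matcher.set_seq2(word)
    scored = []
    for candidate in possibilities:
        matcher.set_seq1(candidate)
        scored.append((matcher.ratio(), candidate))
    return sorted(scored, reverse=True)


# Print the full score table for a sample query.
for score, candidate in score_all("appel", ["ape", "apple", "peach", "puppy"]):
    print(f"{score:.2f}  {candidate}")
```

Reading the table top to bottom shows where a gap opens between plausible and implausible matches, which is a reasonable place to set *cutoff*.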