Hi everyone, I thought it would be a good idea to introduce myself before I get on to bug fixing details. So, here goes.
I'm Karthik (*waves*) and you may have seen me hanging around on #kde as karthikp. I have a background in aerospace engineering and got my doctorate in 2012 studying flame structure using optical diagnostic techniques. However, I now work in another field, lithography, creating optical proximity correction recipes. As for relevant coding skills, I'm very comfortable in C++, perhaps less so with Qt and CMake. Oh, and I can use git in my sleep, so that helps. :)

I caught the KDE bug upon discovering Kubuntu sometime around 2006. In 2008, I caught the Arch bug as well, and the two have been with me ever since. These are bugs I _don't_ want to fix. :)

The bug I _do_ want to fix is this one, which affects the sonnet spell checking framework:

https://bugs.kde.org/show_bug.cgi?id=337145

I reported this over the weekend, having run into it all through my grad years, when automatic spell check in katepart would highlight "spelling errors" all over data files. So, the goal is to make sonnet smarter and avoid spell checking numbers, generally speaking.

I wanted to share what I've been doing so that
a) someone can tell me whether I'm on the right or wrong path, or
b) if someone else is already working on it, we can join forces and not duplicate work.

Here's what I've done so far. The bug exists in 4.x for sure, and since the code in the kf5/sonnet repo seems to be much the same implementation as kdecore/sonnet in kdelibs, I think the fix would need to be applied there as well.

My first approach was to try and extend the isValidWord() function in the Filter class to identify "words" that are actually numbers. I started by simply converting the QString to a double using toDouble() and using the error status to identify numbers. This catches most of the simple forms like 1, 1.0, 1.0e-1, 1.0E+1, etc., but fails for numbers with field separators like 1,000. So, I added another test: if the word contains a comma, split on it and check whether each non-empty part is a number. If so, it's not a valid word. This worked great... for a time.

However, it couldn't handle this format of writing numbers: 1.23(4). This form is often found in scientific data, where the number in the brackets denotes the standard error in the last significant digit(s). So, I added another test for the presence of ( or ) and did the split dance again. That also worked great... for a time.

Then came the doozy. What about 1/2? This opens the door to all kinds of expressions with operators: 1+2, 1-2, etc. Comparisons such as 1<2 should be exempt from spell checking too. At this point the approach was rapidly getting out of hand.

My next approach (which I'm still in the middle of) is to use setBuffer() instead of isValidWord(). This uses QTextBoundaryFinder to break up the text into words. I had high hopes for using the boundary type Grapheme instead of Word, but that seems to think every character is at a valid boundary. I'm now going to try to combine this with QChar::isLetterOrNumber() to identify word and number boundaries, so that isValidWord() can then just drop "words" composed entirely of digits.

I'd appreciate any thoughts/advice on this problem. If anyone else is working on the sonnet code, do let me know. Also, who's the current maintainer of the sonnet code base? The bug tracker CCs Zack Rusin, but the repo names Martin Sandsmark. If either of you is actively maintaining sonnet, I'd love to pepper you with more questions!

Otherwise, I'll post an update later this week, or more likely over the weekend, with what I hope will be a working solution. I'll bug everyone then for help in reviewing my work, and we can finally close this bug!

Thanks,
Karthik
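
P.S. To make the first approach concrete, here is a rough, standalone sketch of the idea. It is not the actual patch: it uses std::string and std::strtod as stand-ins for QString and toDouble() so that it compiles without Qt, and the names parsesAsDouble() and looksLikeNumber() are made up for illustration and do not exist in sonnet.

```cpp
#include <cstdlib>
#include <string>

// Stand-in for QString::toDouble(&ok): true only if the whole of `s`
// parses as a decimal floating-point number.
bool parsesAsDouble(const std::string &s)
{
    // Restrict the alphabet first, so that strtod's extras ("inf", "nan",
    // hex floats) are not mistaken for numbers.
    if (s.empty() || s.find_first_not_of("0123456789+-.eE") != std::string::npos)
        return false;
    char *end = nullptr;
    std::strtod(s.c_str(), &end);
    // strtod stops at the first character it cannot use; require that it
    // consumed the entire string.
    return end == s.c_str() + s.size();
}

// Hypothetical helper (not a sonnet function): split `word` on ',', '('
// and ')' and report whether every non-empty part is a number.
bool looksLikeNumber(const std::string &word)
{
    bool sawPart = false;
    std::string::size_type start = 0;
    while (start <= word.size()) {
        std::string::size_type stop = word.find_first_of(",()", start);
        if (stop == std::string::npos)
            stop = word.size();
        if (stop > start) {
            if (!parsesAsDouble(word.substr(start, stop - start)))
                return false;
            sawPart = true;
        }
        start = stop + 1;
    }
    // A string made only of separators (or an empty one) is not a number.
    return sawPart;
}
```

With this, looksLikeNumber() accepts 1, 1.0e-1, 1,000 and 1.23(4), but still rejects 1/2, 1+2 and 1<2. Every new notation needs yet another separator added by hand, which is why I'm moving away from this approach.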