Dieter Britz posted on Mon, 24 Apr 2017 12:00:15 +0200 as excerpted:

> People talk about setting up a kill file for posters to news groups that
> annoy others, by off topic postings etc. Is it possible to do that with
> pan?
This repeats the same idea as the replies by HH, DG and Pedro in the other subthread, but with a bit more explanation of what pan's actually doing and why, and why it's like binary-choice killfiling (killfiled or not) but better. =:^)

First, let's understand the difference between a fine-grained scoring mechanism like pan has and a hard binary or trinary filter mechanism.

With scoring, if desired, the effects of many scoring rules can be applied together to arrive at a final score for a post, which can then be used to apply some action (like simply hiding the post, or marking it read, or deleting it, or on the other end, highlighting it with various colors depending on how high it scores, or automatically downloading the post to cache, or saving its attachments).

A hard binary or trinary filter, by contrast, acts immediately on the first filter that applies, to either kill (generally hide and mark-read, sometimes delete, depending on the implementation) or not, possibly (the trinary case) with the addition of a watch flag (and perhaps auto-download, depending on implementation) if the post isn't killed.

So in pan, a score of -9999 is defined as ignored. That's what binary filters would filter out, also known as killing, thus the term killfile. And a score of +9999 is defined as watched.

Meanwhile, FWIW, there's a number of other preset score category levels as well. These can be seen under the view menu, header pane. Here's the full listing, lowest to highest:

-9999 (or lower): Ignored
  Either multiple scoring rules combined to result in the message being ignored, *OR* a single scoring rule set ignored/-9999 and stopped further scoring. By default pan doesn't display these messages, but doesn't take any other action (marking them read, deleting them, etc).
-9998 to -1: Low
  One or more scoring rules lowered the message score into negative territory, but not enough to make it ignored.
0: Default
  Of course 0 is the default score, if no scoring rules apply, or if the scoring rules exactly balance each other out.
1 to 4999: Medium
  One or more scoring rules gave a moderate scoring boost, but to less than 5000/high. There's an option to display these in a different color, but I don't believe it's on by default. (FWIW I've been running pan since 2002, a decade and a half now, and long ago forgot what the defaults were for many of the options I've customized.)
5000 to 9998: High
  One or more scoring rules gave a higher scoring boost, more than 4999, but less than 9999. Again, there's an option to display these in a different color, but I don't believe it's on by default.
9999 (or higher): Watched
  Either multiple scoring rules resulted in a score at or above 9999, *OR* a single scoring rule set it to watched/9999 and stopped further scoring rule processing. Pan should display these in a different color, by default I believe. There are options (off by default) that allow auto-downloading or the like.

As you should already see, scoring allows a far richer and more nuanced setup than arbitrary binary kill/show or trinary kill/show/watch filters. But by using the watched/ignored options only, which basically set +9999/-9999 respectively and stop further score processing, you can have a simpler binary or trinary setup if you wish. It's up to you. =:^)

Meanwhile, as I already mentioned, there are choices under view, header pane, to match (or not) each of these scoring categories separately. Again under view, header pane, pan can then be set to display either explicitly matched posts, matched posts and their subthreads, or matched posts and their entire threads, as desired. It's up to you. =:^)

And in the preferences dialog (edit menu, preferences), on the colors tab, you can set the colors for each scoring category. It's up to you.
=:^)

(Tho do note that these days, pan only shows those colors in the score column, not the entire line as it used to do. So you have to have the score column in your listing or you won't see the colors. I preferred it coloring the entire line, but oh, well, I'm a user, not a dev... and unfortunately, that's NOT a user available option. As I'm writing this, however, I'm wondering just how hard it might be to find that and patch it to whole line, tho. I /am/ an advanced enough user that even tho I don't claim to be a dev, I can /sometimes/ work out patches on my own, and as I run gentoo, I normally build everything from sources and can and often do apply my own patches or those I've picked up from others to various packages, including pan. So I'll have to look into patching this...)

OK, so you can set whether the various score categories are displayed or not, and if displayed, you can set the color per category, but what about more practical score-based actions? In particular, for those who track things via marked-read, and who don't have pan's preference to automatically mark everything in the group read when they fetch headers or leave a group, not displaying ignored posts AND not having them automatically marked read is frustrating, because then they hang around, still marked unread!

Of course if you've been paying attention, you already know the answer, as I mentioned it above. It is (of course) up to you! =:^) (Noticing the trend yet? =:^)

Preferences dialog, actions tab. One possible setup might be:

Delete articles scoring at: -9999 or less (ignored)
This would auto-delete ignored articles.

Mark articles read scoring at: -9998 to -1 (low/negative)
This would auto-mark-read negative/low-scoring articles, but wouldn't delete them. The idea here is to let you hide them by default (by showing only unread), but still keep them around in case you see a reply and you want to see the message it's replying to.
(I /believe/ it'll mark anything read UNDER the named category as well, so it would mark ignored articles read too, if they're not deleted with the earlier option, above. But I'm not actually sure on this bit.) Alternatively, if you don't delete ignored articles, you can simply mark them read, and still show negative/low-scoring articles that aren't entirely ignored.

Cache articles scoring at: 1 to 4999 (medium)
Of course you can set this to high/5000-9998 or watched/9999 instead, if that fits your needs better. The idea is that if an article is sufficiently highly scored, you want it cached for you so it's already there when you go to read it, rather than having to wait for the download then. Do be aware that pan's cache size is pretty small, 10 MB by default, and especially if you're doing binaries and using this setting, you'll probably want a larger cache. That's set in preferences, on the behavior tab. (Again, I /believe/ it'll do the same with the higher categories, high and watched, too, but I've not actually tested it to be sure.)

Download attachments of articles scoring at: Disabled
If you're doing binaries, you might want to set this instead of the cache option.

Generally, people download binaries using one of two strategies.

Here, I prefer to have pan's cache set way big, and download messages to cache first, so they're local. Then when they're already cached so I won't be waiting for the download, I can go thru and sort out what I really want, saving it where I want it, and deleting what I don't really want. This works best for (relatively) small binaries that you will download many hundreds or thousands of, like still images or audio clips mostly under 10 minutes in length, with the occasional longer audio clip or short video. It also requires a much larger cache setting (on the order of gigabytes, for me), or pan will start deleting messages that were already downloaded to cache but are still unread, to make room for the newest messages still being downloaded.
For that binaries strategy or for text messages, the auto-download-to-cache action exists. Just be aware of the cache size requirements and adjust it accordingly.

The other strategy, which is obviously pan's default given the very small 10 MB default cache size, is to have pan download and save off the binaries immediately, without caring at all about the messages they're attached to. Because the attachments are saved immediately and the messages they were attached to don't matter, those messages can be deleted from cache as soon as the attachment is saved, so this requires a far smaller cache and pan's default 10 MB cache suffices. This works best for very large binaries, typically half-hour or longer videos like TV series episodes or feature-length movies. It works best if you don't care about the messages containing the attachments at all (no discussion of the series, etc), since unless you increase the size of the cache anyway, they'll be deleted effectively immediately after the attachment processing is completed.

It is for this binaries strategy that the auto-download-(and-save)-attachments action exists. Obviously this isn't going to work too well if your interest is primarily text groups (and people post binaries there too, and the messages score high enough for the action to trigger), because you'll end up with a bunch of random binaries that happened to be attached to watched or whatever level scoring messages saved off to wherever you have pan saving them.

OK, but what about the scoring itself?

First of all, the watch (thread) and ignore (thread or author) entries on the articles menu are the GUI method to create scoring rules that set the +/-9999 score and abort further score processing.

Next, there's the edit article's watch/ignore/score and add a scoring rule entries, again on the articles menu. These bring up a dialog, either directly (for add) or indirectly (for edit, using the add button there), that lets you set up a more detailed scoring rule.
This is more flexible than the arbitrary watch/ignore options above, allowing you to match various options and if matched either set a specific score and abort further scoring as the above watch/ignore options do, or alternatively, to simply add/subtract whatever score and continue processing further scoring rules. You can also set an expiry for the rule, if desired, or make it permanent.

It's the option to add/subtract some score value and continue processing more scoring rules where the real flexibility comes in. You can match on multiple subject keywords in multiple rules, adding or subtracting based on the match, then add/subtract based on author, then do some more based on references (effectively thread, only sometimes message-ids are deleted from the header and it won't match the thread any longer), then subtract points if it's cross-posted/spammed to too many groups, and add or subtract more points based on size in bytes or line count. As long as no match sets an arbitrary score and stops further processing, all these matches will result in a final score that combines the effects and relative scoring weights of all the matching rules, and pan uses that final score to decide what scoring category the message belongs in, and thus whether to show it and how, as well as what automated actions to apply.

See how much richer a good scoring system is, compared to arbitrary binary/trinary-choice filtering on just ONE match-factor? Of course if that's too complex for you, just use the watch/ignore and be done with it. It's up to you. =:^)

Meanwhile, as the others suggested, the real advanced stuff is reserved for those who choose to directly edit the scorefile itself. They posted the link to the format description.

http://www.slrn.org/docs/score.txt

But, keep in mind that the link above is for a different news client, slrn, which shares a general scorefile format with pan.
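
To make the add-and-continue vs. set-and-stop distinction concrete, here's a rough Python sketch of how rule scores might combine into a final score and then into a category. It is NOT pan's actual code: the Rule class, the article dict layout, and the example rules and addresses are all made up for illustration. Only the category thresholds come from the listing earlier in this post.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Rule:
    """Hypothetical stand-in for one scoring rule (not pan's real structure)."""
    matches: Callable[[dict], bool]  # predicate over an article's header fields
    value: int                       # points to add, or the score to set
    final: bool = False              # True: set the score and stop processing


def final_score(article: dict, rules: Iterable[Rule]) -> int:
    """Apply rules in order; a 'final' rule sets the score and stops."""
    score = 0  # 0 is the default when nothing matches
    for rule in rules:
        if not rule.matches(article):
            continue
        if rule.final:
            return rule.value  # like watch/ignore: no further scoring
        score += rule.value    # add/subtract and keep going
    return score


def category(score: int) -> str:
    """Map a final score to the category names listed earlier in the post."""
    if score <= -9999:
        return "ignored"
    if score < 0:
        return "low"
    if score == 0:
        return "default"
    if score < 5000:
        return "medium"
    if score < 9999:
        return "high"
    return "watched"


# Made-up example rules; field names and addresses are illustrative only.
rules = [
    Rule(lambda a: "pan" in a["subject"].lower(), 500),
    Rule(lambda a: a["from"].lower() == "troll@example.invalid", -9999, final=True),
    Rule(lambda a: a["groups"] > 5, -1000),  # cross-posted too widely
    Rule(lambda a: a["lines"] > 200, -100),  # very long post
]

on_topic = {"subject": "Pan scoring question", "from": "user@example.invalid",
            "groups": 1, "lines": 40}
print(category(final_score(on_topic, rules)))  # medium
```

The on-topic article matches only the subject rule, so it lands at +500, in the medium category. The same article from the made-up troll address would pick up the +500 first, then hit the set-and-stop rule and end at -9999, ignored, with the cross-post and length rules never consulted.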
Unfortunately, however, pan's score-processing code isn't quite as advanced as slrn's, so some of the more complex stuff described there doesn't work in pan.

Pan hasn't implemented the include statement, for instance, so don't try to use it. The {} grouping logic isn't implemented either, AFAIK.

And, pan hasn't implemented the score keyword's single-colon AND logic, so single or double colon doesn't matter, it's always interpreted as OR (double-colon). This is unfortunate, but the effect can be partially counteracted by simply creating multiple rules, one per condition, each of which gives partial points. So instead of an AND score with five conditions to meet and a +1000 value, you can use pan's OR scoring on each of the five conditions, with a +200 value on each. The total if all match will still be +1000, but of course the result may be less predictable if only some conditions match, particularly when those partial points combine with partial matches from another would-be compound rule.

Another difference is that pan's scoring matches are always case insensitive. So don't worry about John vs. JOHN vs. john vs. JoHN, the same regex will match them all without any fancy regex footwork.

Some additional scorefile format notes:

* Unfortunately for some, understanding regular expressions is really necessary to take full advantage of scoring, particularly when editing the scorefile itself, but it's worth it, and pan's GUI does allow simple scoring even if you don't know regex. It's up to you. =:^)

* The note in section 1.1 of that document recommending that one stick to the overview headers (typically subject/from/date/message-id/references/bytes/lines and often xref), but allowing others, most definitely applies. Unfortunately it's a technical limitation of the protocol, not something pan (or slrn or any other news client) can do anything about. The thing is that pan can score headers in the overview without downloading the full message (or full headers).
For the most part, that's the headers needed to display the message in the headers pane, author, subject, date, etc, plus message-id and references for threading and tracking across multiple servers, etc. But for the more exotic headers, pan won't get them, and thus can't score them, until the article is downloaded to cache. So if you have an abuser that keeps nym-shifting and otherwise deliberately changing everything in the headers he has access to, in order to try to avoid killfiling, but who always posts thru a provider that adds an X-Trace header with a consistent value you can score on, you *CAN* score on it, but you'll have to download the messages to cache first. Take it from someone who was in the position of trying to killfile a poster like that at one point, before pan could score such non-overview headers: being able to ignore-score it, but only after downloading to cache, sucks, but it definitely sucks less than having to actually show the message in order to see who it is and block it!

* Note that while you can set an expiry on the score in the pan GUI, and at that point pan will indeed quit applying that score, it won't actually remove it from the scorefile. The only way to actually remove the score from the scorefile is to manually edit it. Unfortunately, this does mean that if you actively add expiring scoring rules and never manually remove them, eventually your scorefile will be cluttered with perhaps hundreds or thousands of expired rules, and they'll begin to affect scorefile loading performance, as pan still has to process them at least far enough to see they're expired, and then work out how far to skip until the beginning of the next possibly still valid rule. So you'll probably want to either clear out the scorefile and start new occasionally, or manually edit it to at least clean out the expired rules from time to time, or simply not use expiring scores at all, just living with the annoyance unless it's worth a permanent rule.
* Yes, an initial % on a line *DOES* mean it's a comment. By implication, most of the lines pan adds when you add a score via the GUI are comments and don't matter for the actual scoring at all. They're only there to aid human readers. Of course that means you can edit or delete them as you wish, without affecting actual operation. Here, I tend to delete pretty much all of pan's added comments, with the exception of the date-added comments for expiring scores, since that way I can see how long I had set the expiry.

* If you do heavy scoring with lots of rules, the scorefile entries pan's GUI generates aren't particularly efficient for machine processing. The example in the linked documentation is somewhat more efficient, but it's too short to really get the point across. If you're planning to do a lot of manual scorefile editing or simply want to make your scorefile more efficient, either check past scoring threads for this list/group (the list is available as a newsgroup on news.gmane.org) where I've posted a longer example from my scorefile, or ask for such an example.

* Similarly, if you're not good with regular expressions and need some help designing a score that's more complex than you can easily do with the pan GUI, or if something's just not working as you expected it to, with scoring or something else, ask for help. We've dealt with a number of such queries over the years. =:^)

OK, so hope that's of help.

Some people just want an answer to plug in without understanding it. Others want to understand what's going on, so next time they want to do something similar but not identical, they can figure out how to do it themselves. I'm certainly in this latter group, and my posts tend to go to the extreme in explaining things.
That frustrates the first group, but I've stacks of thanks from people who preferred the better understanding my explanatory if extremely verbose style gave them, and sometimes I get new insights or ideas (like possibly patching the score coloring to the whole line instead of just the score column, above) as I'm writing things down, and it's the combination of both of those that's my motivation to keep posting as I do. =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users