Hi Duncan & Pedro et al.,

thanks for the replies!

Regards,
Dieter

On 25 April 2017 at 09:41, Duncan <1i5t5.dun...@cox.net> wrote:
> Dieter Britz posted on Mon, 24 Apr 2017 12:00:15 +0200 as excerpted:
>
>> People talk about setting up a kill file for posters to news groups that
>> annoy others, by off topic postings etc. Is it possible to do that with
>> pan?
>
> This repeats the same idea as the replies by HH, DG and Pedro in the
> other subthread, but with a bit more explanation of what pan's actually
> doing and why, and why it's like binary-choice killfiling (killfiled or
> not) but better. =:^)
>
> First, let's understand the difference between a fine-grained scoring
> mechanism like pan has, where if desired the effects of many scoring
> rules can be applied together to arrive at a final score for a post,
> which then can be used to apply some action (like simply hiding the post,
> or marking it read, or deleting it, or on the other end, highlighting it
> with various colors depending on how high it scores, or automatically
> downloading the post to cache, or saving its attachments), vs a hard
> binary or trinary filter mechanism, which will act immediately on the
> first filter that applies to either kill (generally hide and mark-read,
> sometimes delete, depending on the implementation) or not, possibly (the
> trinary case) with the addition of a watch flag (and perhaps
> auto-download depending on implementation) if the post isn't killed.
>
> So in pan, a score of -9999 is defined as ignored. That's what binary
> filters would filter out, also known as killing, thus the term killfile.
>
> And a score of +9999 is defined as watched.
>
> Meanwhile, FWIW, there's a number of other preset score category levels
> as well. These can be seen under the view menu, header pane. Here's the
> full listing, lowest to highest:
>
> -9999 (or lower): Ignored
>
> Either multiple scoring rules applied to result in the message being
> ignored, *OR* a single scoring rule set ignored/-9999 and stopped further
> scoring rule processing.
>
> By default pan doesn't display these messages, but doesn't take any other
> action (marking them read, deleting them, etc).
>
> -9998 to -1: Low
>
> The result of one or more scoring rules lowered the message score into
> negative territory, but not enough to make it ignored.
>
> 0: Default
>
> Of course 0 is the default score, if no scoring rules apply, or if the
> scoring rules exactly balance each other out.
>
> 1 to 4999: Medium
>
> The result of one or more scoring rules was a moderate scoring boost, to
> less than 5000/high, however.
>
> There's an option to display these in a different color, but I don't
> believe it's on by default. (FWIW I've been running pan since 2002, a
> decade and a half now, and long ago forgot what the defaults were for
> many of the options I've customized.)
>
> 5000 to 9998: High
>
> The result of one or more scoring rules was a higher scoring boost, more
> than 4999, but less than 9999.
>
> Again, there's an option to display these in a different color, but I
> don't believe it's on by default.
>
> 9999 (or higher): Watched
>
> Either multiple scoring rules resulted in a score at or above 9999, *OR*
> a single scoring rule set it to watched/9999 and stopped further scoring
> rule processing.
>
> Pan should display these in a different color, by default I believe.
> There are options (off by default) that allow auto-downloading or the
> like.
>
>
> As you should already see, scoring allows a far richer and more nuanced
> setup than arbitrary binary kill/show or trinary kill/show/watch
> filters. But by using the watched/ignored options only, which basically
> set +9999/-9999 respectively and stop further score processing, you can
> have a simpler binary or trinary setup if you wish.
>
> It's up to you. =:^)
>
> Meanwhile, as I already mentioned, there are choices under view, header
> pane, to match (or not) each of these scoring categories separately.
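
The score bands listed above can be written down as a small lookup. This is only a sketch in Python: the function name and the lowercase category strings are hypothetical labels chosen for the example, not anything pan exposes.

```python
def score_category(score: int) -> str:
    """Map a final article score to the category bands listed above."""
    if score <= -9999:
        return "ignored"
    if score < 0:
        return "low"
    if score == 0:
        return "default"
    if score < 5000:
        return "medium"
    if score < 9999:
        return "high"
    return "watched"


# One sample score from each band, lowest to highest.
for sample in (-20000, -9999, -9998, -1, 0, 1, 4999, 5000, 9998, 9999):
    print(sample, score_category(sample))
```

Note that the ignored and watched bands are open-ended, so several moderate rules adding up past either limit land in the same category as a single ignore or watch rule.
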
> Again under view, header pane, pan can then be set to display either
> explicitly matched posts, matched posts and their subthreads, or matched
> posts and their entire threads, as desired.
>
> It's up to you. =:^)
>
> And in the preferences dialog (edit menu, preferences), on the colors
> tab, you can set the colors for each scoring category.
>
> It's up to you. =:^)
>
> (Tho do note that these days, pan only shows those colors in the score
> column, not the entire line as it used to do. So you have to have the
> score column in your listing or you won't see the colors. I preferred it
> coloring the entire line, but oh, well, I'm a user, not a dev... and
> unfortunately, that's NOT a user available option. As I'm writing this,
> however, I'm wondering just how hard it might be to find that and patch
> it to whole line, tho. I /am/ an advanced enough user that even tho I
> don't claim to be a dev, I can /sometimes/ work out patches on my own,
> and as I run gentoo, I normally build everything from sources and can and
> often do apply my own patches or those I've picked up from others to
> various packages, including pan. So I'll have to look into patching
> this...)
>
>
> OK, so you can set whether the various score categories are displayed or
> not, and if displayed, you can set the color per category, but what about
> more practical score-based actions? In particular, for those who track
> things via marked-read, and who don't have pan's preference to
> automatically mark everything in the group read when they fetch headers
> or leave a group, not displaying ignored posts AND not having them
> automatically marked read is frustrating, because then they hang around,
> still marked unread!
>
> Of course if you've been paying attention, you already know the answer,
> as I mentioned it above.
>
> It is (of course) up to you! =:^)
>
> (Noticing the trend yet? =:^)
>
> Preferences dialog, actions tab.
>
> One possible setup might be:
>
> Delete articles scoring at: -9999 or less (ignored)
>
> This would auto-delete ignored articles.
>
> Mark articles read scoring at: -9998 to -1 (low/negative)
>
> This would auto-mark-read negative/low-scoring articles, but wouldn't
> delete them. The idea here is to let you hide them by default (by
> showing only unread), but still keep them around in case you see a reply
> and you want to see the message it's replying to.
>
> (I /believe/ it'll mark anything read UNDER the named category as well,
> so it would mark ignored articles read too, if they're not deleted with
> the earlier option, above. But I'm not actually sure on this bit.)
>
> Alternatively, if you don't delete ignored articles, you can simply mark
> them read, and still show negative/low-scoring articles that aren't
> entirely ignored.
>
> Cache articles scoring at: 1 to 4999 (medium)
>
> Of course you can set this to high/5000-9998 or watched/9999 instead, if
> that fits your needs better.
>
> The idea is that if an article is sufficiently highly scored, you want it
> cached for you so it's already there when you would otherwise have to
> download it to cache.
>
> Do be aware that pan's cache size is pretty small, 10 MB by default, and
> especially if you're doing binaries and using this setting, you'll
> probably want a larger cache. That's set in preferences, on the behavior
> tab.
>
> (Again, I /believe/ it'll do the same with the higher categories, high
> and watched, too, but I've not actually tested it to be sure.)
>
> Download attachments of articles scoring at: Disabled
>
> If you're doing binaries, you might want to set this instead of the cache
> option.
>
> Generally, people download binaries using one of two strategies.
>
> Here, I prefer to have pan's cache set way big, and download messages to
> cache first, so they're local.
> Then when they're already cached so I
> won't be waiting for the download, I can go thru and sort out what I
> really want, saving it where I want it, and deleting what I don't really
> want. This works best for (relatively) small binaries that you will
> download many hundreds or thousands of, like still images or audio clips
> mostly under 10 minutes in length, with the occasional longer audio clip
> or short video. It also requires a much larger cache setting (on the
> order of gigabytes, for me), or pan will start deleting previously
> downloaded to cache but still unread messages, to make room for the
> newest still downloading to cache messages.
>
> For that binaries strategy or for text messages, the
> auto-download-to-cache action exists. Just be aware of the cache size
> requirements and adjust it accordingly.
>
> The other strategy, which is obviously pan's default given the very small
> 10 MB default cache size, is to have pan download and save off the
> binaries immediately, without caring at all about the messages they're
> attached to. Because the attachments are saved immediately and the
> messages they were attached to don't matter, those messages can be
> deleted from cache as soon as the attachment is saved, so this requires a
> far smaller cache and pan's default 10 MB cache suffices.
>
> This works best for very large binaries, typically half-hour or longer
> videos like TV series episodes or feature-length movies. It works best
> if you don't care about the messages containing the attachments at all
> (no discussion of the series, etc), since unless you increase the size of
> the cache anyway, they'll be deleted effectively immediately after the
> attachment processing is completed.
>
> It is for this binaries strategy that the
> auto-download-(and-save)-attachments action exists.
> Obviously this isn't going to work too well
> if your interest is primarily text groups (and people post binaries there
> too, and the messages score high enough for the action to trigger),
> because you'll end up with a bunch of random binaries that happened to be
> attached to watched or whatever level scoring messages saved off to
> wherever you have pan saving them.
>
>
> OK, but what about the scoring itself?
>
> First of all, the watch (thread) and ignore (thread or author) entries on
> the articles menu are the GUI method to create scoring rules that set the
> +/-9999 score and abort further score processing.
>
> Next, there's the edit article's watch/ignore/score and add a scoring
> rule entries, again on the articles menu. These bring up a dialog,
> either directly (for add) or indirectly (for edit, using the add button
> there), that lets you set up a more detailed scoring rule. This is more
> flexible than the arbitrary watch/ignore options above, allowing you to
> match various options and if matched either set a specific score and
> abort further scoring as the above watch/ignore options do, or
> alternatively, to simply add/subtract whatever score and continue
> processing further scoring rules. You can also set an expiry for the
> rule, if desired, or make it permanent.
>
> It's this last option, to add/subtract some score value and continue
> processing more scoring rules, that's where the real flexibility comes
> in. You can match on multiple subject keywords in multiple rules, adding
> or subtracting based on the match, then add/subtract based on author,
> then do some more based on references (effectively thread, only sometimes
> message-ids are deleted from the header and it won't match the thread any
> longer), then subtract points if it's cross-posted/spammed to too many
> groups, and add or subtract more points based on size in bytes or line
> count.
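
The difference between add-and-continue rules and set-and-stop rules described above can be sketched in Python. This is an illustration only, not pan's code: the Rule class, the final_score function, the sample article fields and addresses, and the assumption that rules are checked in list order are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Hypothetical scoring rule; not pan's internal representation."""
    matches: Callable[[dict], bool]  # predicate over an article's headers
    value: int                       # points to add, or the score to set
    final: bool = False              # True: set the score to `value` and stop


def final_score(article: dict, rules: list[Rule]) -> int:
    """Add up matching rules until one of them sets a score and stops."""
    score = 0
    for rule in rules:  # assumption: rules are evaluated in list order
        if not rule.matches(article):
            continue
        if rule.final:
            return rule.value  # set-and-stop, like watch/ignore
        score += rule.value    # add/subtract and keep going
    return score


rules = [
    Rule(lambda a: "scoring" in a["subject"].lower(), +500),
    Rule(lambda a: a["author"] == "helpful@example.org", +300),
    Rule(lambda a: a["crossposted_groups"] > 5, -1000),
    Rule(lambda a: a["author"] == "spammer@example.org", -9999, final=True),
]

article = {
    "subject": "Scoring question",
    "author": "helpful@example.org",
    "crossposted_groups": 1,
}
print(final_score(article, rules))  # 800
```

Here the subject and author rules both contribute, giving 800; the same article from the second address would come out at -9999 regardless of the subject bonus, because that rule stops processing.
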
>
> As long as no match sets an arbitrary score and stops further processing,
> all these matches will result in a final score that combines the effects
> and the relative scoring weight of all the others, and pan uses that
> final score to decide what scoring category the message belongs in, and
> thus whether to show it and how, as well as what automated actions to
> apply.
>
> See how much richer a good scoring system is, compared to arbitrary
> binary/trinary-choice filtering on just ONE match-factor?
>
> Of course if that's too complex for you, just use the watch/ignore and be
> done with it.
>
> It's up to you. =:^)
>
>
> Meanwhile, as the others suggested, the real advanced stuff is reserved
> for those who choose to directly edit the scorefile itself. They posted
> the link to the format description.
>
> http://www.slrn.org/docs/score.txt
>
> But, keep in mind that the link above is for a different news client,
> slrn, which shares a general scorefile format with pan. Unfortunately,
> however, pan's score-processing code isn't quite as advanced as slrn's,
> so some of the more complex stuff described there doesn't work in pan.
> Pan hasn't implemented the include statement, for instance, so don't try
> to use it. The {} grouping logic isn't implemented either, AFAIK.
>
> And, pan hasn't implemented the score keyword's single-colon AND logic,
> so single or double colon doesn't matter, it's always interpreted as OR
> (double-colon). This is unfortunate, but the effect can be partially
> counteracted by simply creating multiple conditions, each of which gives
> partial points. So instead of an AND score with five conditions to meet
> and a +1000 value, you can use pan's OR scoring on each of the five
> conditions, with a +200 value on each. The total if all match will still
> be +1000, but of course the effect might be less anticipated if only some
> conditions match and that interacts with another would-be compound with
> only some conditions matching.
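
The arithmetic of that workaround, and where it diverges from a true AND rule, can be shown with a short Python sketch. The function names are hypothetical and the numbers are the ones from the paragraph above (five conditions, +1000 in total, +200 each); nothing here is pan's actual code.

```python
CONDITIONS = 5
AND_BONUS = 1000
PARTIAL = AND_BONUS // CONDITIONS  # +200 per condition


def and_rule(matched: int) -> int:
    """A true AND rule: the bonus applies only if every condition matches."""
    return AND_BONUS if matched == CONDITIONS else 0


def split_rules(matched: int) -> int:
    """The workaround: one rule per condition, each adding partial points."""
    return PARTIAL * matched


# Compare the two schemes for 0..5 matching conditions.
for matched in range(CONDITIONS + 1):
    print(matched, and_rule(matched), split_rules(matched))
```

The two schemes agree when none or all of the conditions match. In between, the split rules hand out partial credit that a real AND rule would not, which is the interaction to watch for when several such compounds overlap.
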
>
> Another difference is that pan's scoring matches are always case
> insensitive. So don't worry about John vs. JOHN vs. john vs. JoHN, the
> same regex will match them all without any fancy regex footwork.
>
>
> Some additional scorefile format notes:
>
> * Unfortunately for some, understanding regular expressions is really
> necessary to take full advantage of scoring, particularly when editing
> the scorefile itself, but it's worth it, and pan's GUI does allow simple
> scoring even if you don't know regex.
>
> It's up to you. =:^)
>
> * The note in section 1.1 recommending that one stick to the overview
> headers (typically subject/from/date/message-id/references/bytes/lines
> and often xref), but allowing others, most definitely applies.
> Unfortunately it's a technical limitation of the protocol, not something
> pan (or slrn or any other news client) can do anything about.
>
> The thing is that pan can score headers in the overview without
> downloading the full message (or full headers). For the most part,
> that's the headers needed to display the message in the headers pane,
> author, subject, date, etc, plus message-id and references for threading
> and tracking across multiple servers, etc. But for the more exotic
> headers, pan won't get them, and thus can't score them, until the article
> is downloaded to cache.
>
> So if you have an abuser that keeps nym-shifting and otherwise
> deliberately changing everything in the headers he has access to, in
> order to try to avoid killfiling, but who always posts thru a provider
> that adds an xtrace header with a consistent value you can score on, you
> *CAN* score on it, but you'll have to download the messages to cache
> first.
>
> Take it from someone who was in the position of trying to killfile a
> poster like that at one point, before pan could score such non-overview
> headers, being able to ignore-score it, but only after downloading to
> cache, sucks, but it definitely sucks less than having to actually show
> the message in order to see who it is and block it!
>
>
> * Note that while you can set an expiry on the score in the pan GUI, and
> at that point pan will indeed quit applying that score, it won't actually
> remove it from the scorefile. The only way to actually remove the score
> from the scorefile is to manually edit it.
>
> Unfortunately, this does mean that if you actively add expiring scoring
> rules and never manually remove them, eventually your scorefile will be
> cluttered with perhaps hundreds or thousands of expired rules and they'll
> begin to affect score-file loading performance as pan still has to
> process them at least far enough to see they're expired, and then how far
> to ignore until the beginning of the next possibly still valid rule.
>
> So you'll probably want to either clear out the scorefile and start new
> occasionally, or manually edit it to at least clean out the expired rules
> from time to time, or simply don't use expiring scores, just living with
> it unless it's worth a permanent rule.
>
> * Yes, an initial % on a line *DOES* mean it's a comment.
>
> By implication, most of the lines pan adds when you add a score via the
> GUI are comments and don't matter for the actual scoring at all. They're
> only there to aid human readers.
>
> Of course that means you can edit or delete them as you wish, without
> affecting actual operation.
>
> Here, I tend to delete pretty much all of pan's added comments, with the
> exception of the date added comments for expiring scores, since that way
> I can see how long I had set the expiry.
>
> * If you do heavy scoring with lots of rules, using pan's GUI to set them
> up isn't particularly efficient for machine processing. The example in
> the linked documentation is somewhat more efficient, but it's too short
> to really get the point across. If you're planning to do a lot of manual
> scorefile editing or simply want to make your scorefile more efficient,
> either check past scoring threads for this list/group (the list is
> available as a newsgroup on news.gmane.org) where I've posted a longer
> example from my scorefile, or ask for such an example.
>
> * Similarly, if you're not good with regular expressions and need some
> help designing a score that's more complex than you can easily do with
> the pan GUI, or if something's just not working as you expected it to,
> with scoring or something else, ask for help. We've dealt with a number
> of such queries over the years. =:^)
>
>
> OK, so hope that's of help. Some people just want an answer to plug in
> without understanding it. Others want to understand what's going on, so
> next time they want to do something similar but not identical, they can
> figure out how to do it themselves. I'm certainly in this latter group,
> and my posts tend to go to the extreme in explaining things. That
> frustrates the first group, but I've stacks of thanks from people who
> preferred the better understanding my explanatory if extremely verbose
> style gave them, and sometimes I get new insights or ideas (like possibly
> patching the score coloring to the whole line instead of just the score
> column, above) as I'm writing things down, and it's the combination of
> both of those that's my motivation to keep posting as I do. =:^)
>
> --
> Duncan - List replies preferred. No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."
> Richard Stallman
>
>
> _______________________________________________
> Pan-users mailing list
> Pan-users@nongnu.org
> https://lists.nongnu.org/mailman/listinfo/pan-users

--
Hilsen / Regards
Dieter
http://www.dieterbritz.dk