JCA posted on Mon, 15 Sep 2014 15:06:31 -0600 as excerpted:

> I was wondering if Pan can do the following:
>
> Let's assume take a user U in a given group G. U is a crank, a
> troll or something like that. I would like to tell Pan to ignore not
> only all posts from U but also all threads initiated by U. Is this
> possible with Pan?

Ignoring threads by a specific person isn't necessarily impossible, but it's not /directly/ possible, either. You'll be relying on a side-effect of something else, and hoping that you can get a good match without catching too many unrelated posts in the process. Tho if it /does/ catch other posts you can potentially use score ordering or incremental scoring to rescue them.

IOW, this will be advanced score usage that could be complicated to set up and not necessarily worth the hassle, but in theory it can be done... sort-of.

Here's the deal. Proper threading uses the References header. This header contains a multi-generational list of "parent" post message-IDs. To score on threads or subthreads you score on the appropriate message-ID in the References header, and anything that matches will get the assigned score.

The problem is that message-IDs are designed to uniquely ID specific messages, so a match of an entire message-ID will match only the single (sub)thread in reply to that specific message. (Message-IDs are assigned to both email and news messages. News message format is almost entirely the same as email message format, with a few news-specific headers added and a few mail-specific headers generally omitted, altho both news and mail headers can be present and normally won't conflict with each other.)

To match all threads originated by a specific author, you need to find something unique about that author's message-IDs that you can score on, that won't catch other authors' message-IDs as well. To the extent that you can do so, you can filter threads replying to that person. To the extent that you cannot, because the fixed part of the target author's message-IDs also appears in the message-IDs of others, you score their messages also.

As it happens, message-IDs are set either by the posting client, or by the server posted to, if the posting client didn't set one.
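
To make the References mechanics above concrete, here is a minimal Python sketch of how a full message-ID match picks out exactly one (sub)thread. It is only an illustration, not Pan's code: the function names, the loose message-ID pattern and the sample header are all assumptions made up for the example.

```python
import re

# Loose pattern for one "<user@domain>" token; real message-ID syntax is stricter.
MESSAGE_ID = re.compile(r"<[^<>\s]+@[^<>\s]+>")

def parse_references(header):
    """Return the message-IDs in a References header, oldest ancestor first."""
    return MESSAGE_ID.findall(header)

def in_subthread(references, ancestor_id):
    """True if the post descends from the message with ancestor_id."""
    return ancestor_id in parse_references(references)

# Hypothetical header of a third-generation reply.
refs = "<root.1@sample.com> <reply.2@example.net> <reply.3@example.org>"

print(parse_references(refs)[0])                    # the thread's root post
print(in_subthread(refs, "<reply.2@example.net>"))  # inside that subthread
print(in_subthread(refs, "<other.9@sample.com>"))   # a different thread
```

Because the whole ID has to match, each such rule covers one thread only, which is why the rest of this post looks for a partial match instead.
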
There are no hard rules governing the algorithm used to get a globally unique ID that is extremely unlikely to apply to a different message (message-IDs are used to track messages, so if two different messages get the same ID, only the first one seen by a particular server or client will normally appear). There are only general rules on the characters it can contain and the general format, which is similar to an email address, userpart @ domainpart. (I deliberately spaced it out to avoid triggering gmane's email address obfuscation.)

If the posting client doesn't include a message-ID, then the server will set one. Usually the domain side of these is the domain name of the news service provider the message was posted to, say @ giganews.com, or some such. Of course scoring on that will catch all users who post to that NSP with clients that don't set the message-ID themselves.

Clients that set the message-ID can use a similar pattern; pan uses the domain name of the email address you are posting with, for instance. The Agent (and freeagent) client at least used to use the agent domain name instead. Of course, in most cases either one of these will result in a domain name match that matches far more than one poster.

So the domain name side of the message-ID can be useful in narrowing things down, but ordinarily won't be enough by itself to identify a single poster, so you'll need to match something from the user side of the message-ID as well.

But the user side of the message-ID tends to be almost entirely unstandardized, except of course that there are some restrictions on the characters that can be used, and the idea is to ultimately have something unique enough that no other message will have the same message-ID, despite a lot of other messages from the same poster and others normally having the same domain side.
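
One way to judge how far the domain side narrows things down is to tabulate which authors share which message-ID domain. The Python sketch below does that over a handful of made-up posts; the author names, message-IDs and the helper split_message_id are hypothetical, and in practice you'd collect the pairs by looking at real headers in the group.

```python
from collections import defaultdict

def split_message_id(message_id):
    """Split '<user@domain>' into (user_part, domain_part)."""
    user, _, domain = message_id.strip("<>").rpartition("@")
    return user, domain

# Hypothetical (author, message-ID) pairs collected from one group.
posts = [
    ("U", "<k3x9.17@oddclient.example>"),
    ("U", "<k3x9.18@oddclient.example>"),
    ("Alice", "<pan.2014.09.15.1@sample.com>"),
    ("Bob", "<pan.2014.09.16.2@sample.com>"),
]

authors_by_domain = defaultdict(set)
for author, message_id in posts:
    _, domain = split_message_id(message_id)
    authors_by_domain[domain].add(author)

# Domains used by exactly one author are candidates for a narrow score.
unique_domains = {d: a for d, a in authors_by_domain.items() if len(a) == 1}
print(unique_domains)
```

A domain shared by several authors, like sample.com here, needs something from the user side as well before it is safe to score on.
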
So what you'll want to try to do is look at the message-ID of a post from the target author, and **TRY** to find a match that's as unique to his posts as possible, but still dependably identifies ALL his posts.

If you're lucky, he uses a news server or client that nobody else posting to the group in question uses, and between limiting the score to that domain-name side of the message-ID, plus anything that's unique on the user side, and limiting that score to a specific group, it'll "just work". Tho of course there's always the possibility that a new poster will appear who matches as well, and whose posts you'll then miss.

But chances are pretty good you won't find a good enough match and that other posters will match that score as well. But if it's only a few other posters that get caught in the net, all hope is not yet lost.

Pan uses two types of scoring: absolute scoring, where a matching rule sets that score and no further rules are processed, and incremental scoring, where the score is simply increased or decreased by the value in the score. Ignore is a score of -9999 or lower. Normally, setting an ignore sets an absolute score of -9999, but a post can also be ignored if no absolute scores apply but the total of all incremental scores ends up being -9999 or lower.

So if the net cast by your would-be References-header message-ID ignore is too wide and catching others as well, you have two possible methods to counteract that.

If you want to use an absolute score ignore, then counteracting it is as simple as setting another absolute score that catches the "mistakes", that gets processed first (appears before the too wide score in the scorefile, which you can edit for order as necessary). The problem here is that the References header will contain message-IDs from multiple generations of parent, and the ones that contain the target may well contain the false-positive IDs as well.
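
The scoring behaviour just described, and the problem with an absolute "rescue" rule, can be mimicked with a toy model in Python. This is only a sketch of the rules as stated in this post, not Pan's implementation; the rule tuples, the goodposter and u message-ID prefixes and the sample headers are all hypothetical.

```python
import re

IGNORE_THRESHOLD = -9999  # a score at or below this means "ignore"

def score(references, rules):
    """Apply (pattern, value, absolute) rules in scorefile order."""
    total = 0
    for pattern, value, absolute in rules:
        if re.search(pattern, references):
            if absolute:
                return value  # absolute: set the score and stop processing
            total += value    # incremental: keep adding up
    return total

def is_ignored(references, rules):
    return score(references, rules) <= IGNORE_THRESHOLD

# The absolute rescue rule comes first, then the too-wide absolute ignore.
absolute_rules = [
    (r"<goodposter\.[^>@]*@sample\.com>", 0, True),
    (r"@sample\.com>", -9999, True),
]

u_thread = "<u.1@sample.com>"
mixed_thread = "<u.1@sample.com> <goodposter.7@sample.com>"

print(is_ignored(u_thread, absolute_rules))      # U's thread, nobody else in it
print(is_ignored(mixed_thread, absolute_rules))  # U's thread, rescued by mistake

# Incremental rules only ignore once their sum reaches the threshold.
incremental_rules = [
    (r"@sample\.com>", -6000, False),
    (r"^<u\.", -4000, False),
]
print(score(u_thread, incremental_rules))
print(score("<goodposter.3@sample.com>", incremental_rules))
```

The mixed thread is the case to watch: as soon as the rescued poster's ID shows up anywhere in the References header of a thread U started, the first absolute rule wins and the whole branch escapes the ignore.
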
So an absolute score isn't likely to do what you need, because trying to undo it for the false-positives will likely undo too much as well.

Which leaves incremental scoring. The idea here would be to find a mix of scores such that in the end, all the matches for the target posts end up at -9999 or lower, while incrementals add just enough score back to the false-positives to rescue them from the ignore, bringing their score up to at least -9998, if not up further, to zero or positive. That's definitely an art unto itself; or as I said above, "advanced".

Meanwhile, something that may help: In your example you specified threads INITIATED by U. As it happens, regular-expression matches have a way to specify BEGINS WITH and/or ENDS WITH. If you're only worried about matching threads where U is the original poster, the ^ character at the beginning of the regex can be used to specify "begins with". You can then use a wildcard that excludes the ">" character used to terminate each message-ID, thus forcing the match to only apply to the first one. Something like this (spaces again inserted either side of the @):

References: ^[^>]* @ sample\.com>

^ means begins-with. The [] encloses a character-set, with ^ as the first character meaning "not". * means "any number of matches of the previous". So what that means is: References header, begins with, any-number-of-characters-not-including->, @ sample.com, >. Thus the first message-ID in the References header would have to have sample.com as the domain name portion.

But something else to keep in mind as well: Some clients are broken and do not include a properly populated References header in replies. These clients will often attempt to thread by the contents of the Subject header, instead. Obviously, no References header, no match on a References-header score. =:^(

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master."
Richard Stallman
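
The anchored References pattern from this post can be tried out in Python before it goes into a scorefile. The sketch below uses the same pattern with the anti-obfuscation spaces around the @ removed; sample.com is the placeholder domain from the post, the sample headers are made up, and Python's regex dialect is assumed to be close enough to Pan's for this simple pattern.

```python
import re

# "^" anchors at the start of the header value; "[^>]*" cannot cross the ">"
# that closes the first message-ID, so only the thread root's ID can match.
FIRST_ID_AT_DOMAIN = re.compile(r"^[^>]*@sample\.com>")

# The same test without the anchor matches any generation of parent.
ANY_ID_AT_DOMAIN = re.compile(r"@sample\.com>")

def thread_started_at_domain(references):
    """True if the first (root) message-ID ends in @sample.com."""
    return FIRST_ID_AT_DOMAIN.search(references) is not None

started_by_target = "<u.1@sample.com> <reply.2@example.net>"
replied_to_by_target = "<root.1@example.net> <u.2@sample.com>"

print(thread_started_at_domain(started_by_target))
print(thread_started_at_domain(replied_to_by_target))
print(ANY_ID_AT_DOMAIN.search(replied_to_by_target) is not None)
```

One thing this makes visible: a thread's root post has no References header of its own, so a rule like this can only catch the replies, and the root post would need its own rule, for instance one on the From header.
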