Jim Henderson posted on Thu, 08 Jan 2015 19:02:20 +0000 as excerpted:

> On Thu, 08 Jan 2015 05:19:20 +0000, Duncan wrote:
>
>> Jim Henderson posted on Wed, 07 Jan 2015 00:34:20 +0000 as excerpted:
>>
>>>> Meanwhile, just to confirm, arbitrary header scoring did work, but
>>>> only after downloading the messages and possibly manually triggering
>>>> a rescore,
>>>> correct?
>>>
>>> Hmmm, I didn't try a manual rescore, but the scoring that applied to a
>>> post that should have been affected didn't show up when I went to look
>>> at the rules applied.
>>
>> OK, so we do /not/ have confirmation that pan actually does arbitrary
>> header scoring, but we /do/ have confirmation that /if/ it does, it
>> doesn't do it automatically after the download, and requires a manual
>> rescore.
>
> I'm not sure that that's an accurate summary of what my testing found -
> I ended up not getting a score based on an arbitrary header.
>
> Checking the score on a message that I know matches my arbitrary scoring
> rule, it doesn't show the score item I added.
>
> The lines I added to ~/News/Score were:
>
> %BOS %Score created by JSH
> [*opensuse.org*]
> Score:: =9999
> X-Forwarded-For: ^[address redacted]$
> %EOS
>
> Where [address redacted] is a valid IP address.  I followed the format
> used for the From: score that appears above it in the file.

With my own testing (as mentioned in a post yesterday) demonstrating that
arbitrary-header scoring does work, and that pan appears to score on
download without a manual rescore, provided it has already loaded that
score, we're left with the following possibilities:

Either:

1) Your regex somehow failed to match,

OR

2) Pan hadn't yet reloaded the scorefile after you edited it, so it
didn't know about your new score when it downloaded your test messages.

OR

3) An absolute =nnnn (as opposed to additive nnnn, no =) score that
happened to match that message, appeared before your test score in the
scorefile.  Because absolute scores are intended to be absolute, no
further scoring is done after the first absolute match is found -- that
first match is applied and that's it -- so unlike additive scores,
absolute score order MATTERS.

Here's what I did for my test.

I used gmane as my test server and tested in gmane.* groups.  Due to the
way gmane works, messages thru gmane have a header that looks like this
(obvious obfuscation applied to avoid gmane email munging):

Approved: news at gmane dot org

Posts on gmane also have a header like this (picking your post as an
example):

Archived-At: <http://permalink.gmane.org/gmane.comp.gnome.apps.pan.user/14813>

Since these are unlikely to be in the overview (tho I didn't actually
check) but are extremely common (pretty much every post) on gmane, I
decided they'd make good arbitrary-header scoring test material.

So:

[*]
Score:: 100 %testing arbheaders
Approved: gmane\.org

Score:: 200 %arbheaders test2
Archived-at: gmane\.org

Now those are additive scores and went below my normal scoring, so if any
absolute scores applied, pan would never get to these, but otherwise,
assuming no further additive scores applied, basically all "current"
gmane messages should get a score of 100+200=300.

Some things to note altho they'll be review for those familiar with the
scorefile format:

% starting a line indicates a comment.
All those %BOS/%EOS lines that pan adds are purely that, comments, and do
nothing to change the actual scoring.  Knowing that, for me those
comments are mostly noise and I don't use 'em, tho I do have my own
explanatory comments when necessary, and do tend to keep an originating
date on any /expiring/ score, just so I know how long I intended it to
run before expiring.

Similarly, on a score line, a % after the score value indicates a comment
and can be used to give the score a name, exactly as you see in my
example.

The [] starts a scoring section as well as indicating the newsgroups that
section applies to.  Newsgroups entries are * wildcard, not the regex
that applies to the content of most headers.  So the tested [*] says
match on the following scores regardless of the group name, which was
fine for my tests.

If I had set the first one to =100 instead of a bare 100, it would have
been an absolute score, and any match at that point would prevent pan
from even getting to the next score with that post, since an absolute
score is just that, absolute, and the first such matching score applies,
period.  (Of course this is one of the possibilities I list above for why
your test didn't seem to work, that an earlier absolute score match
prevented pan from ever reaching the test score.)

For my testing purposes at least, I didn't need to match the entire
header, just verify that it was there, and that it contained the
gmane.org bit.  Thus I didn't need the ^ and $ string beginning and
ending anchors.  And of course the \. forces the dot to be matched as a
literal dot, not the "any character" that a dot metacharacter will
normally match in regex.

Now, after adding that to my scorefile and saving, I had to tell pan
about the scorefile changes.  So I selected a message and hit Articles,
Edit Article's Watch/Ignore Score.  In the resulting dialog box I simply
hit Close and Rescore, to get pan to reload the changed scorefile.

That did it.
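
For anyone who finds the ordering rules easier to follow as code, here is a rough Python model of the behavior as described in this post. It is a sketch, not pan's actual implementation: the `score` function, the rule tuples, and the sample `Approved` value are made up for illustration, and treating header names as case-insensitive is an assumption.

```python
import re

# Hypothetical rule layout: (value, is_absolute, header_name, regex).
# These two mirror the additive test scores from the scorefile example.
RULES = [
    (100, False, "Approved", r"gmane\.org"),
    (200, False, "Archived-At", r"gmane\.org"),
]

# Sample headers; the Approved value is a made-up placeholder.
SAMPLE = {
    "Approved": "moderator@gmane.org",
    "Archived-At": "<http://permalink.gmane.org/gmane.comp.gnome.apps.pan.user/14813>",
}


def score(headers, rules):
    """Walk the rules in file order; the first matching absolute rule wins."""
    # Assumption of this sketch: header names compare case-insensitively.
    lowered = {name.lower(): value for name, value in headers.items()}
    total = 0
    for value, is_absolute, header, pattern in rules:
        text = lowered.get(header.lower())
        # re.search matches anywhere in the header unless ^ and $ anchor it.
        if text is None or not re.search(pattern, text):
            continue
        if is_absolute:
            return value  # absolute score: later rules are never reached
        total += value  # additive score: keep going
    return total
```

With the sample headers, `score(SAMPLE, RULES)` gives 300. Put an absolute rule such as `(9999, True, "Archived-At", r"gmane\.org")` anywhere in the list and the result is 9999 instead, while a fully anchored pattern like `^gmane\.org$` matches neither header because each contains more than just the domain.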
Most (cached) posts on gmane now appeared with a 300 score.  Again, a few
posts did not, because they matched some previous absolute score and thus
were assigned that score and never reached my test scoring.

Switching groups with that setup was when I really noticed the slowdown
of those arbitrary-header scores, because now pan had to go thru all
cached messages on the new group, checking each one to see if the
arbitrary-header scores matched and scoring as appropriate.

Then I tried downloading new "headers" (really overviews) in subscribed
groups, to check scoring on new messages.  Which is when pan crashed,
since (as I explained in yesterday's reply) that meant pan scanned a
known-bad message in another group, that is known to crash pan.

After restarting pan and figuring out what happened (verifying the crash
on getting new headers in subscribed groups another time or two in the
process), I tried getting new headers in /selected/ groups, without the
problem group selected.

That worked without crashing!  And as expected, the new scores didn't
apply to the just fetched "headers" (overviews), because the overviews
didn't contain the headers I was trying to score on.

But as soon as I downloaded the actual messages, the new scores applied
as the content was actually there to match against, now. =:^)

But again, while I didn't see any in my short test (I couldn't get
headers in subscribed groups or in the single affected group, without
crashing, remember, and I didn't like the scanning delay when I switched
groups either, so I had no interest in prolonging the test), had any of
the new messages matched an absolute score reached before my test scores,
of course the test scores wouldn't have applied here either.

Sooo...  What I'd suggest you try next is a more general match, as I did.
If you use gmane you can duplicate my test scores and verify that they're
working for you too, before proceeding.
Once you get something general obviously applying, then home in on your
objective.  First try a score like this:

[*]
Score:: =500 % test x-forwarded-for
X-Forwarded-For: .*

That's absolute 500, to hopefully distinguish it from all the absolute
9999/watch scores, assuming you have score-colors set appropriately, and
the score column set to display.  And it should match ANY post in ANY
group, that has ANY x-forwarded-for header set, no matter the content.

Once that is verified to work as expected, narrow it down one factor at a
time:

[*opensuse.org*]
...

First the newsgroup, matching any group name containing opensuse.org.

...
X-Forwarded-For: somedomain\.net
...

Then try a simple general domain match.  That might be narrow enough
right there, without a full string match.  If not, continue to narrow it
down, until you get a positive match without too many false-positives.

Of course somewhere in there you can set your desired score, as well.

But don't forget, with absolute scores involved, order matters!  So order
accordingly. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman