Jim Henderson posted on Mon, 05 Jan 2015 17:55:41 +0000 as excerpted:

> What would it take to be able to score articles based on an arbitrary
> header?
>
> Say, for example, I get an X-Forwarded-For: header - if I wanted to do
> some simple matching (say even substring-based matching), is there
> currently a mechanism that would let me do this (say, manually editing
> the score file), or would it require changes to underlying code in Pan?

With three caveats, AFAIK scoring by arbitrary header should "just work"
in current pan.

Caveats:

1) I've not needed to actually try this personally, and I'm too lazy ATM
to do the list (or git log) search to verify, but I'm almost certain that
Heinrich said it works now.  If it seems to fail when you try it, perhaps
I can try to dig up the message, but I expect it /does/ work tho I can't
personally vouch for that as I'm not using the feature myself.

2 (the big one)) While arbitrary header scoring should work, due to the
nature of NNTP it's not as efficient, and will require downloading (at
least part of) the message before pan can apply that score.  You can't
score it after only downloading "headers", as you can with pan's normal
GUI scoring options.

Here's the deal.  What pan calls "headers" is actually "overviews".  If
you go back in list history you'll see I used to make a big deal about
this, and for some time insisted on calling it the "overview pane" rather
than the "header pane", because that's what it shows, overviews, *NOT*
all, or even generally /most/, headers.

Overviews, in NNTP, consist of a strictly selected subset of message
headers and other message metadata -- generally those found most useful
before full message download.  From RFC 3977 section 8.3.2, the first
eight fields of an overview MUST be, in order:

"0" or article number (see below)
Subject header content
From header content
Date header content
Message-ID header content
References header content
:bytes metadata item
:lines metadata item

A news admin MAY configure additional headers or metadata[2] for
overviews, and the xref and distribution (if present) headers are
commonly included.  Anything else is entirely optional and left to
provider/admin policy.

Now here's the kicker.
NNTP provides the overview command (OVER in RFC 3977, XOVER in older
implementations) to fetch this information for individual messages or
for a range of articles based on article number, and it's precisely this
OVERVIEW information that most news clients, including pan, display as
the article list, BEFORE THE ARTICLES THEMSELVES ARE DOWNLOADED (at least
to local cache).

As a consequence, even tho pan should score on the contents of arbitrary
headers just fine, IF THE HEADER ISN'T IN THE OVERVIEW, PAN CAN'T SCORE
ON IT UNTIL THE ARTICLE IS DOWNLOADED.

Which /does/ cripple scoring on non-overview headers to a significant
extent, but there's nothing to be done about it.  And it's still useful,
even crippled.  If for instance you're ignore-scoring based on a
non-overview header, pan must download the full message to do so, but
the ignore is still automated, so you don't have to /manually/ see and
deal with these messages you presumably found offensive enough to want
to ignore.  That's not as good as being able to avoid downloading them
at all, but it's still CONSIDERABLY better than NOTHING! =:^)

What I do NOT know, because as I said I've not actually tried it here, is
if pan will automatically rescore when it downloads the message and can
do so, or if you'll need to manually trigger a rescore.

If you have to manually trigger the rescore, there's another
implication.  You'll presumably need to do what I normally do for
binaries anyway: download a slew of them to cache for later processing,
then come back when they're all in local cache and go thru them again,
in this case triggering the rescore as first order of business.  Of
course at least for binaries that means configuring your cache size
considerably larger than pan's default 10 MiB.
Of course, if pan already does a second scoring pass after download to
cache, or better yet, after download of just the headers so it can
cancel big binary downloads before they're finished, then you shouldn't
have to change the cache size.  But I don't know if it handles that
automatically or not.  So if you test this, please post your results.
=:^)

3) Yes, you must edit the scorefile manually to score on arbitrary
headers.  This is for two reasons.  First, obviously that's an infinite
list of possible headers, which doesn't fit well with pan's scoring GUI.
Of course the GUI could include the ability to specify your own header,
but that's where the second reason comes in.  Pan's GUI, particularly
when Charles was primary dev, was kept simple and ideally intuitive, and
explaining the technical implications and limitations of non-overview
header scoring is ANYTHING but simple.

Thus the obvious solution, make it possible, but only by editing the
scorefile directly.  Those technical enough to be willing to do that
should be technical enough to appreciate the implications of
non-overview scoring, and motivated/desperate enough to still appreciate
the more limited benefits it offers.

Since you mentioned that as a possibility, presumably you're already
familiar with the scorefile format.  Just in case you aren't, or in case
you need a refresher and don't have the link handy, here's the
boilerplate:

http://slrn.sourceforge.net/docs/score.txt

That of course is the slrn scorefile doc.  Pan uses the same basic
format, but is case-insensitive by default, and doesn't handle some of
the more advanced features (like external file-includes and nested
conditionals).  Also, last I was aware, pan had a bug and ORed all
scoring conditions, the documented double-colon behavior, even where it
was single-colon and thus by the documentation the conditions should be
ANDed.  See my past posts on the scorefile format for further details...

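
For illustration only, here's roughly what such an entry might look like
for the X-Forwarded-For case, going by the slrn doc linked above.  It's
untested here, like the rest of this.  The address prefix is a made-up
placeholder (192.0.2.x is a documentation range), [*] assumes you want
it applied to all groups, and =-9999 assumes you want pan's "ignore"
score with no further score processing (the = prefix, per the slrn doc):

```
% Hypothetical example, untested: ignore articles whose X-Forwarded-For
% header contains 192.0.2. anywhere.  The regex is unanchored, so it
% is effectively a substring match.  The address is a placeholder.
[*]
Score: =-9999
X-Forwarded-For: 192\.0\.2\.
```

Keeping it to a single condition per Score: entry also sidesteps the
question of whether multiple conditions get ANDed or ORed.
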
So basically, to score on an arbitrary header, you'd create the score as
normal, but use the desired header instead of subject/from/etc.

Good luck, hope the limitations don't ruin it for you, and hope you can
confirm it working for us! =:^)

---
[1] References header and threading: Pan uses it too.  Some clients (MSOE
among them) thread by subject as well and if the subject changes,
consider it a new thread, but that's not generally considered valid, and
leads to confusion when people hit reply (and thus have a references
header in their message) and think by changing the subject they're
starting a new thread.  The /valid/ way to start a new thread is with a
new message, *NOT* a reply to an old one.

[2] Headers and metadata: The distinction is this: Headers are always
literal content within the message.  Metadata is always calculated.
Thus, for example, the :bytes metadata and Bytes: header are two
entirely different things and may well have different content.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users