Re: [Pan-users] Scoring

Duncan Mon, 21 Oct 2013 04:14:40 -0700

Theodore M Rolle Jr posted on Sun, 20 Oct 2013 19:50:27 -0400 as
excerpted:


> Duncan,
> 
> Lacrocivious sent me here from the IRC channel, recommending you as one
> who "knows the rules."
> 
> I need to get a better understanding of the Score file rules.
> I know there's more efficient rules than Pan creates.
> 
> Ted

Definitely so.

Pan uses the same basic scorefile format and rules as slrn, altho some of 
slrn's advanced scorefile features such as includes aren't implemented.  
One exception: pan's regexes are case-insensitive.

Unfortunately the slrn.org domain appears to have expired and is now 
"parked", so the scoring.txt file bookmark I've used for years is no 
longer valid.  Googling...

Found it at sourceforge! Updating bookmark! =:^)

http://slrn.sourceforge.net/docs/score.txt

The MS-based news client xnews has a similar but not identical scorefile 
format as well, the biggest difference there being that it uses regex for 
the group names too, instead of the wildcard format slrn uses for them.  
So the xnews scoring doc can be read as well for a different viewpoint, 
tho if you think it'll only mix you up, of course don't.

http://xnews.remarqs.net/scoring.txt

In terms of efficiency and neatness, there's several points to remember:

* Comments begin with a % character, and pan adds a *LOT* of comments.  
All those %BOS/%EOS lines, etc, are pan comments that at least I find to 
be more noise than signal, so I delete them.  The only pan comments I 
actually keep are the date lines for expiring scores, as I find it useful 
to know when I created an expiring score as well as when it expires.  All 
the others, gone.

I do have a few of my own comments as notes at the beginning of my 
scorefile (including links to the two scorefile docs as well as some very 
brief format notes), and of course use separator lines of %#### or %%%%%% 
where I find them useful, but I do **NOT** have more comment lines than 
actual operational lines, as pan tends to do on its own.


* If you look at pan's event log from when pan starts, you'll see where 
it reads the scorefile, and it'll tell you how many scores in how many 
sections you have, along with how many expired scores it loaded and then 
immediately expired.  Note that it doesn't remove expired scores from the 
scorefile; it simply removes them from memory after loading them and 
finding they're expired.  So obviously, weed out all the expired ones 
from time to time (tho I prefer to keep at least one expiring score 
around for example purposes, even if it's expired, so I never remove the 
last one).

* The newsgroup lines with their [] enclosures serve as section 
delimiters.  These are the sections pan refers to in its log, so if you 
can, generalize your newsgroup names and group as many scores under each 
newsgroup as possible, thus reducing the number of "sections" pan has to 
process and keep in memory.

* Similarly, the score keyword lines delimit actual tests/scores, so 
where possible, OR as many conditions together as you can under a single 
score line.


Using this strategy, the pan event log says I have only eight scoring 
rules in three sections, altho I guess I have several hundred individual 
conditions.
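To make the sections/rules/conditions distinction concrete, here's a 
rough Python sketch that counts them in a scorefile, along with comment 
lines and already-expired scores.  The sample scorefile, the function 
names, and the "expired means dated before today" test are all my own 
assumptions for illustration; the Expires date formats (MM/DD/YYYY or 
DD-MM-YYYY) and the Score:: any-match form are as I understand them from 
the slrn score doc.  This is not how pan itself parses the file.

```python
from datetime import date

# Hypothetical scorefile in the slrn-style format described above.
SAMPLE = """\
% my own notes go up here
[alt.binaries.*]
Score:: =-9999
% any one of these ORed conditions is enough to ignore the post
Subject: make money fast
Subject: free trial
From: spammer@example.invalid

[comp.os.linux.*]
Score: 500
Expires: 1/15/2013
Subject: kernel
"""


def parse_expiry(value):
    """Parse MM/DD/YYYY or DD-MM-YYYY (formats assumed from the slrn doc)."""
    if "/" in value:
        month, day, year = (int(part) for part in value.split("/"))
    else:
        day, month, year = (int(part) for part in value.split("-"))
    return date(year, month, day)


def summarize(text, today):
    """Count sections, score rules, conditions, comments and expired rules."""
    counts = {"sections": 0, "rules": 0, "conditions": 0,
              "comments": 0, "expired": 0}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        lowered = line.lower()
        if line.startswith("%"):
            counts["comments"] += 1       # comment line
        elif line.startswith("["):
            counts["sections"] += 1       # newsgroup line opens a section
        elif lowered.startswith("score:"):
            counts["rules"] += 1          # Score: or Score:: opens a rule
        elif lowered.startswith("expires:"):
            # Drop any trailing % comment before parsing the date.
            value = line.split(":", 1)[1].split("%")[0].strip()
            if parse_expiry(value) < today:
                counts["expired"] += 1
        else:
            counts["conditions"] += 1     # a header test inside a rule
    return counts


if __name__ == "__main__":
    print(summarize(SAMPLE, date.today()))
```

Run against your own scorefile text, a high comment count relative to 
conditions, or a rule count close to the condition count, suggests there's 
room to consolidate.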


* Meanwhile, it's worth noting that as with slrn, pan should now be able 
to score on all headers, not just the ones available in the pan scoring 
GUI.  (IIRC Heinrich added that feature a year or two ago, so it's not 
"new", but if you're running an older distro version or something, say 
0.136 or earlier altho I'm not sure that's the specific one, it might not 
work for you.)  That said, the ones in the GUI tend to be the headers 
available in the overview file and thus are VASTLY more efficient to 
process.  Non-overview headers cannot be scored until pan actually 
downloads the full headers, which usually means the full message, since 
that's when it actually has them available to score on.  While that's 
still better than nothing if it prevents you from having to process the 
message manually, the message must still be downloaded, and that's 
obviously not as efficient as scoring on the overview headers.  The 
overview file is what pan actually gets when it "gets headers", so it can 
process those scores then, and avoid the actual download entirely for 
ignored messages and the like.

Of course with the automated actions feature, it's now possible to have 
pan mark-read or delete ignored (or negative-scored) messages, and being 
able to do so based on scores from headers in the overviews means you 
don't have to worry about seeing or downloading those messages at all.  
Again, scoring on other headers can still be useful if it means you don't 
have to look at it, but you don't avoid the download that way.

The other side of that is that you can set (for example) watched 
messages to auto-cache or auto-download... again, *IF* you can score on 
headers found in the overview, and thus before you'd have manually 
downloaded the message anyway.
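If you want to check which of your tests fall on the slow side of that 
line, something like the following Python sketch would do it.  The header 
set is the standard NNTP overview field list (RFC 3977, with the optional 
but common Xref), which I'm assuming matches what pan gets in practice; I 
haven't checked it against pan's source.  The sample scorefile and the 
function name are made up for the example, and the leading ~ (negated 
test) is handled as in the slrn format.

```python
# Fields in the standard NNTP overview (RFC 3977); "xref" is optional there
# but commonly supplied.  Assumed, not verified, to match what pan can score
# on without downloading the article.
OVERVIEW_HEADERS = {
    "subject", "from", "date", "message-id",
    "references", "bytes", "lines", "xref",
}

# Hypothetical scorefile fragment for illustration.
SAMPLE = """\
[*]
Score:: -100
% tests on headers outside the overview need the article downloaded
User-Agent: badclient
Subject: test
~X-Newsreader: pan
"""


def non_overview_headers(text):
    """Return header names used in tests that are not overview fields."""
    found = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, % comments and [newsgroup] section lines.
        if not line or line[0] in "%[":
            continue
        # Header keyword is everything before the first colon; a leading
        # ~ marks a negated test and is not part of the name.
        name = line.split(":", 1)[0].lstrip("~").strip().lower()
        if name in ("score", "expires"):
            continue
        if name not in OVERVIEW_HEADERS and name not in found:
            found.append(name)
    return found


if __name__ == "__main__":
    for header in non_overview_headers(SAMPLE):
        print("needs full download to score:", header)
```

Anything it lists can still be scored on, per the above, but only after 
the download has already happened.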

I'll post a followup with parts of my scorefile, so you can see and 
perhaps copy the format-note comments I have, as well as see the format 
organization I'm using.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users
