Jim Henderson posted on Thu, 08 Jan 2015 19:02:20 +0000 as excerpted:

>> I guess I should do a bit of experimentation of my own... but I'm lazy.
>> Still, if I get the motivation... sometimes these things build in the
>> background until I just decide to do it one day...
>
> I know the feeling. :)

OK, I have tomorrow off and decided it's time to experiment (even if
it's past 3 AM here ATM)...

* Pan, at least git-pan (version in headers), *DEFINITELY* knows how to
score on arbitrary headers -- it works here! =:^)

* Of course after adding the new scoring rule, you must either tell pan
to do a rescore manually, or reload the group (by switching to another
and back), so pan knows the scorefile has changed, before it'll show the
results of the new scoring rule on existing messages.

* As expected, overview-only messages ("header-only" messages that don't
have the full message in cache) do not get the arbitrary-header scores
applied, as those headers aren't downloaded yet.

* Downloading the message *DOES* appear to apply the arbitrary-header
scores automatically, provided of course that pan already knows about
them (see the second point, above).

So far, as predicted. HOWEVER, unhappily...

* Applying arbitrary-header scores has a rather high per-processed-message
cost, as pan loads every single cached message in the active group as it
checks for that arbitrary-header match. On a default 10 MiB cache that's
not going to be too terrible, but on my unexpiring-archive multi-gig
cache with messages going back to 2002 in some groups (including this
list/group), it can take /minutes/ to load a group as it scans all those
cached messages in order to score them! So, as I suggested, you'll want
to ensure a cache size large enough to contain all the messages you want
to cache and arbitrary-header score, BUT, arbitrary-header scoring will
quickly turn unworkable due to waiting if you're like me and have over a
GiB of primarily text (so small) messages cached!

* Additionally, there's one problem message in one of those groups that I
reported as triggering a segfault. I have it saved to investigate further
later so I've not deleted it, and normally simply don't click on it so it
doesn't crash pan.

However, as a result of pan scanning full messages when it has
arbitrary-header scores loaded, with such a score active I can no longer
enter that group, since pan will try to scan that message and promptly
segfault! Similarly, I can't tell pan to get new headers (um...
overviews!) for all groups (even if I'm in a different one), since
apparently that triggers a scan of that file as well.

I did test to see if I could get _overviews_ for individual groups and
that works fine. I can also switch groups, which works fine (tho it takes
"forever" as mentioned, especially for the more active or longer cached
history groups) as long as I don't try to switch to /that/ group. As soon
as I do something that'll trigger a rescore for that group, however, pan
will crash due to scanning that known-bad message.

OK, so I should really finish that investigation and delete that message,
or at least move its cache-file elsewhere. That'd presumably let me
access that group again. However, arbitrary-header-score-scanning is
really too slow for the number of messages I have cached anyway, so I'll
probably simply delete those test scores and forget about using
arbitrary-header scoring here.

But it /does/ work! Thus, I'd guess you either (a) failed to reload the
scorefile (either by rescoring or by toggling to a different group and
back), or (b) something's wrong with your scoring rule, or (c, for
others) perhaps you're using an old version, which doesn't have the patch
enabling that feature. But I can see from your headers that (c) shouldn't
apply as you're on git as well (the same commit I was running until a
couple days ago, when an update pulled in a couple l10n updates, no
actual code), leaving (a) or (b).

The good news is that it does work, and that with your 10 MiB default
cache size, while you might notice a bit of a slowdown switching groups,
it shouldn't be the multiple minutes I'm getting here, with a gig-plus
cache and messages back to 2002 in some groups.

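For anyone wondering why the arbitrary-header case is so much slower than
ordinary scoring, here's a rough Python sketch of the behaviour described
above. It is NOT pan's code (pan is C++, and I haven't traced the actual
implementation); the function name, the one-file-per-message-id cache
layout, and the list of overview headers are all assumptions made up for
illustration. The point is only that an overview header is already in
memory, while any other header means opening and parsing the cached
article, and an article that isn't cached simply can't match.

```python
import re
from email.parser import BytesHeaderParser
from pathlib import Path

# Headers assumed to be available from the overview (illustrative list only).
OVERVIEW_HEADERS = {"subject", "from", "date", "message-id", "references", "xref"}


def score_article(overview, cache_dir, header, pattern, value):
    """Return `value` if one scoring rule matches one article, else 0.

    overview  -- dict mapping lowercase overview header names to values
    cache_dir -- directory with one file per cached article, named after
                 the message-id (hypothetical layout, not pan's real one)
    """
    name = header.lower()
    if name in OVERVIEW_HEADERS:
        # Cheap path: the value is already known from the overview.
        text = overview.get(name, "")
    else:
        # Expensive path: the full article must be read from the cache.
        path = Path(cache_dir) / overview["message-id"].strip("<>")
        if not path.is_file():
            # Overview-only article: the header is unknown, so no score.
            return 0
        with path.open("rb") as fh:
            msg = BytesHeaderParser().parse(fh)  # parses the headers only
        text = str(msg.get(header, ""))
    return value if re.search(pattern, text, re.IGNORECASE) else 0
```

Run that over every article in a group and the expensive path costs one
file open and parse per cached message, which would be consistent with
the load time growing with the size of the cache rather than with the
number of overviews.
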
-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users