David Chmelik posted on Sun, 20 Mar 2022 02:29:35 -0000 (UTC) as excerpted:
> Usenet may again be a little better than it was in mid-to-late 1990s in
> terms of spam--some newsgroups have no spam--but unfortunately once
> again others still are almost all spam.
>
> So, I have a large killfile again and will be plonking more advertisers/
> pr0n & drug & weapons dealers, trolls, proselytizer religious fanatics,
> etc..
>
> However I noticed what often happens is I get a large number updates.  I
> go to those groups then see sometimes all posts are by plonked posters
> with spam subject lines... just for a split-second, then disappear.
>
> Since I'm now subscribed to over 1000 newsgroups (if you add Usenet and
> Gmane) seeing all those false updates wastes considerable time.
>
> Shouldn't the logic be fixed to omit those before they ever even get
> counted as updates, so you don't waste a lot of time still seeing dozens
> spam updates?

There are several factors to consider here, some of which are inherent
in the news protocol and thus not something pan can do anything about.

First of all, there's a quick and very bandwidth-efficient counts update
mode (which I'm not actually sure pan uses at all) whereby group message
counts can be updated quickly, with little bandwidth usage and very
little additional information (no headers, etc).  This simply asks the
server for the first and last message sequence numbers it currently has
in whatever group(s) and compares them to the message sequence numbers
the client already knows about, so it can update the count of unread
messages accordingly.  However, the result is always the *maximum*
number of potential messages available, not necessarily the number
*actually* available.
In particular, some servers assign message numbers before they do their
filtering, if any, and some messages may simply be gone from the server
due to server-policy-specific spam filtering, copyright or COPA takedown
orders, message cancels, no-carry policies like binaries posted to
anything outside of the alt.binaries.* hierarchy (which can affect
binaries groups too if the post was cross-posted to non-binaries
groups), etc.  These will appear in the initial counts but not actually
be available.

Second, there's overviews mode, aka downloading "headers".  But this
does *NOT* download true headers.  Rather, it downloads an abridged
version containing only the most common headers typically used for
display of the message list.  This typically includes From, Subject,
Size/Lines, References (necessary for threading), and Message-ID, and
server admins can configure it to include others if they wish, but it
does *NOT* normally include less common/useful headers such as
Organization, custom headers, etc.

This affects scoring/watching/killfiling in that headers available in
the overview can be scored against with just the information in the
overview, that is, without downloading the actual message, while those
not in the overview require actually downloading the message to apply
that bit of the score.  Of course it's far better to be able to score
without downloading, thereby making it possible for killfiles to avoid
downloading the message entirely.  For nym-switching posters in
particular that's not always possible, yet there's often still something
scoreable in the full headers (or body content), and being able to
auto-ignore those posts even if they have to be downloaded to do it can
still be quite useful.

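To make the two points above concrete, here is a minimal Python sketch.
It is an illustration only, not pan's code: every function and variable
name is hypothetical, the overview header set is assumed to be the
typical one listed above, and matching is simplified to case-insensitive
substring tests rather than the regular expressions a real scorefile
uses.

```python
# Hypothetical sketch; none of these names come from pan's source.

# Headers assumed to be in the server's overview (the typical set listed above).
OVERVIEW_HEADERS = {"from", "subject", "references", "message-id", "bytes", "lines"}


def max_new_articles(server_last, client_last):
    """Upper bound on new articles, from sequence numbers alone.

    The real number can be lower: the server may have numbered articles
    that were later filtered, cancelled, or never carried.
    """
    return max(0, server_last - client_last)


def split_rules(rules):
    """Split (header, pattern) rules into those answerable from the
    overview and those that need the full article downloaded."""
    overview_rules, full_rules = [], []
    for header, pattern in rules:
        if header.lower() in OVERVIEW_HEADERS:
            overview_rules.append((header, pattern))
        else:
            full_rules.append((header, pattern))
    return overview_rules, full_rules


def matches_any(headers, rules):
    """True if any rule's pattern occurs (case-insensitively) in its header."""
    return any(
        pattern.lower() in headers.get(header.lower(), "").lower()
        for header, pattern in rules
    )


# Example rules: the From pattern appears in the scorefile example later
# in this message; the Organization pattern is made up for illustration.
rules = [("From", "teens seeker"), ("Organization", "Example Spam Org")]
overview_rules, full_rules = split_rules(rules)

# One overview entry, keyed by lower-cased header name (made-up data).
entry = {"from": "Teens Seeker <nobody@example.invalid>", "subject": "hello"}
killed_without_download = matches_any(entry, overview_rules)
```

Here the From rule lands in overview_rules and can kill the post before
anything is fetched, while the Organization rule lands in full_rules and
could only be checked after downloading the article.
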
So depending on what headers exactly you're scoring on, or even
depending on how the server does its numbering and filtering, you may
see quite a number of messages that pan can't preemptively do anything
about until it gets more information, either by downloading "headers"
(actually overviews) or, for headers not in the overview, even by
downloading the entire message.

Meanwhile, particularly if your scorefile is large and not efficiently
structured, processing it will take some time too.  Here's a short
example from my (very dated now because, as I've posted before, I've not
been active in the binaries for years, could actually be over a decade
now) pr0n scorefile:

[alt.*]
Score:: =-9999
%Alt kill
From: Seeking teens
From: teens seeker
From: sex coed
From: NudeGirls
Subject: R/-\\PE
Subject: R/-\|PE

That's going to be **FAR** more efficient than individual score entries
for each of those.  And note that they're headers that should be in the
overview as well.

If your scorefile looks the way it's likely to if you've only added
entries from the pan GUI and never text-edited them into something more
efficient like the above, and if you're doing over 1000 groups as you
mentioned, you could *easily* have tens of thousands of individual
single-entry scores that can be combined into a rather more efficient,
say, 100-200 compound entries like the above.

I've never let mine get overgrown and really haven't done anything
lately with it at all, so I can't do any before/after comparisons, but
I'm guessing it could make the difference between seeing some of the
killfiled posts momentarily while pan processes the inefficient mess,
and having them all processed before it displays anything (especially on
a fast machine with plenty of RAM and NVDIMM storage, something my now
decade-old machine is lacking, tho I did do the SSD upgrade from
spun-glass on the SATA3s).

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users