Tim Kynerd posted <[EMAIL PROTECTED]>, excerpted below, on Thu, 29 Sep 2005 11:37:06 +0200:
> On 29 Sep 2005, at 10:54, Brad Rogers wrote:
>
>> Hello All,
>>
>> What do I need to set "Group" to, when plonking an author, such
>> that it matches all ng's?
>>
>> I keep seeing articles that I'd rather not. He doesn't change his
>> e-mail address, and I set the filter to ignore the subject, but I
>> obviously haven't fully understood the "Group" definition.
>>
>> Thanks in advance.
>>
>
> I'm not 100% sure this works (I haven't checked it carefully), but I
> set the Group condition to "is not bla.bla.bla". I can be fairly sure
> there will never be such a group on Usenet, so this condition should
> apply to all groups.

There are several ways to set "apply to all groups". If you look at the score file itself, PAN converts them all to regular expressions (regexes) before writing them (the other options are plain-English shortcuts for some prewritten regex magic), so a regex is the "native" way to set an expression.

In regular expressions, "." stands for any single character (if you want /just/ a literal ".", escape it with a "\", thus "\."). "*" means "zero or more of the preceding character". Thus, ".*" means "zero or more of any character". That pretty well covers all groups.

Another way to do it is to observe that all real USENET groups (and probably almost all private newsgroups, perhaps with a very few exceptions) contain the real "." char as a separator, at least once. Thus, you can select "contains", and fill in a ".". Again, that should match all groups. As I noted, of course, PAN will internally convert that to a regex before writing it to the score file. "Containing a dot" in regex is simply "\." (the dot must be escaped, as noted above).

Note that there are no anchor characters in that expression. (A "^" at the left means the line must begin with the sequence; a "$" at the right, if it doesn't follow an escaping "\" of course, means the line must end with the sequence. These are called "anchors" because they anchor the sequence at the beginning or end, or both, of a line.)
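As a rough illustration of the patterns so far, here is a minimal sketch that uses Python's re module as a stand-in for PAN's regex engine (an assumption; PAN's engine may differ in details, though this basic syntax is common to most engines). The group names and the matching_groups helper are made up for the example.

```python
import re

# Hypothetical newsgroup names, used only for illustration.
GROUPS = ["comp.lang.python", "alt.test", "rec.arts.sf.written", "junk"]


def matching_groups(pattern, groups=GROUPS):
    """Return the groups in which the pattern matches anywhere in the name.

    re.search is unanchored, so the pattern may match at any position
    unless the pattern itself contains "^" or "$".
    """
    regex = re.compile(pattern)
    return [g for g in groups if regex.search(g)]


# ".*" is zero or more of any character, so every name matches.
print(matching_groups(r".*"))

# "\." is a literal dot anywhere in the name; "junk" has none.
print(matching_groups(r"\."))

# Anchored variants: the name must begin with "alt." or end with ".test".
print(matching_groups(r"^alt\."))
print(matching_groups(r"\.test$"))
```

Only the dotless name "junk" escapes the "\." pattern, which is why "contains a dot" is close to, but not strictly the same as, "all groups".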
Without anchors, the "." can be anywhere on the line, so it'll pick up any newsgroup name with a dot in it, which is basically all of them. (I don't recall whether the RFC mandates a second-level name, and therefore at least one dot, but if there are exceptions, they are few and far between.)

Of course, the same "zero or more instances" trick works with any old character to mean the same thing: any group. Thus, "c*" would be any group containing zero or more instances of the c char, therefore, any group. Any letter or number could be used, as those don't have special meaning (like the * or . or \ or most other punctuation does) if not escaped.

BTW, because "\" is the escape char, "\\" escapes the special meaning of the second backslash, converting two into one. Thus, in regex, \\ matches a single literal \.

I mentioned the anchor chars. That of course presents its own way to mean "all groups". "^$" is the empty line, which of course means "no groups", so select "does not match regular expression" and fill in "^$", to "not match a blank group name", thus, matching all groups. =8^)

How's that convert to a regular expression in its own right? That's a bit more complicated. Honestly, I had to try /that/ one to see, and I still don't quite understand the resulting notation. I'll have to look it up to see if I can find documentation for it.

Completing this whirlwind intro to regex: the () chars group a sub-expression, as one might expect. The [] chars create a character class (an itemized set of characters). Within a class, the - acts as a range character, and as the initial char of a class, the ^ means "not". The | means "or". Thus:

  [0-9] is one numeric character.
  [-0-9] means a dash or a numeric character (the dash at the beginning can't indicate a range, so it matches itself).
  [a-zA-Z0-9]+ means one or more alphanumeric chars (as opposed to the * meaning zero or more, + means one or more).
  [^0-9] would be a char that's NOT numeric.
  (tom|jerry) means one /or/ the other of them.
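The constructs above can be tried out in a small sketch. It again assumes Python's re module as a stand-in for PAN's regex engine, and the group names and the full() helper are hypothetical. Unlike PAN's unanchored matching, full() tests the whole string against the pattern, which makes the single-character class examples easier to check.

```python
import re

# Hypothetical newsgroup names, used only for illustration.
GROUPS = ["comp.lang.python", "alt.test", "news.answers"]

# "c*" allows zero occurrences, so it matches names with no "c" at all.
star_matches = [g for g in GROUPS if re.search(r"c*", g)]

# "Does not match ^$": keep every group whose name is not the empty string.
not_blank = [g for g in GROUPS if not re.search(r"^$", g)]

# A doubled backslash in the pattern matches one literal backslash.
# The Python string "a\\b" holds the three characters a, \, b.
one_backslash = re.search(r"\\", "a\\b") is not None


def full(pattern, text):
    """Return True if the whole of text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


print(star_matches == GROUPS, not_blank == GROUPS, one_backslash)
print(full(r"[0-9]", "7"), full(r"[0-9]", "x"))          # digit class
print(full(r"[-0-9]", "-"))                               # leading dash is literal
print(full(r"[a-zA-Z0-9]+", "pan014"), full(r"[a-zA-Z0-9]+", ""))
print(full(r"[^0-9]", "a"), full(r"[^0-9]", "5"))         # negated class
print(full(r"(tom|jerry)", "jerry"), full(r"(tom|jerry)", "spike"))
```

Both the "c*" pattern and the negated "^$" test select every group in the sample list, which is the "all groups" effect described above.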
Combining these, (tom(my)?|jerr[iy]) means tom, tommy, jerry, or jerri (? means zero or one occurrences of the preceding item, so it may or may not exist). Etc.

Finally, note that as implied by the alphanumeric example, regexes are normally case sensitive, so the tom and jerry examples above would NOT match the capitalized names. [Jj][Ee][Rr][Rr][Yy] would match jerry in any mix of case, such as jERrY. (There are additional allowances for specifying case-insensitive matching, but that's in the flag section of the match, which is out of the scope of the discussion here.) I don't recall whether PAN's matching is case sensitive or not, but unless a special exception has been made, it will be.

Editing the score file directly should be /far/ easier now, with a bit of regex knowledge.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/pan-users