Heinz Mezera posted on Mon, 08 Jun 2015 09:16:22 +0200 as excerpted:

> I'd like to select Headers in the Header-Pane with a regular expression
> in the Subject/Author field and need your help. Is this possible and how
> do I do it?
>
> I want to select all headers
> - starting with three alphabetic characters
> - followed by an underscore
> - two digits after the underscore
> - and any number of characters afterwards.
>
> PAN Info:
> Pan 0.139 Sexual Chocolate (GIT bf56508 git://git.gnome.org/pan2;
> i686-pc-linux-gnu)
** Note that after changing the search expression, you may have to toggle
to something else (say subject), then back to regex, in order to get it
to "take". I noticed it would dynamically refilter part of the time, but
sometimes it would appear to stall out and not update without the toggle.

Given that hint, and the caveat that I tested the components separately
but not together, as I didn't have posts handy that matched that specific
pattern...

One way to do it:

^[[:alpha:]]{3}_[[:digit:]]{2}.*$

^ = zero-width match at the beginning/left
$ = same at the end/right

Non-special characters match themselves. Letters, digits, _, etc, are
non-special.

. matches exactly one occurrence of any character (and *, mentioned again
below, is any number including zero, so .* is a full wildcard, including
matching nothing).

[] encloses a "character class". Such character classes can include
ranges of characters [a-z], individual lists [123], and/or named POSIX
classes like the above, enclosed in further [:xxx:] marks, thus the
nesting. So [[:alpha:][:digit:]] and [a-zA-Z0-9] would both match
alphanumeric characters in ASCII, tho pan's regex is case insensitive so
both a-z and A-Z wouldn't be needed for pan, only one or the other. You
can also do things like [[:digit:]abc._], to match digits, a, b, c, and
the individual characters . and _.

The significance of the [:xxx:] matches, however, is that they work
across character sets, so [:alpha:] matches letters that would be skipped
in character sets where a-z doesn't include all letters due to strange
ordering or something.

To match a - in a character-class, put it at the beginning so it can't
specify a range.

The \ char is the escape char, both inside and outside a character-class,
so you can use \] to match a literal ] for instance, and of course \\ to
match a literal \.
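For anyone who wants to try the pattern outside pan first, here is a
minimal sketch in Python. It rests on some assumptions: Python's re
module does not understand [[:alpha:]] or [[:digit:]], so [a-z] and [0-9]
stand in for them, re.IGNORECASE imitates pan's case-insensitive
matching, and the subject lines are made up purely for illustration.

```python
import re

# Python's `re` has no [[:alpha:]] / [[:digit:]] classes, so this sketch
# substitutes [a-z] and [0-9]; re.IGNORECASE mimics pan's
# case-insensitive matching.
HEADER_PATTERN = re.compile(r"^[a-z]{3}_[0-9]{2}.*$", re.IGNORECASE)

# Hypothetical subject lines, invented for this example.
subjects = [
    "abc_12 first part",
    "XYZ_07",
    "ab_12 too few letters",
    "abcd_12 too many letters",
    "abc_1x only one digit",
    "Re: abc_12 not at the start",
]


def select_headers(lines):
    """Return the lines that match HEADER_PATTERN from their first character."""
    return [line for line in lines if HEADER_PATTERN.search(line)]


selected = select_headers(subjects)
print(selected)
```

Only the first two subjects are selected. The others fail on the letter
count, the digit count, or the left anchor.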
Additionally, you can specify a /negative/ character-class with ^ as the
first character. (Outside a character-class, ^ means match the
beginning; inside, as the first character of the class, it negates the
class; inside, as anything other than the first char, it matches itself
normally.) So [^abc] means any character /but/ a, b, or c.

Significantly, character classes normally only match *ONE* character. To
match more than one you can repeat, so [a-z][a-z] will match TWO letters,
or use frequency specifiers inside of {} as I did, above. {1,3} would be
one, two, or three matches, {1,} would be at least one match.

In addition to the {}-delimited frequency range specifiers, there's:

* = zero or more (*NOT* one or more, it doesn't have to be there!)
? = zero or one (may or may not be there, but matches only once)
+ = one or more

Again in case it didn't sink in above, \ is the escape char, so to match
a literal *, you'd use \*

() are the grouping characters, and | indicates alternatives (or). So
((cat)|(horse)) will match "cat" or "horse" but will NOT match "cah", for
instance. Note that the alternatives do NOT need to be the same length,
and that the inside grouping helps clarify the scope of the match but
isn't absolutely required, so (cat|horse) should have the same effect.

So there are two ways to match a "cat" that may or may not be there:

(cat)?
(cat|)

That's the basics.

FWIW for non-pan usage, some regex flavors make things like {} special
characters, so {3} is a frequency and \{3\} are the literal characters,
while others don't unless they're escaped, so {3} would be the literal
characters and the backslash-escaped version would be frequency. And of
course the shell has its own special chars and \ escape char, so
sometimes you need to play with the number of \\\ a bit in order to get
it to work like you want, but once you understand the basics, even /just/
the basics, regex can really be quite powerful.

Of course there's far FAR more. Just a couple quick examples.
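Before getting to those, the basics so far (frequency specifiers, negated
classes, alternation, escaping) can be checked with a short Python
sketch. This is only an illustration under assumptions: it uses Python's
re module rather than pan's regex engine, the helper name matches() is my
own, and re.fullmatch requires the *whole* string to match, which is
stricter than an unanchored search.

```python
import re


def matches(pattern, text):
    """Return True when `pattern` matches the whole of `text`."""
    return re.fullmatch(pattern, text) is not None


cases = [
    (r"ab*", "a"),              # * allows zero b's
    (r"ab+", "a"),              # + needs at least one b
    (r"ab?", "abb"),            # ? allows at most one b
    (r"[a-z]{1,3}", "abcd"),    # {1,3} stops at three letters
    (r"[a-z]{1,}", "abcd"),     # {1,} has no upper limit
    (r"[^abc]", "d"),           # negated class: anything but a, b or c
    (r"[^abc]", "a"),           # ...so a itself is rejected
    (r"(cat|horse)", "horse"),  # alternatives of different lengths
    (r"(cat|horse)", "cah"),    # no mixing of the alternatives
    (r"(cat)?", ""),            # optional group may be absent
    (r"(cat|)", ""),            # an empty alternative does the same job
    (r"\*", "*"),               # escaped * is a literal asterisk
]

results = {case: matches(*case) for case in cases}

for (pattern, text), ok in results.items():
    print(f"{pattern!r:16} vs {text!r:8} -> {ok}")
```

The rejected cases are ab+ against "a", ab? against "abb", [a-z]{1,3}
against "abcd", [^abc] against "a", and (cat|horse) against "cah". All
the others match.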
First, () not only groups, but stores for later use. So if for instance
you are trying to match quotes but don't know if it's single-quotes or
double-quotes, you can use (['"]) for the first match (possibly as
(['"])? or ('|"|) if you don't know if it'll be quoted or not), and \1 to
automatically match the same thing at the other end of the quote (some
tools write the reference as $1 instead, typically in replacement text).

Second, there's what's called look-ahead and look-behind matching, which
can be positive or negative. So for instance if you want to match "pro"
but not "gopro", there's a way to say "look behind (to the left of) the
pro and don't match if the preceding letters are 'go'". I don't use them
enough to be sure of my memory, however, so I generally have to look that
sort of advanced stuff up, if I need it.

And for this advanced stuff, you usually have to either look up or test
whether whatever you're trying to work with actually supports it or not.
I'm not sure whether pan does, for instance, tho it wouldn't surprise me
if it did.

So back to the specific case in point:

^[[:alpha:]]{3}_[[:digit:]]{2}.*$

Given the above, we can parse that as:

^            left anchor (begin the line with what follows)
[[:alpha:]]  one alphabetic character
{3}          match the previous exactly three times
_            (matches itself)
[[:digit:]]  one digit
{2}          match the previous exactly twice
.            any character
*            match the previous any number (including none) of times
$            right anchor (end of line)

Of course the .*$ isn't actually needed, since without it the match is
simply left-anchored only, but I like the explicit "the rest of the line
doesn't matter for the match" that .*$ provides. And in non-pan usages
where you're matching to delete or replace the match, it COULD matter, as
failing to include the .*$ would leave any other junk on the line still
there, while including it would match and thus delete/replace the entire
line.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman