Hello Duncan, your answer appears to be a short explanation of regex, but it does not work as expected.
On Monday, 08.06.2015, at 11:31 +0000, Duncan wrote:
> Heinz Mezera posted on Mon, 08 Jun 2015 09:16:22 +0200 as excerpted:
>
> > I'd like to select headers in the Header-Pane with a regular expression
> > in the Subject/Author field and need your help. Is this possible, and how
> > do I do it?
> >
> > I want to select all headers
> > - starting with three alphabetic characters
> > - followed by an underscore
> > - two digits after the underscore
> > - and any number of characters afterwards.
> >
> > PAN Info:
> > Pan 0.139 Sexual Chocolate (GIT bf56508 git://git.gnome.org/pan2;
> > i686-pc-linux-gnu)
>
> ** Note that after changing the search expression, you may have to toggle
> to something else (say subject), then back to regex, in order to get it
> to "take". I noticed it would dynamically refilter part of the time, but
> would appear to stall out and not update without the toggle, sometimes.
> Given that hint, and the caveat that I tested the components separately
> but not together, as I didn't have posts handy that matched that specific
> pattern...
>
> One way to do it:
>
> ^[[:alpha:]]{3}_[[:digit:]]{2}.*$

No matter what expression I enter into the search field, the header list
is totally empty. To make sure it's not an error in the regex, I tried

^.*$

I think the above expression should show all headers, but the header list
is still totally empty. Enclosing the expression in single or double quotes
or brackets has no effect. What am I doing wrong?

> ^ = zero-width match at the beginning/left
> $ = same at the end/right
>
> Non-special characters match themselves. Letters, digits, _, etc, are
> non-special.
>
> . matches exactly one occurrence of any character (and *, mentioned again
> below, is any number including zero, so .* is a full wildcard, including
> matching nothing).
>
> [] encloses a "character class".
> Such character classes can include
> ranges of characters [a-z], individual lists [123], and/or category
> classes (I seem to have forgotten the proper term ATM) like the above,
> enclosed in further [:xxx:] marks, thus the nesting.
>
> So [[:alpha:][:digit:]] and [a-zA-Z0-9] would both match alphanumeric
> characters in ASCII, tho pan's regex is case insensitive so both a-z and
> A-Z wouldn't be needed for pan, only one or the other. You can also do
> things like [[:digit:]abc._], to match digits, abc, and the individual
> characters . and _. The significance of the [:xxx:] matches, however, is
> that they work across character sets, so [:alpha:] matches letters that
> would be skipped in character-sets where a-z doesn't include all letters
> due to strange ordering or something.
>
> To match a - in a character-class, put it at the beginning so it can't
> specify a range. The \ char is the escape char, both inside and outside
> a character-class, so you can use \] to match a literal ] for instance,
> and of course \\ to match a literal \.
>
> Additionally, you can specify a /negative/ character-class with ^ as the
> first character (outside a character-class, it means match the beginning,
> inside, as the first character of the class, it negates the class, inside
> as anything other than the first char, it matches itself normally). So
> [^abc] means any character /but/ abc.
>
> Significantly, character classes normally only match *ONE* character. To
> match more than one you can repeat, [a-z][a-z] will match TWO letters, or
> use frequency specifiers inside of {} as I did, above. {1,3} would be
> one, two, or three matches, {1,} would be at least one match.
>
> In addition to the {}-delimited frequency range specifiers, there's:
>
> * = zero or more (*NOT* one or more, it doesn't have to be there!)
> ? = zero or one (may or may not be there, but matches only once)
> + = 1 or more
>
> Again in case it didn't sink in above, \ is the escape char, so to match
> a literal *, you'd use \*
>
> () are the grouping characters, and | indicates alternatives (or). So
> ((cat)|(horse)) will match "cat" or "horse" but will NOT match "cah", for
> instance. Note that the alternatives do NOT need to be the same length,
> and that the inside grouping helps clarify the scope of the match but
> isn't absolutely required, so (cat|horse) should have the same effect.
> So there are two ways to match a "cat" that may or may not be there:
>
> (cat)?
> (cat|)
>
> That's the basics. FWIW for non-pan usage, some regex uses make things
> like {} special characters, so {3} is a frequency and \{3\} are the
> literal characters, while others don't unless they're escaped, so {3}
> would be the literal characters and the backslash-escaped version would
> be frequency. And of course the shell has its own special chars and \
> escape char, so sometimes you need to play with the number of \\\ a bit
> in order to get it to work like you want, but once you understand the
> basics, even /just/ the basics, regex can really be quite powerful.
>
> Of course there's far FAR more. Just a couple quick examples. First, ()
> not only groups, but stores for later use. So if for instance you are
> trying to match quotes but don't know if it's single-quotes or
> double-quotes, you can use (['"]) for the first match (possibly as
> (['"])? or ('|"|) if you don't know if it'll be quoted or not), and \1
> or possibly $1 to automatically match the same thing at the other end of
> the quote.
> Second, there's what's called look-ahead and look-behind matching, which
> can be positive or negative. So for instance if you want to match "pro"
> but not "gopro", there's a way to say "look behind (to the left of) the
> pro and don't match if the preceding letters are 'go'".
> I don't use them
> enough to be sure of my memory, however, so generally have to look that
> sort of advanced stuff up, if I need it. And for this advanced stuff,
> you usually have to either look up or test whether whatever you're trying
> to work with actually supports it or not. I'm not sure whether pan does,
> for instance, tho it wouldn't surprise me if it did.
>
> So back to the specific case in point:
>
> ^[[:alpha:]]{3}_[[:digit:]]{2}.*$
>
> Given the above, we can parse that as:
>
> ^ Left anchor (begin the line with what follows):
>
> [[:alpha:]] one alphabetic character
>
> {3} match the previous exactly three times
>
> _ (matches itself)
>
> [[:digit:]] one digit
>
> {2} match the previous exactly twice
>
> . any character
>
> * match the previous any number (including none) of times
>
> $ right anchor (end of line)
>
>
> Of course the .*$ aren't actually needed, since without them the match is
> simply left-anchored only, but I like the explicit "the rest of the line
> doesn't matter for the match" that .*$ provides. And in non-pan usages
> where you're matching to delete or replace the match, it COULD matter, as
> failing to include the .*$ would leave any other junk on the line still
> there, while including it would match and thus delete/replace the entire
> line.

kr Heinz
_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users
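
As a cross-check outside Pan, the pattern discussed in this thread can be run against a few made-up subject lines. This is only a rough sketch in Python, and it rests on some assumptions: Python's re module does not understand the POSIX classes [[:alpha:]] and [[:digit:]], so plain ASCII ranges stand in for them; re.IGNORECASE mimics the case-insensitive matching Duncan describes for Pan; and the subject strings and function names are invented for illustration. It says nothing about how Pan's own search field behaves.

```python
import re

# Duncan's pattern uses POSIX classes ([[:alpha:]], [[:digit:]]), which
# Python's re module does not understand. The ASCII ranges below are a
# stand-in for them, so this approximates the pattern; it is not Pan's matcher.
PATTERN = re.compile(r"^[a-z]{3}_[0-9]{2}.*$", re.IGNORECASE)

# Hypothetical subject lines, invented purely for illustration.
SUBJECTS = [
    "abc_12 first test post",
    "XYZ_07",
    "ab_12 only two letters",
    "abcd_12 four letters",
    "abc_1x one digit only",
    "Re: abc_12 not at the start",
]


def matching_subjects(subjects):
    """Return the subjects accepted by the three-letters/underscore/two-digits pattern."""
    return [s for s in subjects if PATTERN.match(s)]


def catch_all_matches(subjects):
    """Return the subjects matched by ^.*$, the sanity check tried in the post."""
    catch_all = re.compile(r"^.*$")
    return [s for s in subjects if catch_all.match(s)]


if __name__ == "__main__":
    for subject in matching_subjects(SUBJECTS):
        print(subject)
```

With these sample strings, only the first two subjects pass the specific pattern, while ^.*$ accepts all of them. If the same holds for real subjects, the expression itself is probably sound, and the empty header list more likely comes from how the search field is being used (for example the toggle behaviour Duncan mentions) than from the regex.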